The market for privacy-preserving technologies and privacy-enhancing computation (PEC) is multifaceted. According to Gartner’s definition, “PEC provides robust, sustainable measures to gain, pool, process or share information while data remains protected in use.” This is an umbrella term encapsulating multiple technical approaches which solve different challenges facing data and analytics leaders today.
The need for PEC solutions is clear, as organizations look to solve their unique data utilization challenges. However, it is important for data and analytics leaders to fully understand and weigh the options available today, how they can complement one another, and how their performance compares and contrasts in different scenarios.
WHAT DOES TRIPLEBLIND DO?
TripleBlind provides the most complete and scalable solution for privacy-enhancing computation built on several underlying advancements to well-understood concepts like federated computing and secure multi-party computing (MPC). The product is delivered via a set of intuitive, approachable APIs that integrate easily into existing data science processes. The capabilities available through the product are scalable and performant for a host of business cases across healthcare, financial services, and more. The solution enables organizations to ensure compliance with strict privacy regulations including HIPAA and GDPR while leveraging protected data to solve problems and drive progress.
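For readers unfamiliar with secure multi-party computation, the general idea can be sketched with textbook additive secret sharing. This is a generic illustration only and is not a description of TripleBlind's protocol; the modulus, function names, and input values are all assumptions chosen for the example.

```python
import secrets

# Assumed modulus for the toy example; any sufficiently large integer works.
MODULUS = 2**61 - 1

def share(value, n_parties):
    """Split a value into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recombine shares; any incomplete subset reveals nothing about the value."""
    return sum(shares) % MODULUS

# Two data owners (hypothetical) each split a private count across three parties.
a_shares = share(120, 3)
b_shares = share(45, 3)

# Each party adds the two shares it holds, without ever seeing 120 or 45.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
total = reconstruct(sum_shares)
```

Only the recombined total is revealed. Each individual share is a uniformly random number that carries no information about the underlying input.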
WHAT DOES PRIVITAR DO?
Privitar is a vendor of several data privacy techniques, including data masking, tokenization, generalization, perturbation, redaction, substitution, and homomorphic encryption. Most of these approaches alter datasets so that they are considered “de-identified” or “anonymized” and can be distributed in compliance with HIPAA and other standards. Homomorphic encryption, by contrast, allows computation on encrypted data, but users may be frustrated by its computational performance and difficulty of use. Additionally, no regulatory or compliance assurances can be made on top of homomorphic encryption alone: it is a cybersecurity technique, used in public clouds to keep data safe from other tenants on the same machine. It does not address regulatory privacy concerns, namely restricting access to and usage of data to specified, permissioned purposes.
TripleBlind’s product is a comprehensive software-based solution enabling data users to compute analytics, query datasets, and train and infer on machine learning models using third party or protected data without “seeing”, copying, or hosting any raw data. Data owners leverage one-way data encryption and implement strict permissions controls over who can use their data and for what purpose. The solution works for all data types, structured and unstructured, including images and genomic sequences and across a wide range of computational tasks, including the training and inference of machine learning models.
Privitar’s “Modern Data Provisioning Platform” is a software-based platform specializing in helping organizations manage their data registration, policy lifecycle, approvals, access, and lineage reporting. Their Data Privacy Platform provides governance tools that apply policies, watermarks, audit logs, and de-identification techniques to data processes. Privitar SecureLink™ uses homomorphic encryption and blind matching to join distributed datasets without exposing the linking identifiers. The technologies employed by Privitar work only with structured tabular data.
PRIVACY ENHANCING COMPUTATION
Degree of Privacy
TripleBlind has undergone a detailed technical evaluation by MITRE, a federally funded research and development center that works in the public interest across federal, state, and local governments, as well as industry and academia. Additionally, key partners and investors have conducted thorough analyses of the technology and have concluded that TripleBlind provides the highest level of privacy and interoperability in the privacy-enhancing computation space. Finally, TripleBlind holds formal mathematical proofs showing that the encryption is irreversible and quantum-safe. This means that TripleBlind meets the criteria for information-theoretic security (also known as unconditional security), a qualifier that refers to systems that are secure against adversaries with unlimited time and resources.
Privitar’s approach to privacy involves altering datasets so that they are considered “de-identified” or “anonymized” and can be distributed in compliance with HIPAA and other regulations. The approach relies on a series of related techniques such as data masking, tokenization, generalization, perturbation, redaction, and substitution. These techniques either mask sensitive data elements, replace them with a token, or remove them altogether. However, in the case of HIPAA, one of the requirements for de-identification is that the data being shared cannot contain any information that could reasonably be used to re-identify an individual. Anonymized and tokenized datasets often fail to meet this requirement, as researchers have demonstrated accurate techniques for re-identifying individuals in heavily anonymized datasets. Even when the 18 key HIPAA identifiers are removed from the dataset, context clues and the consideration of outside information can lead to re-identification. Therefore, tokenization or masking techniques are not always sufficient, especially in the cases of image data, genomics data, or rare disease data.
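The re-identification risk described above can be illustrated with a minimal sketch. It applies tokenization and generalization to a few invented records and then measures how many released rows share the same combination of remaining attributes (the "k" in k-anonymity). The records, the key, and the function names are all hypothetical and do not represent Privitar's implementation.

```python
import hashlib
import hmac
from collections import Counter

# Invented toy records; no real individuals are represented.
records = [
    {"name": "Ann Lee", "zip": "64108", "birth_year": 1954, "diagnosis": "flu"},
    {"name": "Bob Ray", "zip": "64108", "birth_year": 1954, "diagnosis": "asthma"},
    {"name": "Cy Dunn", "zip": "64112", "birth_year": 1931, "diagnosis": "rare disease X"},
]

def tokenize(value, key=b"demo-key"):
    """Replace a value with a stable keyed-hash token (truncated for readability)."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:8]

def generalize_zip(zip_code):
    """Generalize a ZIP code by keeping only its first three digits."""
    return zip_code[:3] + "**"

def deidentify(record):
    """Tokenize the direct identifier and coarsen the ZIP code."""
    return {
        "name": tokenize(record["name"]),
        "zip": generalize_zip(record["zip"]),
        "birth_year": record["birth_year"],
        "diagnosis": record["diagnosis"],
    }

def smallest_group(rows, quasi_identifiers):
    """Size of the smallest set of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

released = [deidentify(r) for r in records]
k = smallest_group(released, ["zip", "birth_year"])
```

Here `k` comes out as 1: one released row is unique on generalized ZIP code and birth year, so anyone who knows those two facts from an outside source can single out that person and read the diagnosis, even though the name was tokenized.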
Privitar is also a vendor of a homomorphic encryption solution. Homomorphic encryption is perhaps the most secure of the privacy approaches offered by Privitar and does not alter the underlying dataset, enabling higher fidelity usage. The encryption technique allows for simple data operations to occur on encrypted data without first needing to decrypt the data. However, this privacy scheme still involves the creation of a decryption key, and though that key remains in the possession of the data provider, the requirement of trust is not eliminated – the key could end up in the wrong hands through negligence or malicious activity. Anywhere trust is a requirement, risk is inherently present. Homomorphic encryption is also not proven to be quantum-safe, meaning that as the available computing power increases, the risk of cracking this scheme may increase.
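To make the idea of computing on encrypted data concrete, here is a toy sketch of the Paillier cryptosystem, a textbook additively homomorphic scheme. It is an assumption chosen for illustration: the source does not say which scheme Privitar uses, and the tiny fixed primes make this example insecure by design. Note that a private key exists and is required to read the result, which is the trust dependency discussed above.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy key generation with tiny fixed primes (insecure; illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(lam, -1, n)  # modular inverse; valid shortcut because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer m (0 <= m < n) with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Decrypt with the private key held by the data provider."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen()
c_a = encrypt(pub, 120)
c_b = encrypt(pub, 45)
c_sum = add_encrypted(pub, c_a, c_b)   # computed without decrypting either input
result = decrypt(pub, priv, c_sum)
```

The party performing `add_encrypted` never sees 120 or 45, but whoever holds `priv` can decrypt any ciphertext, so the protection is only as strong as the custody of that key.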
TripleBlind enables organizations to easily add any number of counterparties to a data process without incurring computational burden or speed costs. In fact, some processes are actually faster with the addition of more parties than they would be on raw data, due to calculations being distributed among the participants, with the different pieces running in parallel. Even for complex multi-party computations, results are achieved in comparable time to executing the same operations locally on raw data. The solution exhibits strong performance even for computationally complex tasks including training deep neural networks on unstructured data.
With de-identification techniques such as those provided by Privitar, adding counterparties to existing data processes can add coordination burden and slow time-to-usage. Once the data is in a de-identified state and the correct permissions are in place, the speed and compute resources required to operate on the data are the same as when operating on raw data.
For cases in which Privitar’s homomorphic encryption offering is employed, only simple searches, joins, and analytics are practical. More advanced or complex tasks like machine learning model training are not currently supported, as they add substantial time penalties and computational burden. Even for less complex tasks, more resources are required and results are slower to obtain. According to an IBM study of the efficacy of homomorphic encryption, operating on fully homomorphically encrypted (FHE) machine learning models requires roughly 40 to 50 times the compute power and 10 to 20 times the memory of doing the same work on unencrypted models.
Digital Rights Management (DRM)
TripleBlind enables auditable digital rights on how the data may be used by a counterparty, ensuring illegal or non-compliant use of the data is impossible. Settings can be configured such that permissions for data and algorithm usage occur on a per-use basis. The digital rights management enables any business agreement, regulation, or other set of restrictions to be overlaid on top of the data collaboration process, and permissions can be fine-tuned at the granular level.
Privitar provides capabilities for a central team or individual at an organization to define and manage privacy policies. These policies are meant to help balance utility and data privacy. They govern which datasets require de-identification, and which fields need to be masked, removed, or otherwise altered to achieve the proper level of privacy. This approach does not, however, appear to account for per-usage permissions and controls over how data can be used once it is de-identified. A recipient of a de-identified dataset may use that dataset for more purposes than specified by the legal agreements or policies in place. In other words, human trust is still required to account for potential misuse, whether intentional or unintentional.
Ability to Operate at Scale
TripleBlind makes it extremely easy to add new partners to an existing data and analytics collaboration. Because of the flexibly configurable digital rights management capabilities, adding a new data provider or data user is as easy as updating permissions settings, either in the web user interface or through API configuration, to allow the new partner to participate at the appropriate, agreed-upon level. With listings in cloud marketplaces like Azure, TripleBlind is now easier than ever to start using. Because the product itself is delivered in the exact same format for every customer, whether a data provider or data user, documentation and customer support are standardized and easy to access.
Privitar employs technologies that typically struggle when deployed at scale. Scale here refers to the use of these techniques on larger datasets (thousands of rows and/or columns) and with multiple coordinated parties. Most of the techniques require manual intervention to identify which fields of a dataset need to be masked, tokenized, or removed. Some of these steps can be automated, but there is no guarantee that every piece of personal or identifiable information has been removed or hidden until a human has double-checked.
Consider the plausible scenario in which personal or identifiable information is mistakenly entered into an unexpected column, which is then left in its raw state during computation or transmission because the column was not designated to be masked or removed. Without a human manually double-checking the work, using de-identification techniques could lead to personal information being leaked or shared in a non-compliant way. Additionally, the introduction of more parties to an existing process requires a level of harmonization and orchestration burden that suffers at scale.
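The scenario above can be sketched in a few lines. In this hypothetical example, masking is applied only to the columns someone designated in advance, so an identifier typed into a free-text column passes through untouched. The column names, the sample row, and the simple pattern check are all assumptions made for illustration.

```python
import re

# Simple pattern for US Social Security numbers in the form 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_designated(row, masked_columns):
    """Mask only the columns that were designated as sensitive."""
    return {col: ("***" if col in masked_columns else val) for col, val in row.items()}

def find_residual_ssns(row):
    """Return the columns that still contain something shaped like an SSN."""
    return [
        col for col, val in row.items()
        if isinstance(val, str) and SSN_PATTERN.search(val)
    ]

# Invented record: the SSN was also typed into a free-text notes field.
row = {
    "ssn": "123-45-6789",
    "visit_notes": "Patient gave SSN 123-45-6789 at intake",
    "age": 61,
}

masked = mask_designated(row, {"ssn"})
leaks = find_residual_ssns(masked)
```

After masking, `leaks` still lists `visit_notes`. A pattern scan like this catches one known format, but it cannot anticipate every way identifying information might appear, which is why a manual review step remains necessary.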
Homomorphic encryption, though it avoids some of the manual steps listed above, is a slow and computationally intensive process. Additionally, the technique will only work on tabular data. It appears Privitar primarily uses homomorphic encryption to privately join two datasets, but the technology struggles when more parties/datasets are involved and when the types of operations being performed on the data are more complex.
Customers can acquire Privitar through cloud stores including the AWS and Microsoft Azure marketplaces.
Types of Data
TripleBlind protects data at the bits and bytes level. Rather than requiring a human to designate which fields should be protected and which can be exposed, TripleBlind takes the approach that any bit of data could contain private or sensitive information requiring protection. As a result, everything is one-way encrypted, and everything in the dataset remains computable, meaning datasets retain their full fidelity/utility and remain protected in-use.
A key consequence of this approach is that it allows for any type of data, including tabular, image, voice, video, genomics, and even proprietary data types to be processed with the same protections. Whereas typically image and genomics data are extremely difficult to de-identify due to their inherent inclusion of identifying information, TripleBlind makes it easy to compute on those data types and more, while providing the same protections afforded to tabular data.
Privitar works only with tabular or text data, according to their website. Each technique involves finding and replacing or removing identifiable information from datasets. This type of approach struggles with large or complex datasets and does not apply to unstructured data like images or genomic sequences, which are extremely difficult to de-identify.
Training New AI and ML Models
TripleBlind offers a capability called Blind Learning which makes training new artificial intelligence and machine learning models on sensitive data easier, faster, and compliant with laws that require data to remain in place. Blind Learning is TripleBlind’s alternative to federated learning, a well-known concept by which a model is trained individually at multiple data sources, and the resulting models are averaged. With Blind Learning, the data providers never get to access the full model, which protects the IP in the model. Blind Learning also protects against membership inference attacks, which seek to predict or uncover the data used to train a model.
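For reference, the federated learning baseline described above (train locally at each data source, then average the resulting models) can be sketched as follows. This illustrates generic federated averaging on a one-parameter linear model, not Blind Learning, whose internals are not described here. The data points, learning rate, and function names are assumptions.

```python
def local_train(weight, data, lr=0.01, epochs=20):
    """Run gradient descent on one site's data for the model y = weight * x."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_average(site_weights, site_sizes):
    """Average the site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    return sum(w * n for w, n in zip(site_weights, site_sizes)) / total

# Invented datasets held at two separate sites; both roughly follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)],
    [(1.5, 3.1), (2.5, 4.9)],
]

global_weight = 0.0
for _ in range(10):
    # Each site trains on its own data; only the model weight is sent back.
    local_weights = [local_train(global_weight, data) for data in sites]
    global_weight = federated_average(local_weights, [len(d) for d in sites])
```

In this baseline, every site receives the full global model on each round. That exposure of the model to the data providers is what Blind Learning is described above as avoiding.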
Privitar does not offer tools for privacy-preserving machine learning. While their tools can play a role in prepping data for model training or inferences, they do not offer a comparable product to TripleBlind’s Blind Learning capability.
TripleBlind also provides protections for algorithms, which may contain sensitive intellectual property. Certain types of attacks such as reconstruction attacks target machine learning models, with the goal of recreating or guessing at the data used to train the model. Developing high-performing algorithms involves time and resources, so protecting algorithms from reverse-engineering has become an increasingly important objective. TripleBlind’s privacy-enhancing computation keeps the algorithms in a one-way encrypted state during computation. Therefore, the data provider is blind to the algorithm, the data user is blind to the raw data, and TripleBlind is blind to both – hence the name. Algorithm encryption is a distinguishing feature of TripleBlind’s solution which is rarely addressed in other solutions.
Privitar primarily tackles the data privacy side of the privacy-enhancing computation problem and does not claim to offer algorithm encryption capabilities, meaning that there is no ability to leverage proprietary algorithms within the Privitar platform. Data must be exposed to a third-party tool for data science activities, potentially removing any digital rights management from the shared dataset, increasing the need for Cyber/Third-Party Risk Management activities, and creating a storage burden for the data users. Data providers wishing to keep their data resident behind their firewall while allowing operations to occur on that data are unable to make their data computable using Privitar’s toolset.
TripleBlind enables organizations to achieve compliance with data privacy laws and regulations, including GDPR and HIPAA. Because the technology allows computations on the ciphertext of one-way-encrypted data, raw personal data can remain safely in place while it is being used for computation. Where de-identification is required, such as in HIPAA, TripleBlind’s one-way encryption at the byte level ensures that every element of a dataset can be protected and invisible to the data user, who can still derive all of the actionable value they need from the full dataset. No data is ever copied, transmitted, or physically aggregated. TripleBlind holds formal third-party legal opinions stating that the technology can be used in compliance with GDPR and HIPAA.
Privitar can be used to help companies share tabular datasets with personally identifiable or sensitive data removed or masked. This can be sufficient for compliance with certain standards like HIPAA in specific situations, but one must be extremely careful in the application of these technologies, as misuse can easily lead to noncompliance or inadvertent data leakage. Additionally, it can be difficult to provide privacy guarantees when the methods applied rely on finding and replacing or removing sensitive information from the datasets. Finally, regulatory compliance often involves a heavy focus on the permissions surrounding how data can be used and who is allowed to use it. When one party sends a dataset to a counterparty, even when it is sufficiently de-identified, they lose the ability to enforce Digital Rights Management (DRM) over how the data may be used. Business agreements that often go along with de-identified datasets are not sufficient devices to ensure that the counterparty abides by the proper usage of the dataset.
TripleBlind is entirely software-based and API-driven, which makes it seamless for our customers to integrate privacy-enhancing computation into their existing tool suites and cloud solutions. The solution works on-prem or in the cloud, offering the flexibility customers need while reducing the requirements for extra data preparation and aggregation steps.
Privitar works both on-prem and in the cloud, and like TripleBlind, it is API-driven. This architecture allows for high degrees of interoperability compared to most privacy approaches because the solution can be easily implemented into existing processes, so long as the operations being performed do not add significant computational burden requiring additional compute resources like GPUs.
TripleBlind’s solution is delivered as containerized software that sits behind the firewalls of the data provider and data user, whether on-prem or on a cloud server. All data operations occur in a peer-to-peer way between the two or more counterparties involved. A TripleBlind “router” assists in establishing these connections, but never takes possession of any data or algorithms. This process eliminates the need for any specific hardware requirements for the users.
Privitar is delivered as software, meaning that it does not require specific hardware to run. However, some operations using homomorphic encryption may be computationally heavy, requiring additional compute resources, including hardware like GPUs, to run efficiently.
Accuracy Preservation at Speed
TripleBlind facilitates privacy-enhancing computation without sacrificing accuracy and stacks up extremely well compared to other approaches. In testing, compared to federated learning, Blind Learning was able to achieve more accurate model performance in much less time. Neural network inference using Blind Inference was also 15-2500% faster than other methods. The levels of accuracy achieved by the TripleBlind tools are acceptable in high-stakes environments like healthcare and financial services, where both speed and accuracy are critical to decision-making.
Privitar uses techniques that remove or mask certain data elements. This can degrade accuracy because of the inherent trade-off it creates between utility and privacy. This trade-off can be difficult to calibrate properly, especially when multiple regulations must be considered simultaneously. By default, this approach to de-identification reduces the precision of the data and thus the accuracy of the analysis.
Book A Demo
TripleBlind keeps both data and algorithms in use private and fully computable. To learn more about Blind Learning, or to see it in action, please book a demo!