Social and economic online activity has rocketed over the past three decades, with an estimated 5 billion users now on the internet. Internet viewership and participation continue to grow as well: on any given day in the past five years, an average of 640,000 people went online for the very first time. Users are constantly accessing and generating data at record levels, but what efforts are governments making to protect their citizens’ and consumers’ private data?
The German state of Hesse enacted the world’s first data protection regulation in 1970. Since then, 137 out of 194 countries have adopted some level of legislation to protect data and privacy within their borders. Gartner® reports that by 2023, over 65% of the world’s population will have its personal data covered under modern privacy regulations.
Here’s a round-up of important global data protection changes enacted during 2022.
1. Europe Updates Standard Contractual Clauses under GDPR
The General Data Protection Regulation (GDPR) is one of the most comprehensive data protection regulations in the world. With applications for nations inside and outside of the EU, any organization seeking to work with European companies or users must comply with requirements set forth by the GDPR.
Standard contractual clauses (SCCs) provide appropriate data protection safeguards and serve as a baseline for data transfers from the EU to third countries. The previous SCCs were repealed on September 27th, 2021, and the updated SCCs carry the following requirements:
- Businesses must use updated SCCs for all new contracts and processing activities entered into as of September 27th, 2021.
- Businesses must migrate all contracts entered into before September 27th, 2021, that use old SCCs into updated SCCs by December 27th, 2022.
- Data importers must confirm that they will only disclose personal data to third parties outside of the European Economic Area when the party has consented to these clauses or a specific derogation applies.
- Additional parties may be added to SCCs under a “docking clause,” which typically applies to new acquisitions.
2. Japan Amends The Act on The Protection of Personal Information
Japan’s Act on The Protection of Personal Information (APPI) was originally passed in 2003, rendering it one of the earliest privacy and data protection laws. Its most recent amendment focuses on further regulation of cross-border data transfers, requiring opt-in consent, and creating new categories of information regulated under the law. Additional changes include:
- Businesses that transfer personal information to third-party vendors overseas must ensure that the third party complies with safeguards and measures, including notice to the individual.
- Opt-in consent notifications must be effective and operative, meaning businesses must provide comprehensive information regarding transfers, safeguards, and maintenance for the protection of personal information.
- The implementation of “Personal related information” as a category, which includes any information related to an individual that does not fall within the scope of personal information, pseudonymous information, or anonymous information.
- Businesses must promptly report data breaches if the breach includes sensitive information, information that could result in significant economic loss or information collected through unjust means.
3. Kenya Updates 2019 Data Protection Act
Kenya’s Data Protection Act of 2019 sets out data subject rights, principles of data processing, obligations related to data transfers, direct marketing, and breach notifications. Additional sector-specific legislation addresses data protection in key areas such as the IT & Communications industry, the health sector, and the financial sector. Revisions to Kenya’s Data Protection Act, which came into effect in February of 2022, include:
- Complaints Handling and Enforcement Procedures, which facilitate fair, impartial, and expeditious investigations and hearings of complaints.
- Registration of Data Controllers and Data Processors, which provides procedures and requirements for the registration of data controllers and processors in Kenya.
4. Eswatini Implements The Data Protection Act No. 5 of 2022
Eswatini’s first comprehensive privacy legislation governs the collection, processing, and disclosure of personal data. It establishes foundational data subject rights, such as the right to access and correct personal information. “The Act” also sets strict requirements in relation to retention periods and data security requirements, in addition to general provisions on unsolicited electronic communications and automated decision making.
5. Thailand’s Personal Data Protection Act Comes into Force
Thailand’s Personal Data Protection Act was initially enacted in 2019, with a grace period of one year for covered institutions. However, as a result of the COVID-19 pandemic, the Thai government issued royal decrees to extend compliance deadlines to June 1st, 2022. Much like the GDPR, the PDPA applies to entities both in Thailand and abroad that process personal data for the provision of products and services in Thailand. Requirements and provisions include:
- Data controllers and processors must have a valid legal basis for processing personal data.
- If personal data includes sensitive personal data (such as health data, race, religion, sexual preference, criminal record, or biometric data), data controllers and processors must ensure data subjects grant explicit consent for the collection, use, or disclosure of data.
- Data subjects must be guaranteed foundational data rights, such as the right to be informed; to access, rectify, and update data; to restrict or object to processing; and the rights to erasure and portability.
How could new privacy regulations impact your organization?
New and updated privacy regulations are necessary to protect sensitive consumer data, but could create a series of moving compliance targets for your organization. Navigating local, state, federal, and international regulations could give your legal team (or entire organization) quite the headache –– resulting in an unused repository of critical information. Not to mention that compliance is also costly. Industry spending on compliance is estimated at $270 billion per year, with 87% of business leaders expecting investment in compliance to increase over the next three years.
The rise in internet usage, data creation, data analysis, and machine learning provides golden opportunities for innovation across sectors, including healthcare and financial services. It’s more important than ever for organizations to harness the intellectual property value of sensitive data for novel solutions –– without compromising the privacy rights of individuals around the world.
What is the TripleBlind Solution?
The TripleBlind Solution solves for regulatory compliance with privacy-enhancing technology. We offer data collaboration without data transmission, allowing for regulated data to be used without violating regulations such as The GDPR or HIPAA. By one-way encrypting and never decrypting private data, the TripleBlind Solution affords a fast, secure, and simplified approach to data analysis, machine learning, and neural network training with sensitive information.
With privacy-enhancing computation, TripleBlind is able to provide robust, sustainable measures to analyze, pool, process, or collaborate with data. Imagine if your organization could develop new pharmaceuticals at a fraction of the cost, drastically reduce cases of credit card fraud, or simply analyze regulated data from antiquated legacy systems. As the data economy booms, so will use cases of our complete and scalable technology –– and we’d love to connect with your organization to foster innovation for the future.
Is a “data graveyard” the latest challenge haunting your business operations? It’s no question that data is the backbone of all modern organizations, driving ambitious projects forward through a foundation of rich insights. But what happens when an organization collects, stores, and then rarely uses valuable data about customer satisfaction, business operations, or the results of strategy implementation?
Data becomes buried under more data, collecting metaphorical dust and costing enterprises revenue opportunities, efficiency and productivity improvements, and more. In fact, Forrester reports that “between 60% and 73% of all data within an enterprise goes unused for analytics.” Privacy regulations add further compliance targets for healthcare and financial services enterprises, leading to over 43 zettabytes of data that are stored and inaccessible for research and analytics purposes. The result? Data graveyards, high storage and maintenance costs, and missed opportunities for catalyzing business growth informed by intellectually valuable data.
What is a data graveyard?
A “data graveyard” is a trending term understood to be a large repository of unused data, typically resulting from the collection of information without the capacity or resources to analyze it. Data graveyards differ from data silos in that data silos are usually controlled by one department or business unit and isolated from the rest of an organization. Data silos can become data graveyards over time –– if data sits unused, then an organization is unable to maximize data collection, storage, or analysis to its full potential.
Why do data graveyards exist?
Data graveyards are prevalent for a variety of reasons, ranging from data collection for AI initiatives and machine learning to informing more effective business decisions. Data is often collected for a specific purpose, but then one of three things might happen:
- The data is used for its specific purpose and then deleted
- The data is used for its specific purpose and perpetually stored
- The data is never used for its specific purpose but held for potential future use
This also applies to data that is collected without an intended application. Companies might collect data that could be used to inform product development, marketing strategies, revenue maximization, and more –– but may not have the operational resources, infrastructure, or organizational capacity to prioritize the use of such data in a given business quarter. In these cases, data is collected on a good-to-have basis, but left untouched as more informative or relevant data is collected.
Additionally, organizations might choose to retain data from legacy systems. As businesses prioritize digital transformation, outdated computing software or hardware might retain valuable or private information that an enterprise deems necessary for future contexts. The average merchant spends over 58% of their IT budget to maintain legacy systems, even as they develop new strategies to compete with e-commerce giants like Amazon and Alibaba. In the healthcare industry, hospitals that previously used “homegrown” software products may require software patches or updates to maintain old data –– especially as they transition to newer solutions from Cerner, Epic, or MEDITECH.
During these software and system transitions, organizations may face challenges in determining which information to store or maintain for future purposes. For many, information on consumer behavior or long-stored healthcare data could yield future gains, justifying extended maintenance of legacy systems. However, allowing high-value data to rest in a data graveyard for eternity can create risks for an organization. What are these risks?
What challenges do data graveyards pose?
Legal compliance and practical considerations apply to any organization maintaining a data graveyard. For global enterprises, compliance with The General Data Protection Regulation (GDPR) is a requirement for working with businesses and consumers in European regions. Following its implementation in 2018, businesses are now required to abide by the following principles when storing and handling personal data:
- The Storage Limitation Principle
Under this principle, personal data may not be kept for longer than necessary to achieve the specified purpose for collecting the data. Even if the collection and use of personal data is entirely lawful, organizations may not retain data beyond its intended use –– meaning that data graveyards, when filled with personal data, are non-compliant with the GDPR.
- The Purpose Limitation Principle
Under this principle, personal data can only be processed for specified, explicit, and legitimate purposes. Organizations must be transparent about the intended use of data and remain accountable to consumers, which means that data collected for “improved product features” cannot be used for targeted marketing purposes in the future, unless consumers have given explicit consent to both acts. This means that organizations seeking GDPR Compliance should collect data on a need-to-have basis, as opposed to a good-to-have basis.
- The Accuracy Principle
Under this principle, data must be accurate and kept up to date. Personal data must be complete and rectified without undue delay –– so if an organization holds data from 15 years ago and hasn’t touched it since, the GDPR requires either deletion or anonymization of this data. Inaccurate data can also lead to false insights and uninformed decision-making, so cleaning out a data warehouse could lead to more optimal business outcomes.
Practical considerations also lead to unnecessary costs for businesses with large data graveyards. The average cost to store 1TB of file data per year is $3,351, with companies collecting and storing petabytes worth of data each year on-prem or in the cloud. Internal resources might also spend long hours maintaining an unused pool of data, which means an organization’s data graveyard could become a greater cost than source of revenue potential. How, then, can businesses revive data graveyards to drive future business growth?
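To put the storage figure above in perspective, here is a rough back-of-the-envelope sketch in Python. It uses the $3,351-per-terabyte figure from the text; the decimal petabyte conversion and the 60% unused share (the lower bound of the Forrester range quoted earlier) are assumptions chosen for illustration, not measured values for any particular organization.

```python
# Rough annual storage cost, using the per-terabyte figure quoted in the text.
COST_PER_TB_PER_YEAR = 3_351  # USD, from the article
TB_PER_PB = 1_000             # assumption: decimal petabytes


def annual_storage_cost(terabytes: float) -> float:
    """Estimated yearly cost in USD of storing `terabytes` of file data."""
    return terabytes * COST_PER_TB_PER_YEAR


def cost_of_unused_data(terabytes: float, unused_share: float) -> float:
    """Portion of the yearly cost attributable to data that is never analyzed."""
    if not 0.0 <= unused_share <= 1.0:
        raise ValueError("unused_share must be between 0 and 1")
    return annual_storage_cost(terabytes) * unused_share


one_petabyte_cost = annual_storage_cost(1 * TB_PER_PB)
# Assumption: 60% of the stored data goes unused.
unused_cost = cost_of_unused_data(1 * TB_PER_PB, 0.60)

print(f"1 PB stored for one year: ${one_petabyte_cost:,.0f}")
print(f"Share spent on unused data: ${unused_cost:,.0f}")
```

Even under these simplified assumptions, a single petabyte costs millions of dollars a year to keep, and most of that spend goes toward data nobody is analyzing.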
How can my organization eliminate data graveyards?
- Determine privacy compliance requirements for your organization
The General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and California Consumer Privacy Act (CCPA) are among the regulations governing the collection, storage, and use of personal information. Depending on your organization’s scope and sector, requirements from these laws could apply to your business operations. Identifying specific opportunities for compliance can guide future data-centric decision making.
- Identify current costs and benefits from long-term data storage
Not all data storage is a bad thing, but storing all data certainly is. If the costs of retaining outdated or simply old information exceed the potential value of using the data, then it’s time to clear the ghosts out of your organization’s closet. Note that the GDPR does not apply to anonymized data, so it is possible to retain valuable components of old information while maintaining legal compliance.
- Develop an internal “Retention Policy”
Even if these regulations don’t apply to your organization, internal data retention policies can reduce costs and improve insights garnered from data over the long term. How long does your team need data to achieve a specified purpose? Will you delete collected data on a specific date, or will the data be anonymized for future uses? Clarifying your business’s data practices can prevent the build-up of unused data, ultimately reducing the size and cost of your data graveyard.
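The questions above can be turned into a simple, checkable rule. The following Python sketch is one possible way to encode a retention policy. The `Record` fields, the purposes, and the retention periods are all hypothetical placeholders; actual periods should come from your legal and compliance teams.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    purpose: str        # the documented reason the data was collected
    collected_on: date


# Hypothetical retention periods per purpose, in days.
RETENTION_DAYS = {
    "billing": 365 * 7,
    "product_analytics": 365,
}


def retention_action(record: Record, today: date, anonymize_expired: bool = True) -> str:
    """Return 'keep', 'anonymize', 'delete', or 'review' for a record."""
    limit = RETENTION_DAYS.get(record.purpose)
    if limit is None:
        # No documented purpose: flag for a human decision instead of guessing.
        return "review"
    expires_on = record.collected_on + timedelta(days=limit)
    if today <= expires_on:
        return "keep"
    return "anonymize" if anonymize_expired else "delete"


today = date(2022, 6, 1)
records = [
    Record("r1", "product_analytics", date(2020, 1, 1)),
    Record("r2", "billing", date(2020, 1, 1)),
    Record("r3", "unknown_campaign", date(2015, 3, 1)),
]
actions = {r.record_id: retention_action(r, today) for r in records}
print(actions)
```

Running a check like this on a schedule keeps expired data from quietly piling up, and the "review" outcome surfaces data that was collected without a clear purpose.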
- Collaborate with vendors to maximize data usage
Privacy-enhancing technologies have revolutionized how organizations can use long-stored data and maintain legal compliance. Using privacy enhancing computation (PEC), organizations can work together to derive insights from one-way encrypted and anonymized data sets. What does this mean? If your old and unused data has a use case, PEC can help unlock the intellectual property value of your data without compromising client, patient, or consumer privacy.
What is the TripleBlind Solution?
The TripleBlind Solution is the most complete and scalable approach to PEC techniques on the market. We provide robust and sustainable measures to analyze, pool, process, or collaborate while data remains protected in use. The best part? We never see, store, or share your data. The TripleBlind Solution offers the following additional advantages:
- Supports GDPR & HIPAA Compliance through Privacy-By-Design –– TripleBlind never stores or handles any personal data. TripleBlind’s technology permanently and irreversibly de-identifies data through a combination of one-way encryption and distributed computing, which allows the algorithm to generate the same outputs without requiring an Algorithm Provider to process or use any data within the scope of the GDPR’s definition of personal data.
- Exceptional AI/ML modeling and analysis toolset –– TripleBlind enables all data operations to occur on any type of data, without adding speed penalties or requiring additional storage. Train AI models and find enterprise solutions faster and with greater accuracy than ever before.
- Collaborate securely & seamlessly with 3rd-party vendors –– TripleBlind provides robust digital rights management (DRM). Each data operation must be explicitly approved by the appropriate administrator. Once approved, the dataset is one-way encrypted for one-time use. Once the operation is complete and the result is returned to the appropriate party, the one-way encrypted data is rendered useless. Permissions can be set as broadly or specifically as needed, to govern both internal and external use of an organization’s information assets.
Our privacy-enhancing techniques keep data in place, allowing controllers and processors of data to interact in a peer-to-peer software environment. Your dead data doesn’t have to haunt your organization by sitting unused or in a silo. If you’re ready to learn more about how PEC can revive your data graveyard, download a complimentary copy of our whitepaper here.
Omdia, the research arm of Informa, owner of multiple technology media, events and research organizations, recently published a profile of TripleBlind, titled, “On the Radar: TripleBlind Enables Secure Data for Third-Party Processing.” The report is available on Omdia’s and our websites.
Rik Turner, principal analyst, Emerging Technologies and author of the report, offered several valuable insights:
- “This issue (data collaboration) has arisen of late because analysis of big datasets can achieve unique insights, that is, ones that analysis of smaller datasets simply cannot surface. This is particularly important in certain fields such as healthcare, where the analysis of the data of millions of patients can indicate general trends in an entire population or in particular demographic groups.”
- Turner adds that TripleBlind’s solution is complementary to confidential computing “in that it can deliver the encryption/anonymization capability that confidential computing itself does not.” He notes that TripleBlind is also complementary to differential privacy.
- “Healthcare is the place where third-party analytics delivered on securely private data has so far generated the most immediate interest. That said, tech such as TripleBlind’s is clearly relevant elsewhere as its financial services customers demonstrate.”
The report goes on to highlight TripleBlind’s innovation when comparing the company’s privacy-enhancing technologies (PET) to other solutions:
- Homomorphic encryption “tends to be quite slow and is very compute intensive. There are also academic debates about whether homomorphic encryption is quantum-resistant.”
- Confidential computing “does not address the issue of data anonymization and is hardware dependent.”
- Differential privacy “like homomorphic encryption, cannot operate on audio or video files.”
A final comment from the report: “TripleBlind addresses many of the shortcomings of other approaches to data privacy for third-party processing. Its pricing mechanism, which factors in the size of the company, makes it attractive to customers large and small …”
Be sure to check out the full report to learn more about how TripleBlind can enable private data collaboration. Contact us today to schedule a free demo!
While artificial intelligence (AI) has been introduced widely into healthcare, adoption at scale across the industry is still in its infancy. As a direct result of COVID-19, there has been an increased interest from both medical enterprises and data practitioners to find solutions to healthcare problems using technology and AI.
Acumen Research and Consulting predicts the global AI in healthcare market will surpass $8 billion by 2026. As AI and big data are becoming increasingly prevalent in healthcare, these recent and emerging trends will create lasting changes in the industry.
Digital pathology saw rapid adoption of AI in 2020 as a means to continue providing quality, necessary care for patients with pre-existing conditions and diseases while short-staffed medical facilities were overwhelmed with COVID patients. The increased need for remote work and treatment led the Centers for Medicare & Medicaid Services (CMS) and the FDA to issue temporary policies that have made digital pathology cheaper and more flexible by allowing for more remote diagnosis and relaxing regulations on whole slide imaging devices.
There has been considerable movement in AI research related to digital pathology since the pandemic began. Recently, in December 2021, clinical pathology company Sonic Healthcare was part of a $97 million funding round of AI company Harrison.ai, and announced a joint venture “to co-develop and commercialize new clinical AI solutions in pathology.”
Grand View Research reports that due to the increased demand for advanced diagnostics as a result of the climbing prevalence of chronic diseases, the global digital pathology market size is expected to reach $1.74 billion by 2030. The market size was at $311.8 million as of 2021.
Democratization of AI
Industry analysts predict continued growth in the worldwide low-code and no-code development technologies markets, citing democratization as one of the major drivers. A 2021 Gartner study forecasts the worldwide market will total $29 billion by 2025.
General AI democratization coupled with significant research and development in healthcare AI will inevitably lead to wider industry adoption. AI tools will become more accessible to medical professionals that are not highly specialized data practitioners, creating more synergy throughout diagnosis, care and data analysis by reducing the need for intervention by software professionals.
AI as a Service (AIaaS) companies have already begun partnered work with healthcare institutions to offer build-your-own AI models and low-code algorithms that better suit industry-specific needs for analysis and reporting.
Use of Patient Data
AI unlocks wide-reaching potential for better care and treatment by allowing medical professionals to efficiently analyze and compare massive amounts of data. As more healthcare institutions digitize their new and archived patient records, there is an ever-growing amount of medical data that can be used to identify patterns and similarities between patients.
Healthcare providers can employ AI to identify patterns of data that can signal a change in patient status or risk of developing certain diseases. There are already cases that prove AI has led to more accurate and faster diagnosis in cases of COVID, tuberculosis, Alzheimer’s and more, often weeks prior to patients experiencing traditionally-expected symptoms.
AI algorithms can also assist in drug development by allowing researchers to identify the most ideal patients to participate in the research and trial processes, significantly reducing time, cost and error in development.
Our previous blog posts discuss how leveraging data can fuel pharma innovation and reduce health disparities and improve health equality.
While the industry is still on the cusp of widespread AI adoption, innovation in healthcare is already profound. To learn more about how TripleBlind can help further data collaboration and analysis for healthcare enterprises, please contact us today.
Pharmaceutical companies and other organizations that rely on clinical trials are increasingly pushing for greater informational transparency and sharing of patient data.
The rise of sophisticated analytics has allowed for more insights than ever to be extracted from clinical data. In addition to aiding research and development, clinical data can be used to benefit patients, years or even decades after they have participated in a trial.
Unfortunately, pharmaceutical companies and research institutions do not have unfettered access to clinical trial data, especially down to the level of the individual patient. Trial participants understandably want to retain their privacy and many regulations prohibit the improper disclosure or use of personal information, such as patient ages or specific medical conditions.
Where Access to Clinical Trial Data Provides Value
Datasets from most clinical trials contain detailed information on individual participants. Access to patient-level data not only allows for more granular analyses but is also valuable for quality checks. If unexpected side effects suddenly become associated with a certain medication, going back through the trial data and performing analytics at the patient level can reveal insights into the development.
Access to patient-level trial data also helps to optimize the value of that information through secondary analysis. For example, both Pfizer and Moderna have massive amounts of clinical trial data related to the development of their individual COVID-19 vaccines. A secondary analysis involving trial data from both companies could theoretically provide many new insights about the novel coronavirus.
International Data Collaboration
Access to more data can also facilitate pharmaceutical research on an international scale. The United States, Europe, China, India, and other jurisdictions all have agencies that oversee the approval of new medications based on established guidelines. Each agency sets its own conditions for the trial data it will accept; the FDA, for instance, accepts data from trials run outside the U.S. only when those trials meet its requirements. If regulations can be effectively navigated, such as through clinical data anonymization techniques, it opens up broader possibilities for both research and drug approval.
A New Perspective on Existing Data
Access can also facilitate subsequent analyses of a clinical dataset with objectives that differ from those of the original analysis. Follow-up analyses can help researchers gain a deeper understanding of the original trial and possibly unlock additional insights.
Integrating Trial Data with Real-World Data
Access to clinical trial data can also facilitate more representative datasets. Clinical trial populations tend to be highly specific cohorts that do not always reflect the broader population. With access, clinical trial data can be compared with subsequent real-world data.
Data Anonymization in Clinical Trials and Analytics
Access to clinical trial data must also be balanced with protections for the privacy of individual participants.
This can be done with data anonymization techniques that obfuscate or eliminate identifying aspects of patient records. Data anonymization in clinical trials should be calibrated such that the utility of data is maintained. Additionally, data anonymization techniques should maintain the integrity of the original dataset. If the integrity of pharmaceutical trial data is corrupted by poorly calibrated anonymization, it could pose a significant public health threat.
In the U.S., the Health Insurance Portability and Accountability Act (HIPAA) outlines two approaches for anonymization. The Expert Determination approach involves a clinical trials data anonymization analyst applying statistical techniques to make identifying individuals extremely difficult or impossible.
The Safe Harbor approach involves the removal of 18 specific types of identifying information from individual records, including name, Social Security number, telephone number, IP addresses, and license plate numbers. Many of these identifiers are not typically collected in the course of a pharmaceutical trial.
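As a minimal sketch of the Safe Harbor idea, the Python snippet below drops identifier fields from a patient record before it is shared. It covers only the five identifier types named above, not the full list of 18, and the field names are hypothetical. A real de-identification pipeline would need to address every Safe Harbor category and be reviewed by a compliance expert.

```python
# Partial, illustrative list: only the identifier types named in the text.
# The full Safe Harbor list has 18 categories; these field names are hypothetical.
IDENTIFIER_FIELDS = {
    "name",
    "social_security_number",
    "telephone_number",
    "ip_address",
    "license_plate_number",
}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of `record` without the listed identifier fields."""
    return {key: value for key, value in record.items() if key not in IDENTIFIER_FIELDS}


patient = {
    "name": "Jane Doe",
    "telephone_number": "555-0100",
    "ip_address": "203.0.113.7",
    "treatment_arm": "placebo",
    "adverse_event": False,
}
shared = strip_identifiers(patient)
print(shared)
```

The original record is left untouched, so the trial sponsor keeps its full dataset while collaborators receive only the clinical fields.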
Accessing More Clinical Trial Data with Blind Query from TripleBlind
To address the challenges associated with sharing clinical data, TripleBlind has developed a unique set of data tools called Blind Query.
This innovative suite of data tools allows users to perform remote data queries while keeping privacy intact. Users can search datasets, join data sets, perform analyses, and create reports — all without needing to obtain direct access to sensitive data. With Blind Query tools, data operations are always performed remotely and can even be in multiple geographical or organizational silos.
The Blind Query suite of data tools can perform three main functions:
- Blind Join. Users can apply SQL-like methods to private tabular datasets to identify specific values, then extract those values to join with their own dataset. Data providers control access to specific data columns, and non-matched data is never revealed by Blind Join operations. Blind Join can perform operations on millions of records and identify non-exact (fuzzy) matches.
- Blind String Search. Users can conduct standard searches of text data without gaining access to non-matched text. Data providers are protected, and users can extract only the essential information they need.
- Blind Stats. Users can generate a report of descriptive statistics on private datasets, which is an essential function for understanding the demographics of clinical trial populations. Blind Stats also enables multi-party data collaborations by allowing participants to understand the qualities of a dataset, without compromising privacy.
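To make the access rules described above more concrete, here is a toy Python sketch. It is not TripleBlind's API and involves no encryption or remote execution: everything runs in one process, and the function names are hypothetical. It only illustrates the visibility rules: a join returns matched rows and permitted columns only, and a statistics report returns aggregates rather than individual records.

```python
from statistics import mean


def restricted_join(own_rows, provider_rows, key, allowed_columns):
    """Join on `key`, exposing only matched provider rows and allowed columns."""
    provider_index = {row[key]: row for row in provider_rows}
    joined = []
    for row in own_rows:
        match = provider_index.get(row[key])
        if match is None:
            continue  # non-matched provider data is never returned
        visible = {col: match[col] for col in allowed_columns if col in match}
        joined.append({**row, **visible})
    return joined


def summary_stats(provider_rows, column):
    """Return aggregate statistics for one numeric column, never the rows."""
    values = [row[column] for row in provider_rows]
    return {"count": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


own = [{"patient_id": 1, "site": "A"}, {"patient_id": 2, "site": "B"}]
provider = [
    {"patient_id": 2, "outcome": "improved", "age": 41},
    {"patient_id": 3, "outcome": "no change", "age": 58},
]

# The provider allows only the "outcome" column to be joined.
joined = restricted_join(own, provider, "patient_id", ["outcome"])
stats = summary_stats(provider, "age")
print(joined)
print(stats)
```

In this sketch, patient 3 never appears in the join result and the `age` column is exposed only in aggregate, mirroring the column-level control and non-disclosure of unmatched data described in the list above.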
TripleBlind offers a wide range of privacy-preserving data tools, including the Blind Query suite. If you would like to learn more about how TripleBlind can facilitate data access and collaborations, please contact us today.
There’s been an explosion of interest in Big Data for digital health platforms in recent years, particularly as so much more data gets digitized.
This transition means more information is available to data scientists to find novel insights, drive clinical decision making, and to support or improve how we deliver healthcare.
However, big data bias is woven into this process from start to finish.
Suraj Kapa, SVP of healthcare and government at TripleBlind, recently hosted a webinar with a team of expert panelists, with the goal of providing an understanding of bias in big healthcare data. This includes learning what bias in data can look like, understanding the related issues (especially with regard to digital health platforms), and considering the scalable deployment of digital health tools to society in general.
This last part is becoming an increasingly important issue, especially as the resources and technology in the industry have started to mature enough to allow for this kind of deployment on a larger scale.
Keep reading to learn a few of the highlights, or watch the webinar for the full discussion.
Are you overestimating your ability to detect big data biases?
Most people, data scientists included, have limited knowledge of just how easily datasets can become biased, or where that bias can come from.
When considering this, some key questions need to be addressed: where’s the data coming from? How is it being used? How is it being integrated? All of this factors into the potential imposition of bias into these processes, which has the potential to actually widen the disparities of care that might exist.
Big data bias can hit every part of the data scientist’s world. It starts with data gathering bias, which is easy enough to think about: is everyone fully represented in the training dataset, or in the validation dataset used to get the algorithm approved (for example, by the FDA or another regulatory authority)?
This might be solved by having diverse enough datasets, but there are also data analysis biases and data application biases, both of which are critical to address.
Here are a few considerations we need take into account:
The best analysis is limited to a fortunate few players
Data has helped concentrate power within the digital economy. Take for instance the number of institutions (especially in healthcare) that have mature data platforms — where the data’s truly integrated, and silos have been broken down within the institution. These represent a small few of a much larger ecosystem where the data actually exists. This has the potential to create a big focus on those who have reached the level they need to with their data platforms, where they can enable their data to translate into the digital economy.
Spurious correlations are becoming a bigger problem
The data economy is changing our approach to accountability, from one based on direct causation to one based on correlation. As scientists, we don’t just want correlation: we want causation so we can do further studies and further analyses; to tweak or remove variables and see if it shifts results one way or the other.
When drawing a conclusion, we want to understand that this is a causal factor as opposed to a correlative factor.
But when we assume causation from correlation, all we have is this massive dataset with limited explainability. So if there’s bias in how we obtain the result, that’s going to magnify into the algorithm, and this can potentially lead to worse care outcomes, especially in specific populations where there’s not as much data, or they’re not as well represented.
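To make the correlation-versus-causation point concrete, here is a minimal simulation sketch. It is not from the webinar; the variable names and the data are made up for illustration. Two measurements, x and y, never influence each other, but both depend on a hidden factor z. They still look correlated until we adjust for z.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(values, confounder):
    """Remove the part of `values` that a straight-line fit on `confounder` explains."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(confounder) / n
    slope = sum(
        (c - mean_c) * (v - mean_v) for c, v in zip(confounder, values)
    ) / sum((c - mean_c) ** 2 for c in confounder)
    return [v - mean_v - slope * (c - mean_c) for v, c in zip(values, confounder)]


def simulate(n=5000, seed=0):
    """Return (raw correlation, correlation after adjusting for the hidden factor)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # hidden common cause (hypothetical)
    x = [zi + rng.gauss(0, 1) for zi in z]    # depends on z only
    y = [zi + rng.gauss(0, 1) for zi in z]    # depends on z only, not on x
    raw = pearson(x, y)
    adjusted = pearson(residuals(x, z), residuals(y, z))
    return raw, adjusted


raw, adjusted = simulate()
print(f"raw correlation: {raw:.2f}, after adjusting for z: {adjusted:.2f}")
```

In this toy setup the raw correlation is clearly positive, and it shrinks toward zero once z is accounted for. In real healthcare data the hidden factor is often not recorded at all, which is part of what makes this problem so hard to spot.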
It’s important to remember that we sometimes assume if we just have enough data, we’ll be free of bias. But the reality is, data systems often mirror the existing social world.
Take, for example, gender bias in one of Amazon’s algorithms.
Amazon had an AI algorithm that looked at resumes, because they wanted to quickly identify a good resume versus a bad resume. But the algorithm tended to favor masculine-coded words in a resume and downgraded resumes containing terms associated with women. This led Amazon to shut down the algorithm in 2017.
This is a prime example of the type of problematic correlation that can affect our algorithms if we’re not careful. Remember, humans determine which data is captured. We create algorithms and assemble datasets, and since humans are biased, this bias is naturally built into the process.
These problems are subtle, and potentially very hard to measure or account for when you’re trying to avoid bias. Beyond gender, there are other clear factors such as demographic bias, ethnicity bias, and socioeconomic bias.
Imagine what this might look like in healthcare. If you go to Iceland and you build a novel genetic disease algorithm, you could probably apply this to patients in Nigeria or India or China, but first you need to figure out whether you’ll need an additional step of validation to ensure the algorithm is truly applicable. Otherwise, you risk creating worse health outcomes in patients, because you’re over- or under-representing disease risk.
The application of biased algorithms
We need to consider the application of these algorithms once they’re approved, and once they’re delivered through digital platforms.
These sources of bias exist throughout the data life cycle, even from the funding that led to the creation of the data assets that you’re using.
Say you created algorithms based on every clinical trial participant that’s ever been in a clinical trial globally, because of course that data’s very robust, right? It’s not perfectly clean, but more clean than real-world data. It seems reasonably representative of the population in general, and you can get enough patients to have ample data.
The problem is, there are very specific types of people who tend to engage in clinical trials, and they’re not reflective of the larger portion of the population.
So how can we solve for this? This is what we get into with the panel.
Solving for data bias issues: panelist discussion
In the webinar, we brought on expert panelists to discuss these issues. (We’ve paraphrased and truncated these quotes for clarity and brevity.) Watch the webinar for the complete explanations from each panelist.
Insights from Aashima Gupta (Director of Global Healthcare Solutions at Google Cloud)
On the cloud industry’s role in understanding how to scale appropriately to mitigate the bias issues.
There are three key things to emphasize. The first is having a common set of principles, making sure products and partnerships go through that AI review consistently, in a repeatable fashion.
Across Google, we work with AI using a common set of principles. There are eight of them and they govern all of our work, including in healthcare, but they are common across all domains. Much of our product development work goes through that repeatable process of applying the AI principles in the context of product, and the product in the context of the customer work.
The second is around explainability, and sharing that as a community resource.
Consider the nutritional content in our food or medication. Regardless of the use case, we rely on information to make responsible decisions. But what about AI? Despite its ability and potential to transform so much of the way we work and live, machine-learning models are often distributed without a clear understanding of how they function.
So what we have built in that context is a framework called a model card or data card, and we share a common vision with the industry. Model cards are not a Google product. It is a framework that we’ve shared with the industry to define the explainability of the model.
Say for example we have a data model that performs consistently across a diverse range of people. The framework we have built has come from years of research within Google, and it helps define the conditions in which the model breaks, showing what data element has gone in to affect it. It ties back to the nutrition label: what is the protein, the sugar, the carbs? Same thing here. What is the diversity of the data?
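As a rough illustration of the "nutrition label" idea, here is a minimal sketch of what a model card could record. The field names, the example model, and the numbers are all hypothetical. This is not Google's schema or any published toolkit's API, just one way to keep the intended use, the training data description, and the per-group performance next to the model.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """A hypothetical, simplified model card (illustrative fields only)."""
    name: str
    intended_use: str
    training_data: str
    subgroup_accuracy: dict = field(default_factory=dict)   # group -> accuracy
    known_limitations: list = field(default_factory=list)

    def weakest_subgroup(self):
        """Return the group with the lowest reported accuracy, or None."""
        if not self.subgroup_accuracy:
            return None
        return min(self.subgroup_accuracy, key=self.subgroup_accuracy.get)

    def render(self):
        """Render the card as plain text, like a label on the model."""
        lines = [
            f"Model: {self.name}",
            f"Intended use: {self.intended_use}",
            f"Training data: {self.training_data}",
            "Accuracy by subgroup:",
        ]
        for group, accuracy in sorted(self.subgroup_accuracy.items()):
            lines.append(f"  {group}: {accuracy:.2f}")
        lines.append("Known limitations:")
        for limitation in self.known_limitations:
            lines.append(f"  - {limitation}")
        return "\n".join(lines)


# Made-up example values, for illustration only.
card = ModelCard(
    name="example-risk-model",
    intended_use="Decision support for clinicians, not autonomous diagnosis",
    training_data="Records from a single hospital system (hypothetical)",
    subgroup_accuracy={"group_a": 0.91, "group_b": 0.78},
    known_limitations=["Not validated outside the original population"],
)
print(card.render())
print("Weakest subgroup:", card.weakest_subgroup())
```

The useful part is less the format than the habit: if the per-group numbers are written down, the gap between groups is visible to whoever deploys the model.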
And the third emphasis is about machine learning operations.
When building the model, you need to think about AI operationalization. There’s a significant bottleneck in the effective operation of machine-learning platform building. Only one in two organizations have moved beyond pilots and proof-of-concept at scale. So how do you make an adjustment when a new set of data comes in? How does that change your machine learning data pipeline?
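One simple way to notice that a new batch of data looks different from the training data is the population stability index (PSI), a common drift measure. The sketch below is a minimal, assumption-laden version for a single categorical field. The function names and the example data are ours, and the threshold is a widely quoted rule of thumb rather than a standard.

```python
import math
from collections import Counter


def proportions(values, categories, eps=1e-6):
    """Share of each category in `values`, floored at `eps` to avoid log(0)."""
    counts = Counter(values)
    total = len(values)
    return {c: max(counts.get(c, 0) / total, eps) for c in categories}


def population_stability_index(baseline, new_batch):
    """PSI between two samples of one categorical variable (0 means identical mix)."""
    categories = set(baseline) | set(new_batch)
    p = proportions(baseline, categories)
    q = proportions(new_batch, categories)
    return sum((q[c] - p[c]) * math.log(q[c] / p[c]) for c in categories)


# Hypothetical example: the site mix shifts between training and a new batch.
training_sites = ["urban"] * 50 + ["rural"] * 50
new_sites = ["urban"] * 90 + ["rural"] * 10

psi = population_stability_index(training_sites, new_sites)
# Rule of thumb (an assumption, not a standard): above about 0.25 deserves a closer look.
print(f"PSI = {psi:.2f}", "-> review before retraining" if psi > 0.25 else "-> stable")
```

A check like this can sit at the front of the data pipeline, so a shift in who is represented gets flagged before it quietly changes the model.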
This problem is much bigger than Google alone, so this is where the partnership with academia, with the customers, and with the ecosystem at large will be important.
Insights from Brian Anderson (Chief Digital Health Physician for MITRE)
On the cross section between industry, academia, and government to establish what should be the standards.
A lot of my work centers around guardrails. We recently started a coalition called Coalition for Health AI, or CHAI, and it brings together key stakeholders (academia, industry, and public sector government) around this common mission to set standards or guardrails, and promote the kind of trustworthiness and transparency into models and their applicability.
This is to further develop algorithms and models that are useful for all of us, and that are trained on data that is inclusive of all of us.
Part of building any kind of standards involves a coalition of willing implementers with critical mass. And so, in CHAI, what we are attempting to do is pull together stakeholders on the industry side (like Google and Microsoft), and take those bodies of work that organizations are developing and publishing across the industry.
From there, we then look to develop a kind of agreeable framework that we can all as a coalition say, “yes, this is what we are going to move forward with to address data bias, or to address human application bias.” Or “this is how we are going to approach testability or promote transparency in algorithm development.”
And then to have a real technical framework that is implementable in industry.
And it’s that critical part of actually implementing these technical frameworks that then provides that iterative feedback into a standards process, and builds the kind of coalition and adoption curve that you like to see.
These things tend to start with academia and private sector industry coming together. And the government sees its promise, and gets behind it. From there, they may be able to offer some input and some advice on some of the equities and concerns from a government standpoint.
Insights from Daniel Kraft (Faculty Chair for Medicine at Singularity University)
The VC industry leader perspective on digital health, and a framework for how people are looking to address bias issues.
Compared to late-stage organizations where the focus is more on guardrails and bias mitigation, the earlier stage private sector organizations have more of a “build fast, grow fast” mentality.
Often these are well-meaning guardrails from the past that haven’t operated so well. For example, HIPAA is a bit antiquated: it’s still an analog-era regulation in a digitally connected age. We’ve seen patients die because we’re waiting for the HIPAA sign-off to get their EKG sent over from another hospital. So I think there’s probably some middle ground.
We still need to enable startups, within good context and good faith, to collect, leverage, and learn from data, but they also have the responsibility to do that in ways where they can optimally share it. There may also be a role in educating startups, academics, and others, even clinicians today, about where the biases may occur.
I think none of us want more data as clinicians. We want the actionable insights and how those get presented again could start to inform us about the opportunity to maybe, with the patient in front of us, have them opt in, in appropriate ways, to share data if they’re a member of, let’s say, an unrepresented set, all the way to, again, where there might be bias, and give you that sort of little “check engine” light.
You’re in the path of managing a patient like you would the average European when they have a lot of other elements from their socioeconomic to their genetic determinants. So I think that the design piece is key, as well as how to engage people in sharing and opting in.
We have these long legal forms about sharing data, and there are all these new ways to manage it with blockchain and beyond. How do you explain that in smart ways to the folks you are asking to share and opt in to contribute some of their data and knowledge base?
I think perfection is also the enemy of the good here, and we need to sort of be building stacks and new approaches, and allowing folks to become data donors who often are unrepresented, so we can have less biased starting points.
Final thoughts about data bias
When we’re trying to abide by privacy and varying regulatory standards, a lot of times we’re forced (for a good reason) to extract extensive degrees of context from the data, and so the algorithms come from data that has removed context. Ideally, we’d like to maintain that context in those data sources.
We need to ensure secure, privacy-preserving, yet scalable approaches so we can collaborate on data broadly, to ultimately mitigate bias that emanates from limited data diversity.
While some elements of data diversity are measurable (like what proportion of your population is African American versus Hispanic American versus Indian or other populations), there are some immeasurable aspects that you might need larger datasets to ultimately account for. In other words, we don’t know what we don’t know.
We need to have a means to verify the level of reliability. This means improving methods to understand how well an individual is truly represented within a dataset. How do we create systems to understand the representation of a given new individual coming in, and to whom this algorithm is now being applied?
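For the measurable part, a first-pass representation check can be very simple. The sketch below compares each group's share of a dataset against a reference share (for example, census figures) and flags groups that fall well short. The function name, the 0.8 cutoff, and the sample data are illustrative assumptions, not an established standard.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares, min_ratio=0.8):
    """Compare each group's share of the sample with its reference share.

    A group is flagged when its observed share is below `min_ratio` times
    the share expected from the reference population.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    report = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        ratio = observed / expected if expected > 0 else float("inf")
        report[group] = {
            "observed": observed,
            "expected": expected,
            "underrepresented": ratio < min_ratio,
        }
    return report


# Hypothetical dataset and reference population, for illustration only.
dataset_groups = ["group_a"] * 90 + ["group_b"] * 10
reference = {"group_a": 0.6, "group_b": 0.4}

for group, row in representation_gaps(dataset_groups, reference).items():
    flag = "UNDERREPRESENTED" if row["underrepresented"] else "ok"
    print(f"{group}: observed {row['observed']:.0%}, expected {row['expected']:.0%} -> {flag}")
```

This only covers attributes someone thought to record. It says nothing about the immeasurable aspects mentioned above, which is why broader and more diverse data collaboration still matters.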
We can’t sit here and think that industry by itself, academia by itself, or government by itself are going to be able to — as a silo — figure out how to solve all of this. The creation of appropriate guardrails will likely require cross-sector consortiums.
Why organizations are waking up to the benefits of PEC
From healthcare to finance, businesses today are collecting and storing more sensitive customer data than ever before.
At the same time, consumers are much more aware of the risks of sharing their personal information with companies. As a result, businesses must find ways to protect their customers’ privacy while still being able to collect and use data for business purposes.
One way to address this challenge is through Privacy-Enhancing Computation (PEC). PEC, also referred to as Privacy Preserving Technology or Privacy-Enhancing Technology (PET), is a set of techniques that can be used to protect data while still allowing it to be shared and used for analysis.
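To give a feel for how data can be analyzed without being revealed, here is a toy sketch of additive secret sharing, one of the building blocks of secure multi-party computation. It is a generic textbook-style illustration under simplifying assumptions (honest participants, non-negative whole numbers, everything simulated in one process), not a description of any vendor's product.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this number (illustrative choice)


def split_into_shares(value, n_parties):
    """Split `value` into n random-looking shares that add up to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last_share = (value - sum(shares)) % MODULUS
    return shares + [last_share]


def secure_sum(private_values):
    """Simulate parties computing the total without any one party seeing another's value."""
    n = len(private_values)
    # Each party splits its own value and would send share j to party j.
    all_shares = [split_into_shares(v, n) for v in private_values]
    # Each party adds up the shares it received; one share alone reveals nothing useful.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are combined, which yields the overall total.
    return sum(partial_sums) % MODULUS


# Hypothetical example: three institutions each hold a private count.
print(secure_sum([120, 45, 300]))
```

Each institution only ever hands out random-looking numbers, yet the combined result is the true total. Production systems add much more (networking, protection against dishonest participants, richer computations), but the core idea is the same.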
But PEC can seem too complex a topic for non-technical business leaders, and you may not know whether it’s for you. So why should businesses actually bother to learn about Privacy-Enhancing Computation techniques?
How Privacy-Enhancing Computation can help your business
Privacy laws, regulations, and policies are an important part of protecting consumers and patients. That said, privacy rules constrain businesses from leveraging their data as effectively as they could.
Most privacy-related issues can be divided into two categories: internal and external.
PEC solves internal privacy issues (within organizations)
If you work in a multinational company, or have some kind of cross-border operations, you may be all too familiar with this already: when sensitive data crosses borders, business gets more complicated.
Every country and state has its own set of laws, which creates a jungle of rules and regulations to navigate. Working in the US will require you to jump through different hoops than in the EU, and the rules change again when you work with European countries like Ukraine, who aren’t EU members (yet). The more locations you have, the more painful this can be.
Navigating US medical privacy laws is cumbersome enough, but now you have to consider different sets of rules for every new location? (Cue collective groan).
Similarly, financial workers have to deal with their own labyrinth of privacy constraints. Since China doesn’t let private data leave the country, and your New York headquarters needs to run payroll, how are they supposed to know who is getting paid? Are you just supposed to fly someone to China with a laptop every time you process payroll? That seems wasteful at best.
Or what about trying to stay on target for Diversity, Equity & Inclusion (DEI) goals? It becomes a much bigger hassle to assess your progress when you’re dealing with international rules.
PEC provides great technologies for dealing with all kinds of data residency issues like these.
PEC solves external privacy issues (between organizations)
Let’s say you want to pool data between your company and others. How do you do it safely and legally?
In financial services for instance, fraudsters may learn a new trick and hit as many financial institutions as possible before anyone is the wiser. Their job is easier because these institutions don’t share how they’ve been tricked, and have so many limitations on their ability to work with other institutions — no company wants to risk getting hit with an antitrust suit, or to have their competitors exploit valuable data.
The healthcare industry has its own set of data collaboration issues. Take precision medicine for instance. Big medical centers like Mayo Clinic (one of our investors) have oceans of data, much of which is hard to work with.
Imaging data like an EKG, EEG, or MRI might be straightforward to analyze individually, but what if you want to analyze a million of them, and use AI analytics tools to find important correlations — without all the incredibly biased data sets?
PEC is your answer. Legal agreements between medical institutions might seem like the only option, but these are slow and riddled with obstacles. Meanwhile, PEC is making it incredibly easy to collaborate with complex medical data, and to expand data sets without risking privacy.
“This is all great, but PEC sounds too expensive for us”
Surprisingly, Privacy-Enhancing Computation is a big cost-saver for businesses, especially when compared to more traditional alternatives to data collaboration (internally between countries or externally between organizations).
In particular, TripleBlind solves these issues inexpensively. You can ask us all about this when you book a demo.
“I’ve heard Privacy-Enhancing Computation solutions are slow and hard to scale”
PEC is still an emerging field, and it’s evolving rapidly. In the earlier days of these technologies, a PEC option called Homomorphic Encryption was a big focus, but it has a reputation for sluggish speeds and a high demand on computing power.
But PEC has evolved well past the stage of “clunky software experiment” and has grown into a full-fledged, commercially viable data collaboration option. In recent years, plenty of fast and scalable alternatives have emerged, allowing organizations to leverage their data without all the extra demands on resources.
The White House Office of Science and Technology Policy even recently endorsed the field, and is seeking public input to help increase adoption.
“We don’t have the tech talent to manage a PEC initiative”
One of the wonderful trends in technology today is how broadly accessible it’s becoming, and PEC is no exception. You don’t need to hire any expensive, high-level tech people to deal with your PEC solutions. It’s straightforward enough that everyday coders can work with it.
You can read about other common misconceptions around privacy-enhancing technologies on our blog.
Privacy-Enhancing Computation can help businesses meet their legal obligations to protect customer data, but PEC also helps businesses get more value from their data by allowing them to share and analyze it without revealing sensitive information about individual customers.
Despite these advantages, PEC is currently in the early and early-mid adopter stage of market development. This is often because business leaders are not familiar with PEC or its benefits, or have preconceptions about whether PEC is a realistic solution for them. For instance, many people are surprised to learn that PEC allows them to work with complex data like images, rather than just numbers or text.
Not only is PEC cost effective and faster than current alternatives, it allows organizations to confidently collaborate with data, knowing their usage will comply with regulations.
Alex Koszycki, Product Manager, TripleBlind
Chad Lagomarsino, Partnership Engineer, TripleBlind
David Almeida, Senior Customer Success Manager, TripleBlind
Privacy-enhancing technologies (PET) enable enterprises to collaborate using real, sensitive data for analysis, and can unlock innovation in all industries.
On July 27, TripleBlind data practitioners will lead a live demonstration, and discuss how PET can solve some of the most common issues in data access, data prep and data bias. The TripleBlind team will take questions and suggestions from participants live during the webinar.
Key Features of PET that will be covered include:
- Permanent & Irreversible One-Way Encryption
- Robust Digital Rights Management System
- Collaborative Peer-to-Peer Environment for Data Providers & Data Users
Common obstacles for data practitioners, in addition to increased regulations on data, have prevented collaboration between enterprises. Learn how privacy-enhancing technologies (PET) are enabling broader, more diverse data engagement when it comes to data prep, and enabling secure cross-institutional data exchange.
Wednesday, July 27, 2022, 11 a.m. CT
Virtual, via Zoom
Participants can register here.
Combining Data and Algorithms while Preserving Privacy and Ensuring Compliance
TripleBlind has created the most complete and scalable solution for privacy enhancing computation.
The TripleBlind solution is software-only and delivered via a simple API. It solves for a broad range of use cases, with current focus on healthcare and financial services. The company is backed by Accenture, General Catalyst and The Mayo Clinic.
TripleBlind’s innovations build on well understood principles, such as federated learning and multi-party compute. Our innovations radically improve the practical use of privacy preserving technology, by adding true scalability and faster processing, with support for all data and algorithm types. TripleBlind natively supports major cloud platforms, including availability for download and purchase via cloud marketplaces. TripleBlind unlocks the intellectual property value of data, while preserving privacy and ensuring compliance with HIPAA and GDPR.
TripleBlind compares favorably with other privacy preserving technologies, such as homomorphic encryption, synthetic data, and tokenization and has documented use cases for more than two dozen mission critical business problems.
For an overview, a live demo, or a one-hour hands-on workshop, contact email@example.com.