Privacy researchers in Europe believe they have the first proof that a long-theorised vulnerability in systems designed to protect privacy by aggregating and adding noise to data to mask individual identities is no longer just a theory.
The research has implications for the immediate field of differential privacy and beyond — raising wide-ranging questions about how privacy is regulated if anonymization only works until a determined attacker figures out how to reverse the method that’s being used to dynamically fuzz the data.
Current EU law doesn’t recognise anonymous data as personal data. Although it does treat pseudoanonymized data as personal data because of the risk of re-identification.
Yet a growing body of research suggests the risk of de-anonymization on high dimension data sets is persistent. Even — per this latest research — when a database system has been very carefully designed with privacy protection in mind.
It suggests the entire business of protecting privacy needs to get a whole lot more dynamic to respond to the risk of perpetually evolving attacks.
Academics from Imperial College London and Université Catholique de Louvain are behind the new research.
This week, at the 28th USENIX Security Symposium, they presented a paper detailing a new class of noise-exploitation attacks on a query-based database that uses aggregation and noise injection to dynamically mask personal data.
The product they were looking at is a database querying framework, called Diffix — jointly developed by a German startup called Aircloak andtheMax Planck Institute for Software Systems.
On its website Aircloak bills the technology as “the first GDPR-grade anonymization” — aka Europe’s General Data Protection Regulation, which began being applied last year, raising the bar for privacy compliance by introducing a data protection regime that includes fines that can scale up to 4% of a data processor’s global annual turnover.
What Aircloak is essentially offering is to manage GDPR risk by providing anonymity as a commercial service — allowing queries to be run on a data-set that let analysts gain valuable insights without accessing the data itself.The promise being it’s privacy (and GDPR) ‘safe’ because it’s designed to mask individual identities by returning anonymized results.
The problem is personal data that’s re-identifiable isn’t anonymous data. And the researchers were able to craft attacks that undo Diffix’s dynamic anonymity — although Aircloak is confident it has already prevented this attack.
“What we did here is we studied the system and we showed that actually there is a vulnerability that exists in their system that allows us to use their system and to send carefully created queries that allow us to extract — to exfiltrate — information from the data-set that the system is supposed to protect,” explains Imperial College’s Yves-Alexandre de Montjoye, one of five co-authors of the research paper.
“Differential privacy really shows that every time you answer one of my questions you’re giving me information and at some point — to the extreme — if you keep answering every single one of my questions I will ask you so many questions that at some point I will have figured out every single thing that exists in the database because every time you give me a bit more information,” he says of the pre