
Synthetic Data Does Not Reliably Protect Privacy, Researchers Claim

submitted by
Style Pass
2021-09-23 10:00:10

A new research collaboration between France and the UK casts doubt on growing industry confidence that synthetic data can resolve the privacy, quality and availability problems, among others, that threaten progress in the machine learning sector.

Among the key points the authors raise is that synthetic data modeled on real data retains so much of the original information that it offers no reliable protection against inference and membership attacks, which aim to deanonymize data and link it back to real people.
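To make the idea of a membership attack more concrete, here is a minimal Python sketch of one simple heuristic, often described as a distance-to-closest-record test: if some synthetic record lies suspiciously close to a target record, guess that the target was in the original data. This is an illustrative assumption, not the attack the researchers used; the functions `distance_to_closest_record` and `infer_membership`, the toy records and the `threshold` value are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Record = Tuple[float, ...]


def distance_to_closest_record(target: Record, synthetic: Sequence[Record]) -> float:
    """Euclidean distance from `target` to its nearest synthetic record."""
    return min(math.dist(target, s) for s in synthetic)


def infer_membership(target: Record, synthetic: Sequence[Record], threshold: float) -> bool:
    """Naive membership guess (hypothetical heuristic): report that `target`
    was probably in the original training data if any synthetic record lies
    within `threshold` of it. The threshold is an attacker-chosen assumption."""
    return distance_to_closest_record(target, synthetic) <= threshold


# Hypothetical toy synthetic records: (age / 100, normalised annual hospital bill).
synthetic_data = [(0.34, 0.20), (0.41, 0.25), (0.52, 0.30), (0.67, 0.95)]

print(infer_membership((0.67, 0.94), synthetic_data, threshold=0.05))  # True
print(infer_membership((0.45, 0.90), synthetic_data, threshold=0.05))  # False
```

In practice an attacker might calibrate the threshold, for example against records known not to be in the training data; the fixed value here is only for illustration.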

Furthermore, the individuals most exposed to such attacks, such as patients with critical medical conditions or high hospital bills (in the case of medical record anonymization), are the most likely to be re-identified, because the ‘outlier’ nature of their records makes them stand out.
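The outlier effect can be illustrated with a deliberately leaky toy generator that emits a slightly shifted copy of each training record; this generator is an assumption standing in for a real generative model, which would be far more complex. The sketch compares the distance from a target to its closest synthetic record when the target is left out of training versus included. For a typical record, similar neighbours keep that distance small either way, so the two cases are hard to tell apart; for an outlier, the distance collapses only when the outlier was in the training data. All names (`toy_generator`, `dcr`, `membership_signal`) and values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Record = Tuple[float, float]


def toy_generator(training: Sequence[Record], shift: float = 0.01) -> List[Record]:
    """Hypothetical, deliberately leaky stand-in for a generative model:
    every synthetic record is a slightly shifted copy of a training record."""
    return [(a + shift, b + shift) for a, b in training]


def dcr(target: Record, synthetic: Sequence[Record]) -> float:
    """Distance from `target` to its closest synthetic record."""
    return min(math.dist(target, s) for s in synthetic)


def membership_signal(target: Record, others: Sequence[Record]) -> float:
    """How much closer the nearest synthetic record gets when `target` is
    included in the training data, compared with when it is left out.
    A large value means the two cases are easy to tell apart."""
    without_target = toy_generator(others)
    with_target = toy_generator(list(others) + [target])
    return dcr(target, without_target) - dcr(target, with_target)


# Hypothetical toy data: typical patients cluster together, while one patient
# is an outlier (e.g. a rare condition with a very high normalised bill).
typical = [(0.40, 0.20), (0.42, 0.22), (0.44, 0.21), (0.41, 0.23), (0.43, 0.19)]
outlier = (0.90, 0.98)

typical_signal = membership_signal(typical[0], typical[1:] + [outlier])
outlier_signal = membership_signal(outlier, typical)

print(f"typical record signal: {typical_signal:.4f}")
print(f"outlier record signal: {outlier_signal:.4f}")
```

With these toy values the outlier's signal is more than ten times that of the typical record, which mirrors, in miniature, why unusual records are the easiest to single out.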

‘Given access to a synthetic dataset, a strategic adversary can infer, with high confidence, the presence of a target record in the original data.’
