A team of computer scientists devised a way to quickly remove traces of sensitive user information from machine learning models.
Rising consumer concern over data privacy has led to a rush of “right to be forgotten” laws around the world that allow individuals to request their personal data be expunged from the massive databases that catalog our increasingly online lives. Researchers in artificial intelligence have observed that user data does not exist only in its raw form in a database; it is also implicitly encoded in models trained on that data. So far, they have struggled to find efficient methods for deleting these “traces” of users. The more complex the model, the harder the deletion becomes.
“The exact deletion of data — the ideal — is hard to do in real time,” says James Zou, a professor of biomedical data science at Stanford University and an expert in artificial intelligence. “In training our machine learning models, bits and pieces of data can get embedded in the model in complicated ways. That makes it hard for us to guarantee a user has truly been forgotten without altering our models substantially.”
Zou is senior author of a paper recently presented at the International Conference on Artificial Intelligence and Statistics (AISTATS) that may offer an answer to the data deletion problem, one that works for privacy-conscious individuals and artificial intelligence practitioners alike. They call it approximate deletion.
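To give a flavor of the idea (this is a generic illustration, not the algorithm from Zou's paper): exact deletion means retraining the model from scratch on the remaining data, while the alternative is to cheaply *update* the already-trained model so it behaves as if the deleted point had never been seen. For ordinary least squares this update can even be done in closed form via the Sherman–Morrison identity; for complex models, only approximate versions of such updates are feasible. The model setup and variable names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))              # synthetic training features (assumption)
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Fit ordinary least squares: w = (X^T X)^{-1} X^T y
A = X.T @ X
b = X.T @ y
w = np.linalg.solve(A, b)

# "Delete" sample i without retraining: downdate A and b.
# Sherman-Morrison: (A - x x^T)^{-1} = A^{-1} + A^{-1} x x^T A^{-1} / (1 - x^T A^{-1} x)
# For linear models this update is exact; for deep models one must settle
# for approximate analogues -- the setting the paper addresses.
i = 7
xi, yi = X[i], y[i]
A_inv = np.linalg.inv(A)
A_inv_del = A_inv + (A_inv @ np.outer(xi, xi) @ A_inv) / (1.0 - xi @ A_inv @ xi)
w_deleted = A_inv_del @ (b - yi * xi)

# Sanity check: compare against full retraining on the remaining n-1 points.
X_rest = np.delete(X, i, axis=0)
y_rest = np.delete(y, i)
w_retrained = np.linalg.solve(X_rest.T @ X_rest, X_rest.T @ y_rest)
print(np.allclose(w_deleted, w_retrained))  # True: the update matches retraining
```

The key point is cost: the update touches a d×d matrix rather than all n training points, so deletion no longer scales with the size of the dataset.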