Understanding privacy risk with k-anonymity and l-diversity

Imagine you’re a data analyst at a global company who’s been asked to provide employee statistics for a survey on remote working and distributed teams. You’ve extracted the relevant employee data, but sharing it as-is could violate privacy laws. How can you anonymize this data while ensuring it’s still useful? In this article, you’ll learn about k-anonymity and l-diversity—two valuable techniques in privacy engineering to help you reduce the privacy risk in datasets.
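
To make the walkthrough concrete, here is a small, hypothetical extract of the kind of data involved, using pandas. The column names and all values apart from the details mentioned later in the article (Yuki Tanaka, Japan, Engineering) are assumptions for illustration only, not the actual dataset.

```python
import pandas as pd

# Hypothetical employee extract -- illustrative values, not real data.
employees = pd.DataFrame([
    {"full_name": "Yuki Tanaka", "email": "yuki.tanaka@example.com",
     "country": "Japan", "department": "Engineering", "years_at_company": 7},
    {"full_name": "Ana Souza", "email": "ana.souza@example.com",
     "country": "Brazil", "department": "Sales", "years_at_company": 3},
    {"full_name": "Tom Becker", "email": "tom.becker@example.com",
     "country": "Germany", "department": "Engineering", "years_at_company": 5},
])
```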

Fortunately, the survey partner doesn’t need this level of specificity. Let’s start by removing all fields that directly identify individual employees, such as full_name and email.
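
A minimal sketch of that first step, assuming the illustrative DataFrame and column names above:

```python
# Drop the fields that directly identify individual employees.
direct_identifiers = ["full_name", "email"]
survey_data = employees.drop(columns=direct_identifiers)

print(survey_data.columns.tolist())
# ['country', 'department', 'years_at_company']  (illustrative columns)
```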

There, we’ve removed the names and emails, but is the new dataset truly anonymous? Assume that an attacker knows that Yuki Tanaka lives in Japan. With this dataset, they could infer that Yuki works in Engineering and roughly how long he’s been with the company. Even though the dataset no longer directly identifies anyone, some employees may still be identifiable from combinations of the remaining fields.
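
Here is what that inference looks like against the hypothetical extract, again assuming the column names used above: knowing only Yuki’s country can be enough to single out his row.

```python
# An attacker who knows only that Yuki Tanaka lives in Japan
# filters the "anonymous" dataset on that single fact.
candidates = survey_data[survey_data["country"] == "Japan"]
print(candidates)
print("Possible matches:", len(candidates))

# If exactly one row matches, the attacker has effectively re-identified
# Yuki and learned his department and tenure.
```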

It’s clear that our first attempt isn’t enough to reasonably protect the privacy of the employees. Next, we’ll see how we can further reduce the privacy risk with k-anonymity.
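
One way to make that shortfall measurable, before turning to k-anonymity, is to count how many employees share each combination of the remaining attributes; any group of size 1 corresponds to someone who can be singled out. This is a sketch against the illustrative columns assumed above.

```python
# Attributes that don't identify anyone on their own but can in combination.
quasi_identifiers = ["country", "department", "years_at_company"]

# How many employees share each combination of quasi-identifier values?
group_sizes = survey_data.groupby(quasi_identifiers).size()
print(group_sizes)

# The smallest group size is the dataset's k; k == 1 means at least
# one employee is uniquely identifiable from these attributes alone.
print("k =", group_sizes.min())
```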
