
Ted is writing things

submitted by
Style Pass
2025-01-14 10:30:05

In November, I participated in a technologist roundtable about privacy and AI, for an audience of policy folks and regulators. The discussion was great! It also led me to realize that there are a lot of things that privacy experts know and agree on about AI… but that might not be common knowledge outside our bubble.

When you train a model with some input data, the model will retain a high-fidelity copy of some data points. If you "open up" the model and analyze it in the right way, you can reconstruct some of its input data nearly exactly. This phenomenon is called memorization.

Memorization happens by default, to all but the most basic AI models. It's often hard to quantify: you can't say in advance which data points will be memorized, or how many. Even after the fact, it can be hard to measure precisely. Memorization is also hard to avoid: most naive attempts at preventing it fail miserably — more on this later.

Memorization can be lossy, especially with images, which aren't memorized pixel-for-pixel. But if your training data contains things like phone numbers, email addresses, or recognizable faces, some of it will inevitably be stored by your AI model. This has obvious consequences for privacy.
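To make this concrete, here is a deliberately simple toy, not a realistic AI model: a character-level n-gram "model" that only stores which character tends to follow each 8-character context. The names, the sentences, and the helper functions `train_char_ngram` and `greedy_complete` are all hypothetical, made up for illustration (555-01xx numbers are reserved for fictional use). Even this tiny model stores a phone number verbatim, and prompting it with the right prefix gives the number back. Real neural networks store data far less transparently, but training-data extraction attacks on language models work in a loosely similar spirit: feed the model a prefix and let it complete the rest.

```python
from collections import Counter, defaultdict

START, END = "\x02", "\x03"  # padding / end-of-text markers (arbitrary choice)


def train_char_ngram(corpus: list[str], order: int = 8) -> dict[str, Counter]:
    """Toy 'model': count which character follows each `order`-character context."""
    model: dict[str, Counter] = defaultdict(Counter)
    for text in corpus:
        padded = START * order + text + END
        for i in range(order, len(padded)):
            model[padded[i - order:i]][padded[i]] += 1
    return model


def greedy_complete(model: dict[str, Counter], prompt: str,
                    order: int = 8, max_len: int = 40) -> str:
    """Extend `prompt` by repeatedly appending the most frequent next character."""
    out = START * order + prompt
    for _ in range(max_len):
        counts = model.get(out[-order:])  # .get avoids inserting into the defaultdict
        if not counts:
            break  # context never seen during training
        nxt, _ = counts.most_common(1)[0]
        if nxt == END:
            break
        out += nxt
    return out[order:]  # drop the padding, keep prompt + completion


# Hypothetical training data containing personal information.
CORPUS = [
    "Call Alice Example at 555-0142 for details.",
    "Bob Sample's number is 555-0199.",
    "The meeting is on Tuesday.",
]

model = train_char_ngram(CORPUS)
print(greedy_complete(model, "Call Alice Example at "))
# prints: Call Alice Example at 555-0142 for details.
```

Nothing in this code "decides" to keep the phone number; it is stored simply because it was in the training data, and the right query brings it back out. Larger models generalize much more than this lookup table, but that does not stop them from also retaining some training points nearly exactly.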
