PenPen's Note: This article is written to be accessible to non-maths folk who have some computer know-how. It comes in the wake of the shit storm following Jeff Johnson’s recent “Apple Photos phones home on iOS 18 and macOS 15”. There’s a lot of confusion and curiosity about how this technology works, along with criticisms lobbed at Apple’s densely packed published research. The goal of this post is to distill that research into a more understandable package, so that you can make more informed decisions about your data. “Nowhere does Apple plainly say what is going on”, but maybe I can.
You are Apple. You want to make search work like magic in the Photos app, so the user can find all their “dog” pictures with ease. You devise a way to numerically represent the concepts of an image, so that you can find how closely images are related in meaning. Then, you create a database of known images and their numerical representations (“this number means car”), and find the closest matches. To preserve privacy, you put this database on the phone.
All of this, as cool as it might sound, is a solved problem. This “numerical representation” is called an embedding vector. A vector is a series of coordinates in a very high-dimensional space. One dimension might measure how “dog-like” a thing is. Another might measure how “wild-like” a thing is. Dog-like and wild-like? That’s a wolf. We can measure how close two vectors are in meaning using metrics like cosine similarity. We are quite good at turning text into vectors, and only slightly worse at doing the same for images.
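To make that concrete, here’s a tiny sketch in Python. The three “dimensions” and their numbers are made up for illustration; real embedding models produce vectors with hundreds of dimensions, and the axes don’t map onto tidy human labels like “dog-like”. But the idea is the same: turn everything into vectors, then find the closest match.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors:
    1.0 = pointing the same way (same meaning), 0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A toy "database" of known concepts and their (made-up) embeddings.
# Dimensions: [dog-like, wild-like, has-wheels]
database = {
    "dog":  np.array([0.9, 0.1, 0.0]),
    "wolf": np.array([0.8, 0.9, 0.0]),
    "car":  np.array([0.0, 0.0, 1.0]),
}

# Pretend this is the embedding of a photo the user just took.
photo_embedding = np.array([0.85, 0.8, 0.0])

# Find the closest match: highest cosine similarity wins.
best = max(database, key=lambda name: cosine_similarity(photo_embedding, database[name]))
print(best)  # "wolf" -- dog-like AND wild-like
```

That lookup, run against a database shipped on the phone, is essentially the “magic” search described above.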