Embeddings are underrated

Submitted by Style Pass, 2024-10-24 15:00:04

Machine learning (ML) has the potential to advance the state of the art in technical writing. No, I’m not talking about text generation models like Claude, Gemini, LLaMa, GPT, etc. The ML technology that might end up having the biggest impact on technical writing is embeddings.

Here’s an overview of how you use embeddings and how they work. It’s geared towards technical writers who are learning about embeddings for the first time.

Someone asks you to “make some embeddings”. What do you input? You input text. You don’t need to provide the same amount of text every time. E.g. sometimes your input is a single paragraph while at other times it’s a few sections, an entire document, or even multiple documents.

What do you get back? An array of numbers. Say you make embeddings for two inputs, one drastically smaller than the other, and both come back as an array of 3 numbers. Curiouser and curiouser. (When you work with real embeddings, the arrays will have hundreds or thousands of numbers, not 3. More on that later.)
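To make the fixed-size idea concrete, here is a minimal sketch in Python. The `toy_embedding` function is a hypothetical stand-in, not a real embedding model: it hashes words into 3 buckets and captures no meaning at all. It only illustrates that the length of the output does not depend on the length of the input.

```python
import hashlib

DIMENSIONS = 3  # toy size; real embedding models return hundreds or thousands


def toy_embedding(text: str, dimensions: int = DIMENSIONS) -> list[float]:
    """Hypothetical stand-in for a real embedding model.

    Hashes each word into one of `dimensions` buckets and returns the
    normalized bucket counts. It captures no meaning; it only shows that
    the output length is independent of the input length.
    """
    counts = [0.0] * dimensions
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        counts[digest[0] % dimensions] += 1.0
    total = sum(counts) or 1.0  # avoid dividing by zero on empty input
    return [c / total for c in counts]


short_input = "Hello, world!"
long_input = "Embeddings turn text into numbers. " * 200

print(len(toy_embedding(short_input)))  # 3
print(len(toy_embedding(long_input)))   # 3
```

With a real model you would swap `toy_embedding` for a call to whatever embedding service or library you use. The shape of the result is the same idea: one array per input, always the same length for a given model.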

Here’s the first key insight. Because we always get back an array of the same length no matter how big or small the input text, we now have a way to mathematically compare any two arbitrary pieces of text to each other.
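Here is a rough sketch of what "mathematically compare" can look like. It uses cosine similarity, which is one common way to compare two arrays of equal length; the text above doesn't commit to a specific measure, so treat that choice as an assumption. The three arrays are made-up numbers for illustration, not output from any real model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-number arrays, not output from any real model.
paragraph_vec = [0.2, 0.7, 0.1]
document_vec = [0.25, 0.65, 0.1]
unrelated_vec = [0.9, 0.05, 0.05]

# A higher score means the two arrays point in a more similar direction.
print(cosine_similarity(paragraph_vec, document_vec))
print(cosine_similarity(paragraph_vec, unrelated_vec))
```

The comparison only works because both arrays have the same length. That is exactly what the fixed-size output buys you: a single paragraph and an entire document can be scored against each other directly.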
