RAG: The Shaky Foundation of AI

RAG stands for “Retrieval Augmented Generation,” and the latter two words are what you do when you paste the transcript of your company’s all-hands into ChatGPT and ask whether your department was mentioned. General LLMs are only useful for general knowledge, so to do anything interesting you either need to train a model or do some sort of RAG. The ‘R’ in RAG is doing most of the work, though, or rather it requires the most work to get right. Retrieving the correct content, reliably, is hard. Remember how loved Google was twenty years ago, when they had search right and no one else did? Now every AI tool needs its own internal version of Good Google.
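
To make the “augmented generation” half concrete, here’s a minimal sketch using OpenAI’s Python client. The retrieval step, the hard part this post is about, is stubbed out as a plain list of strings, and the model name is just an example:

```python
# Minimal sketch of "augmented generation": stuff retrieved text into the
# prompt. Assumes the openai Python package (>= 1.0); retrieval is stubbed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_with_context(question: str, retrieved_chunks: list[str]) -> str:
    # The "augmentation" is nothing fancier than concatenating retrieved
    # text into the prompt ahead of the question.
    context = "\n\n".join(retrieved_chunks)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

# The quality of the answer lives or dies on what ends up in retrieved_chunks.
print(answer_with_context(
    "Was the platform team mentioned?",
    ["...transcript excerpt 1...", "...transcript excerpt 2..."],
))
```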

LLMs (and semantic search) are very, very, very good at putting an Instagram ‘beauty filter’ on their outputs, and objectively evaluating a RAG pipeline is hard. So if it looks like it’s working, it’s left alone. But it’s not working. Specifically, preparing content for RAG is not a solved problem.
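
If you want “is it working?” to be more than a vibe check, one common approach (not from this post, just a standard technique) is a small hand-labeled query set and a recall@k score. A minimal sketch, where retrieve is a hypothetical stand-in for whatever your pipeline does:

```python
# Minimal sketch of an objective retrieval check: recall@k over a small
# hand-labeled set of queries. `retrieve` is hypothetical -- it stands in
# for your pipeline and returns chunk ids for a query.

def recall_at_k(labeled_queries, retrieve, k: int = 5) -> float:
    """labeled_queries: iterable of (query, set of relevant chunk ids)."""
    hits = 0
    total = 0
    for query, relevant_ids in labeled_queries:
        retrieved_ids = set(retrieve(query, k=k))
        if retrieved_ids & relevant_ids:  # did anything relevant surface?
            hits += 1
        total += 1
    return hits / total
```

The point is a number that moves when you change chunking or embeddings, instead of eyeballing outputs that always look plausible.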

Preparing documents for RAG depends on your use case, but it has two constraints: how the content will be stored in the retrieval system, and how it will be fed to an LLM. Vector databases (semantic search) create embeddings from documents, and those embeddings have a maximum input length they can be created from. Similarly, LLMs have a maximum input length. Both of these maximums are given as token counts. OpenAI has a fun tool for playing with the text/token relationship.
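
For example, splitting a document into chunks that fit under a token budget might look like this. It assumes OpenAI’s tiktoken library; the encoding name and the 8,191-token cap reflect OpenAI’s current embedding models, and the file path is made up:

```python
# Chunk a document to respect an embedding model's token limit, using
# OpenAI's tiktoken tokenizer. cl100k_base is the encoding used by their
# current embedding models; 8191 is their input cap at time of writing.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
MAX_EMBEDDING_TOKENS = 8191

def chunk_by_tokens(text: str, max_tokens: int = MAX_EMBEDDING_TOKENS) -> list[str]:
    tokens = enc.encode(text)
    # Naive fixed-size split. Real pipelines split on sentence or section
    # boundaries and overlap chunks, but the token budget works the same.
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

# Hypothetical usage: embed each chunk and store it in the vector database.
transcript = open("all_hands_transcript.txt").read()
for chunk in chunk_by_tokens(transcript, max_tokens=512):
    ...  # embed(chunk); store(chunk)
```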
