
Jonathan's blog

2024-09-04 23:30:06

One of the projects I have built is a long-standing retrieval-augmented generation (RAG) application. Documents are saved in a database, chunked into pieces of text small enough for a large language model (LLM) to handle, and turned into numerical representations (vectors).
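The chunk-then-embed step can be sketched roughly as below. Everything here is an illustrative assumption rather than the post's actual code: `chunk` splits by characters (real systems often split by tokens), and `embed` is a toy word-hashing trick standing in for a real embedding model.

```python
import hashlib
import math

CHUNK_SIZE = 200  # characters per chunk; an assumed value, not from the post

def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy embedding: hash each word into a bucket, then normalize.
    A real pipeline would call an embedding model here instead."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

doc = "RAG systems store documents as vectors. " * 20
chunks = chunk(doc)
vectors = [embed(c) for c in chunks]  # one unit-length vector per chunk
```

The unit-length normalization at the end is what later lets a plain dot product serve as cosine similarity.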

At some later point a user asks a question, which is also turned into a numerical representation. We compare the vectors and do some math to identify the top k (say, 3) chunks of text that best match the question. Those chunks are fed into an LLM, and we get back an answer grounded in the uploaded documents.
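The retrieval step described above, comparing the question vector against every chunk vector and keeping the top k, is commonly done with cosine similarity. A minimal sketch with hand-made 2-D vectors; the function names and the k=3 default are my assumptions, not the post's:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(question_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k chunks most similar to the question."""
    scores = [(cosine(question_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

chunk_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
question = [1.0, 0.05]
best = top_k(question, chunk_vecs)  # the two near-[1, 0] chunks rank first
```

In production this brute-force scan is usually replaced by an approximate nearest-neighbor index, but the math is the same.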

RAG implementations are notorious for being unscientific, more art than science. How do you transform a PDF into text? What about tabular data? What if there are pictures in the document, and they are pretty important? How long should the chunks be? How many chunks? Should you use cosine similarity for the math, or something else? And a million other questions where the correct answer is always "it depends".
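To make one of those "it depends" questions concrete: cosine similarity and Euclidean distance can rank the same chunks differently when the embeddings are not normalized, because cosine ignores vector magnitude while Euclidean distance is dominated by it. A toy illustration, with all vectors invented for the example:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

question    = [1.0, 0.0]
short_match = [0.1, 0.0]  # same direction as the question, small magnitude
long_other  = [1.0, 0.5]  # different direction, larger magnitude

# Cosine prefers the short vector pointing the same way;
# Euclidean prefers the long vector that happens to sit closer in space.
cos_picks_short = cosine(question, short_match) > cosine(question, long_other)
euc_picks_long = euclidean(question, long_other) < euclidean(question, short_match)
```

For unit-normalized embeddings the two metrics produce the same ranking, which is one reason many pipelines normalize vectors up front; without normalization, the choice genuinely depends on your data.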

I am, and you should be, highly suspicious of generic "one size fits all" RAG solutions. The really good ones are customized for the use case, the user types, and the documents.
