How I finally got agentic RAG to work right

Submitted by Style Pass, 2024-10-02 13:30:02

“LLMs suck at reasoning,” I lamented to my co-founder. “They suck at reasoning, and they suck at generating JSON when I tell them to.” I had just spent the better part of two days fighting with OpenAI, Claude, and Llama 3, trying to get them to reason their way through helping a mock customer resolve simple product issues. I was tempted to dismiss the excitement around AI agents as irrational exuberance (and, to be fair, there is quite a bit of irrational exuberance to be found), but I decided to press on. I’m glad I did.

With both traditional RAG and agentic RAG, you populate your search indexes using a RAG pipeline. The process looks something like this:
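In rough terms, that pipeline is: load documents, split them into chunks, embed each chunk, and write the chunks into a search index. Here is a minimal sketch in Python; the names are illustrative, and the embedding function is a toy stand-in (a real pipeline would call an embedding model via an API or a local model):

```python
from dataclasses import dataclass, field


def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str, dims: int = 8) -> list[float]:
    """Toy embedding: normalized character-frequency buckets.

    A stand-in for a real embedding model, just to make the sketch runnable.
    """
    vec = [0.0] * dims
    for ch in text:
        vec[ord(ch) % dims] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


@dataclass
class Index:
    """Minimal in-memory vector index (stand-in for a real vector store)."""
    entries: list = field(default_factory=list)

    def upsert(self, doc_id: str, text: str) -> None:
        self.entries.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, e)), t) for _, t, e in self.entries]
        return [t for _, t in sorted(scored, reverse=True)[:k]]


# Ingestion: chunk each source document and index every chunk.
index = Index()
docs = ["How to reset your password ...", "Shipping and returns policy ..."]
for i, doc in enumerate(docs):
    for j, c in enumerate(chunk(doc, size=40, overlap=10)):
        index.upsert(f"doc{i}-chunk{j}", c)

# Retrieval: embed the query and return the nearest chunks.
results = index.search("reset password", k=1)
```

The same indexing step serves a plain chatbot or an agent; what differs is who decides when and how to query the index.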

If someone asked you to look at this diagram and determine whether it supports agentic RAG agents or some garden-variety conversational AI chatbot, you would be hard pressed to say. In both cases, you need to retrieve information. That means optimizing retrieval processes. That means RAG pipelines.

There were three data sources containing information that could help our agentic RAG system answer common user questions. (I was doing this work for a customer and want to respect confidentiality, so I’m going to avoid naming the exact technologies we were pulling data from, to stay on the safe side of any privacy and security concerns.)
