
RAGs to RIChes

submitted by
Style Pass
2024-10-15 09:30:08

I'm pleased to announce that we came up with a novel technique that leverages the strength of LLMs to find the correct files to answer your prompts. We're calling the strategy RIC: Retrieval Input Compression. We index both the git history and the static state of the files. Then we compress that information at the retrieval stage while retaining its semantic meaning. This gives the LLM data that fits in its context window, from which it can infer exactly which files are relevant. It solves the apples-to-oranges problem and eliminates the need for file chunking. The results are astounding. We can chat with git projects at scale.
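To make the idea more concrete, here is a minimal sketch of what compressing the retrieval input could look like. It is not the RIC implementation. It assumes each file has already been indexed as a list of commit subjects, collapses every file to a single line, packs as many lines as fit into a character budget (a rough stand-in for a token budget), and then keeps only the paths from the model's reply that actually exist in the index. The function names `compress_entry`, `build_selection_prompt` and `parse_selected_files` are hypothetical, and the actual LLM call is left out.

```python
# Hypothetical sketch of "retrieval input compression"; NOT the RIC implementation.
# Assumes an index mapping each file path to the subjects of commits that touched it.


def compress_entry(path: str, commit_subjects: list[str], max_subjects: int = 3) -> str:
    """Collapse one file's index entry into a single short line."""
    # Keep only the first few subjects (the most recent, if listed newest first).
    summary = "; ".join(commit_subjects[:max_subjects])
    return f"{path} | {summary}" if summary else path


def build_selection_prompt(
    question: str, index: dict[str, list[str]], budget_chars: int = 8000
) -> str:
    """Pack one compressed line per file into a prompt that fits the budget.

    Characters are used as a rough stand-in for tokens (an assumption).
    """
    header = (
        "Pick the files most relevant to the question. "
        "Reply with one path per line.\n\n"
    )
    footer = f"\n\nQuestion: {question}\n"
    used = len(header) + len(footer)
    lines: list[str] = []
    for path in sorted(index):
        line = compress_entry(path, index[path])
        if used + len(line) + 1 > budget_chars:
            break  # crude cut-off; a real system would compress harder instead
        lines.append(line)
        used += len(line) + 1
    return header + "\n".join(lines) + footer


def parse_selected_files(reply: str, index: dict[str, list[str]]) -> list[str]:
    """Keep only paths that exist in the index, in the order the model listed them."""
    selected: list[str] = []
    for raw in reply.splitlines():
        candidate = raw.strip().strip("-*` ")  # tolerate bullets and backticks
        if candidate in index and candidate not in selected:
            selected.append(candidate)
    return selected


if __name__ == "__main__":
    index = {
        "api/handlers.go": ["Fix pagination", "Add /v2/accounts"],
        "README.md": [],
    }
    print(build_selection_prompt("Where is pagination handled?", index))
    # The prompt would go to an LLM; a canned reply stands in for its answer here.
    print(parse_selected_files("- api/handlers.go\nnot/a/file.go", index))
```

Validating the reply against the index is a deliberate choice in this sketch: the model only ever gets to pick from real paths, so a hallucinated file name is simply dropped.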

Here is a video where I chat with Algorand's indexer project, which ingests Algorand transactions to make them easier to discover. The project has nearly 400 files checked into git, and RIC works unreasonably well on it.

The other key advantage over traditional RAG is that, because it is tightly integrated with git, it indexes along many more dimensions. It understands the code not only through the commit history but also through the static state of the files checked into git. As a result, it can find the right file(s), like a needle in a haystack.
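As an illustration of gathering those two dimensions from a repository, here is a rough sketch. It is not the indexer described in this post. It assumes the `git` CLI is installed, uses `git log --name-only` for the history dimension (which commits touched which file) and `git ls-tree -r -l HEAD` for the static dimension (which files exist now, and their sizes). The helper names `parse_git_log`, `parse_ls_tree` and `build_file_index` are hypothetical.

```python
# Hypothetical sketch: build a per-file index from git history plus the static tree.
# Assumes the `git` CLI is on PATH; paths that git quotes (unusual characters) are not handled.
import subprocess
from collections import defaultdict

COMMIT_MARKER = "\x00"  # printed by the %x00 placeholder at the start of each commit


def parse_git_log(log_text: str) -> dict[str, list[str]]:
    """Map each file path to the subjects of the commits that touched it."""
    history: dict[str, list[str]] = defaultdict(list)
    subject = None
    for line in log_text.splitlines():
        if line.startswith(COMMIT_MARKER):
            subject = line[len(COMMIT_MARKER):]  # a new commit begins
        elif line.strip() and subject is not None:
            history[line.strip()].append(subject)  # a file changed by that commit
    return dict(history)


def parse_ls_tree(tree_text: str) -> dict[str, int]:
    """Map each blob path to its size in bytes from `git ls-tree -r -l` output."""
    sizes: dict[str, int] = {}
    for line in tree_text.splitlines():
        meta, _, path = line.partition("\t")  # "<mode> <type> <object> <size>\t<path>"
        fields = meta.split()
        if len(fields) == 4 and fields[1] == "blob":  # skip submodules ("commit")
            sizes[path] = int(fields[3])
    return sizes


def build_file_index(repo: str) -> dict[str, dict]:
    """Combine both dimensions for every file present at HEAD."""

    def git(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", repo, *args], capture_output=True, text=True, check=True
        ).stdout

    history = parse_git_log(git("log", "--name-only", "--pretty=format:%x00%s"))
    sizes = parse_ls_tree(git("ls-tree", "-r", "-l", "HEAD"))
    # Files deleted before HEAD appear in the history but are left out here.
    return {
        path: {"size_bytes": size, "commit_subjects": history.get(path, [])}
        for path, size in sizes.items()
    }
```

Calling `build_file_index(".")` inside a checkout would return one entry per tracked file. Renames are not followed in this sketch, so a file's older history stays under its previous path.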
