This is the fourth installment in a multi-part series on evaluating RAG systems using Tonic Validate, a RAG evaluation and benchmarking platform. All the code and data used in this article are available here. We’ll be back soon with another comparison of more RAG tools!
I’m curious (and others likely are too) to hear how you’ve been using the tool to optimize your RAG setups! Use Tonic Validate to score your RAG system and visualize experiments, then let everyone know on X (@tonicfakedata) what you’re building, which RAG system you used, and which parameters you tweaked to improve your scores. Bonus points if you also include your charts from the UI. We’ll promote the best write-up and send you some Tonic swag as well.
Hello again! In this series, we’ve focused heavily on young companies building RAG tooling, but the big cloud providers also offer a host of RAG product suites. So, for this installment, I decided to look at Amazon Bedrock to see how some of its offerings perform at RAG. Amazon Bedrock provides base models across the text, embedding, and image modalities that anyone can use to build AI applications. For RAG specifically, I’ll be looking at the text and embedding models. Bedrock offers text models from Anthropic, Cohere, AI21 Labs, Meta, and Amazon, as well as embedding models from Amazon and Cohere; Stability AI’s models on Bedrock generate images rather than text. As you can see, there are a lot of Bedrock models to choose from when building a RAG application. For this post, I’ll use Amazon Bedrock to compare a RAG system built on Amazon’s Titan models head to head with a RAG system built on Cohere’s models.
In the following sections, you’ll see exactly how to implement each step in code. For now, I’ll summarize what each step consists of and show the code for a base class that implements a simple RAG system.