
Building successful multi-tenant RAG applications

Many popular business apps with AI capabilities use the common RAG architecture pattern to generate more relevant and accurate results. A few examples:

Aside from being RAG apps, these examples all have something else in common: they are all multi-tenant applications, and in any given interaction, they only retrieve data for the current tenant.

This makes sense when you think about it. No marketing team wants to send personalized emails based on a customer's interactions with a different company. Customer trust is the most important asset of B2B applications, and the addition of AI makes establishing trust even more critical.

In addition to the user trust benefits, multi-tenant RAG also has performance benefits. RAG architectures have two high-latency steps: retrieving vectors and LLM inference. Multi-tenant RAG makes vector storage and retrieval significantly more efficient: searching the vectors of a single customer is orders of magnitude faster and cheaper than searching across all customers and then filtering the results. In many cases the number of vectors per customer is small enough that, if you limit the search to a specific customer, you can perform exact nearest neighbor search in reasonable time, which means you benefit from perfect recall. This contrasts with approximate nearest neighbor search using indexes such as HNSW, where you trade off resources, latency, and recall.
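As a rough illustration of what tenant-scoped retrieval can look like, here is a minimal Python sketch. The names (`add_document`, `search`, the in-memory dicts) are hypothetical and stand in for whatever vector store you actually use; the point is that keeping each tenant's vectors separate means a query only ever scans that tenant's small collection, so an exact scan stays affordable.

```python
import numpy as np

# Hypothetical in-memory store: vectors grouped per tenant, so a query
# only scans the current tenant's (small) collection.
tenant_vectors: dict[str, np.ndarray] = {}   # tenant_id -> (n_docs, dim) matrix
tenant_docs: dict[str, list[str]] = {}        # tenant_id -> document texts

def add_document(tenant_id: str, doc: str, embedding: np.ndarray) -> None:
    tenant_docs.setdefault(tenant_id, []).append(doc)
    existing = tenant_vectors.get(tenant_id)
    vec = embedding.reshape(1, -1)
    tenant_vectors[tenant_id] = vec if existing is None else np.vstack([existing, vec])

def search(tenant_id: str, query_embedding: np.ndarray, k: int = 5) -> list[str]:
    """Exact nearest neighbor search, limited to one tenant's vectors."""
    vectors = tenant_vectors.get(tenant_id)
    if vectors is None:
        return []
    # Cosine similarity of the query against every vector for this tenant only.
    sims = vectors @ query_embedding / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_embedding) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return [tenant_docs[tenant_id][i] for i in top]
```

Because the per-tenant scan is exact, recall is perfect; the resource/latency/recall trade-off of an approximate index only becomes relevant once a single tenant accumulates enough vectors to make a full scan too slow.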

RAG architectures rely on the ability to efficiently store and search large collections of vectors. The idea is that we take an input text and embed it (turn it into a vector that represents essential properties of that text). The reason we turn text into vectors is that we can measure the distance between any two vectors. This means that for a given input vector, we can search a collection of stored vectors for the K nearest vectors (also known as the K nearest neighbors). These vectors represent the stored documents that are most similar to the input text, and presumably most relevant as context for the LLM.
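To make that retrieval step concrete, here is a small self-contained sketch. The `toy_embed` function is a crude stand-in for a real embedding model (in practice you would call an embedding model or API), and the documents are invented for illustration; the point is only the embed, compare-by-distance, take-the-K-nearest flow.

```python
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for an embedding model: hashes words into a fixed-size,
    normalized vector. A real system would use an actual embedding model."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Invented example documents for one tenant.
documents = [
    "Invoice INV-204 was paid on March 3rd.",
    "The support ticket about login errors is still open.",
    "Renewal for the annual plan is scheduled for June.",
]
doc_vectors = np.stack([toy_embed(d) for d in documents])

query = "When was the invoice paid?"
query_vec = toy_embed(query)

# Vectors are normalized, so a dot product gives cosine similarity.
# Take the K nearest neighbors and join them as context for the LLM prompt.
k = 2
similarities = doc_vectors @ query_vec
top_k = np.argsort(-similarities)[:k]
context = "\n".join(documents[i] for i in top_k)
print(context)
```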
