The open-source project DP-RAG, from Sarus, addresses the challenge of maintaining privacy in Retrieval-Augmented Generation (RAG) systems. RAG is a popular method for enhancing Large Language Models (LLMs) by providing them with current and relevant information. However, incorporating external documents into the generation process introduces privacy risks, as responses might inadvertently expose confidential data. DP-RAG uses Differential Privacy (DP) to aggregate information from multiple documents, which bounds how much any single document can influence a response and so limits the disclosure of sensitive data. This is particularly relevant in contexts where sensitive information is handled, such as healthcare, finance, and government.
DP-RAG relies on two components: a novel token-by-token aggregation technique, and a method for collecting the documents related to a question in a way that still allows their output to be aggregated by a DP mechanism. The accompanying technical report, "RAG with Differential Privacy", presents empirical results showing that DP-RAG is effective, particularly when enough documents provide the necessary information.
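To give a feel for what aggregating token by token under DP can look like, here is a minimal sketch. It is not the algorithm from the DP-RAG report: it swaps in a generic textbook mechanism (a vote histogram with Laplace noise, followed by an argmax) and assumes each retrieved document has already been turned into a next-token probability distribution. The function names, the toy vocabulary, and the `epsilon` parameter are all hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_next_token(per_doc_distributions, vocabulary, epsilon, rng):
    """Choose one next token from a noisy vote histogram (illustrative only).

    per_doc_distributions: one dict {token: probability} per retrieved
    document (hypothetical input format).
    """
    votes = Counter()
    for dist in per_doc_distributions:
        # Each document casts exactly one vote, so adding or removing a
        # document changes the histogram by at most 1 (L1 sensitivity 1).
        votes[max(dist, key=dist.get)] += 1

    # Laplace noise with scale 1/epsilon on every bin makes the histogram
    # epsilon-DP; taking the argmax afterwards is post-processing.
    noisy = {
        token: votes[token] + laplace_noise(1.0 / epsilon, rng)
        for token in vocabulary
    }
    return max(noisy, key=noisy.get)


if __name__ == "__main__":
    rng = random.Random(0)
    vocabulary = ["aspirin", "ibuprofen", "rest"]
    # Toy data: many documents agree on the same next token.
    documents = [{"aspirin": 0.8, "ibuprofen": 0.1, "rest": 0.1}] * 200
    print(dp_next_token(documents, vocabulary, epsilon=1.0, rng=rng))
```

In this sketch, a token supported by many documents survives the noise, while a token backed by a single document is likely to be drowned out. That matches the intuition that a DP approach works best when enough documents carry the needed information. Note also that the `epsilon` here is spent per generated token, so the budget adds up over a full response.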