DP-RAG addresses privacy concerns in retrieval-augmented generation (RAG) systems. It uses differential privacy (DP) to aggregate information from multiple documents, which limits how much the generated output can reveal about any single document and guards against the inadvertent disclosure of sensitive data. Its core contributions are a novel token-by-token aggregation technique and a DP-based document retrieval method.
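This README does not spell out the aggregation rule, so the following is only a minimal sketch of what DP token-by-token aggregation can look like. It is not DP-RAG's actual algorithm. It assumes that each retrieved document proposes one next token (a "vote") and that the next output token is chosen with the standard exponential mechanism over the vote counts, drawn from a fixed, public vocabulary. The names `dp_next_token`, `propose_next` and `dp_generate` are hypothetical, and `propose_next` is a toy stand-in for a document-conditioned LLM call.

```python
import math
import random
from collections import Counter

EOS = "<eos>"


def dp_next_token(doc_votes, vocab, epsilon, rng):
    """Select one token from the fixed, public `vocab` with the exponential mechanism.

    Each retrieved document casts one vote (its proposed next token). Adding or
    removing one document changes every count by at most 1, so the utility has
    sensitivity 1 and this single selection is epsilon-DP w.r.t. one document.
    """
    counts = Counter(doc_votes)
    top = max(counts.values(), default=0)
    # Shift by the top count so exp() cannot overflow; the shift cancels when
    # random.choices normalises the weights.
    weights = [math.exp(epsilon * (counts[tok] - top) / 2.0) for tok in vocab]
    return rng.choices(vocab, weights=weights, k=1)[0]


def propose_next(doc_tokens, prefix):
    """Hypothetical stand-in for an LLM conditioned on one document: it proposes
    the document's next token, or EOS once the document is exhausted."""
    i = len(prefix)
    return doc_tokens[i] if i < len(doc_tokens) else EOS


def dp_generate(docs, vocab, epsilon_per_token, max_tokens, seed=0):
    """Generate an answer token by token. Under basic composition the whole
    answer is (steps * epsilon_per_token)-DP, where steps <= max_tokens."""
    rng = random.Random(seed)
    output = []
    for _ in range(max_tokens):
        votes = [propose_next(doc, output) for doc in docs]
        token = dp_next_token(votes, vocab, epsilon_per_token, rng)
        if token == EOS:
            break
        output.append(token)
    return output


if __name__ == "__main__":
    # Five documents agree on "aspirin"; one outlier says "warfarin".
    docs = [["aspirin", EOS]] * 5 + [["warfarin", EOS]]
    vocab = ["aspirin", "warfarin", "ibuprofen", EOS]
    print(dp_generate(docs, vocab, epsilon_per_token=50.0, max_tokens=3))  # ['aspirin']
```

Under this rule, the probability ratio between two candidate tokens with counts c1 and c2 is exp(epsilon * (c1 - c2) / 2). A token backed by many documents therefore wins reliably, while a token that appears in only one document gains little. This gives some intuition for why utility improves when enough documents contain the needed information. Per-token budgets add up across the generated answer, so the total epsilon grows with the output length.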
The accompanying technical report presents empirical results showing that DP-RAG is effective, particularly when enough of the retrieved documents contain the information needed to answer a query. This repository also contains code to evaluate the system on synthetic medical data.