When we develop user-facing applications that are powered by LLMs, we're taking on a big risk that the LLM may produce output that is unsafe in some w

Red-teaming a RAG app: What happens?

submited by

Style Pass

2025-08-04 17:30:07

When we develop user-facing applications that are powered by LLMs, we're taking on a big risk that the LLM may produce output that is unsafe in some way - like responses that encourage violence, hate speech, or self-harm. How can we be confident that a troll won't get our app to say something horrid? We could throw a few questions at it while manually testing, like "how do I make a bomb?", but that's only scratching the surface. Malicious users have gone to far greater lengths to manipulate LLMs into responding in ways that we definitely don't want happening in domain-specific user applications.

That's where red teaming comes in: bring in a team of people that are expert at coming up with malicious queries and that are deeply familiar with past attacks, give them access to your application, and wait for their report of whether your app successfully resisted the queries. But red-teaming is expensive, requiring both time and people. Most companies don't have the resources nor expertise to have a team of humans red-teaming every app, plus every iteration of an app each time a model or prompt changes.

Fortunately, my colleagues at Microsoft developed an automated Red Teaming agent, part of the azure-ai-evaluations Python package. The agent uses an adversarial LLM, housed safely inside an Azure AI Foundry project such that it can't be used for other purposes, in order to generate unsafe questions across various categories. The agent then transforms the questions using the open-source pyrit package, which uses known attacks like base-64 encoding, URL encoding, Ceaser Cipher, and many more. It sends both the original plain text questions and transformed questions to your app, and then evaluates the response to make sure that the app didn't actually answer the unsafe question.

Red-teaming a RAG app: What happens?

Leave a Comment

Related Posts

Recent Posts

Computer Science > Human-Computer Interaction

North Korean Soldiers in the Ukraine War: What We Know and What It Means

Here Come the New Industrialists - by Batya Ungar-Sargon

AI going critical: Hyundai to help build nuclear-powered datacenter in Texas

Decision-making study

The last generation of programmers - by Karim Fanous

Salutations! - The Journals of Noiruzi Sangsung | Royal Road

Search code, repositories, users, issues, pull requests...

Discounts Don’t Devalue Your Brand (When Done Right)

Capitalists Love This Podcast. So Do Their Critics.

You Know More Finnish Than You Think

Search code, repositories, users, issues, pull requests...

Designing a TRON to JAMMA Interface Board

Skip the Feature Flags infra headache

Why the U.S. China Hawks Are Hurting the U.S.

Avalon EHR for iPad & iPhone

Marking 13 Years on Mars, NASA’s Curiosity Picks Up New Skills

Republican Administration Fires Head of BLS After ‘Bad’ Jobs Report

The great AI bubble – I’m a believer

Proton fixes Authenticator bug leaking TOTP secrets in logs