LLMs are proficient on about 80% of tasks, but it’s the last 20% that often prevents a project from reaching production. The “human-in-the-loop” approach is designed to close this gap by having a human expert review LLM outputs. But how do you determine when human verification is needed? Having humans review millions or billions of outputs a day does not scale, and LLMs themselves are notoriously bad at identifying when they have made a mistake. It's also a poor experience for end users if they have to work out for themselves when an LLM is misleading them and then request human assistance.
To solve this problem, we need to predict accurately when a query can be answered by an LLM and when it should go to a human expert.
In this tutorial, we will use Not Diamond to build a custom router that determines when a query can be answered by an LLM and when it should go to a human. We’ll then use LangGraph to build an app that routes queries between LLMs and humans. You can follow along with the example below or in this notebook.
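Before bringing in either library, it may help to see the routing decision in isolation. The following is a minimal sketch in plain Python, not the Not Diamond or LangGraph API: `route_query`, `RoutingDecision`, `toy_predictor`, and the `threshold` parameter are all hypothetical names introduced for illustration. It assumes that some predictor returns a score between 0 and 1 estimating how likely an LLM is to answer the query well.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutingDecision:
    target: str   # "llm" or "human"
    score: float  # predicted chance that an LLM answers the query well


def route_query(
    query: str,
    predict_llm_success: Callable[[str], float],
    threshold: float = 0.5,
) -> RoutingDecision:
    """Send the query to an LLM if the predicted score clears the threshold."""
    score = predict_llm_success(query)
    target = "llm" if score >= threshold else "human"
    return RoutingDecision(target=target, score=score)


def toy_predictor(query: str) -> float:
    """Hypothetical stand-in for a trained router: long queries count as hard."""
    return 0.9 if len(query.split()) <= 8 else 0.2


if __name__ == "__main__":
    for q in (
        "What is 2 + 2?",
        "Please review the indemnification clause in this contract and "
        "tell me whether it exposes us to liability",
    ):
        decision = route_query(q, toy_predictor)
        print(f"{decision.target:>5}  {decision.score:.1f}  {q}")
```

In the rest of the tutorial, the role of `toy_predictor` is played by the custom router, and the branch on `target` is what the LangGraph app carries out.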