
Accurate Hallucination Detection With NER


Using an LLM-as-a-judge for hallucination detection is slow and imprecise relative to simple NER. We share how we solved hallucination detection using NER and keyword detection.

You can find all the code involved in our NER system, including benchmarks, at github.com/devflowinc/trieve/tree/main/hallucination-detection.

Our method zeroes in on the most common and critical hallucinations, the kind that could mislead or confuse users. Based on our research, a large percentage of hallucinations fall into three categories:

- Proper nouns (named entities) that do not appear in the retrieved reference text
- Numbers that do not match the reference text
- Unknown or made-up words that appear nowhere in the reference text

Instead of throwing complex language models at the problem with an LLM-as-a-judge approach, we use Named Entity Recognition (NER) to spot proper nouns and compare them between the gen AI completion and the retrieved reference text. For numbers and unknown words, we use similarly straightforward techniques to flag potential issues.
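To make the comparison concrete, here is a minimal sketch in Python using spaCy. It is not the Trieve implementation (the production detector linked above lives in the Rust repository); the model name, function names, and number regex are illustrative assumptions.

```python
import re
import spacy

# Illustrative sketch only: assumes a local spaCy English model is installed.
nlp = spacy.load("en_core_web_sm")


def named_entities(text: str) -> set[str]:
    """Lowercased named entities (people, orgs, places, products, ...)."""
    return {ent.text.lower() for ent in nlp(text).ents}


def numbers(text: str) -> set[str]:
    """Numeric tokens (integers and decimals) as strings."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def flag_hallucinations(
    completion: str,
    reference: str,
    known_words: set[str] | None = None,  # optional dictionary for the unknown-word check
) -> dict[str, set[str]]:
    """Flag entities, numbers, and words in the completion that are
    unsupported by the retrieved reference text."""
    unsupported_entities = named_entities(completion) - named_entities(reference)
    unsupported_numbers = numbers(completion) - numbers(reference)

    unknown_words: set[str] = set()
    if known_words is not None:
        # Simplistic check: completion words found neither in the reference
        # nor in the supplied dictionary are likely made up.
        completion_words = {t.text.lower() for t in nlp(completion) if t.is_alpha}
        reference_words = {t.text.lower() for t in nlp(reference) if t.is_alpha}
        unknown_words = completion_words - reference_words - known_words

    return {
        "entities": unsupported_entities,
        "numbers": unsupported_numbers,
        "unknown_words": unknown_words,
    }


if __name__ == "__main__":
    reference = "Trieve exposes search and RAG APIs and was founded in 2023."
    completion = "Trieve was founded in 2019 and acquired by Acme Corp."
    print(flag_hallucinations(completion, reference))
```

In this example, "2019" and "Acme Corp." would be flagged because neither appears in the reference text, while "Trieve" passes the check.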

Our approach only works in use cases where RAG is present, which is fine given that Trieve is a search and RAG API. Further, because RAG is the most common approach to limiting hallucinations, this technique will also work for any team building solutions on top of other search engines.
