Detecting when LLMs are Uncertain

Submitted by
Style Pass
2024-10-25 18:00:11

A big caveat: there have been no large-scale evals yet for Entropix, so it's not clear how much this helps in practice. But it does seem to introduce some promising techniques and mental models for reasoning.

Sampling is the process of choosing the next token from the distribution over possible tokens (the logits) that an LLM produces. You can tell how confident a model is in its predictions by looking at that distribution.
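As a minimal sketch of this process (the function name and temperature parameter are illustrative, not from Entropix), sampling converts logits to probabilities with a softmax and then draws a token from that distribution:

```python
import numpy as np

def sample_from_logits(logits, temperature=1.0, rng=None):
    """Sample a token index from raw logits via temperature-scaled softmax."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# A heavily peaked distribution almost always yields the top token (index 0).
print(sample_from_logits([100.0, 0.0, 0.0]))
```

Lower temperatures sharpen the distribution toward the top token; higher temperatures flatten it.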

But in reality, models are not always so sure of their predictions. You will often run into cases where the next-token prediction looks like this:

Entropy measures how spread out the predicted distribution is, i.e. how uncertain we are about the most probable outcome. With low entropy, the probability mass is concentrated on a few logits and we are fairly certain. With high entropy, the distribution is closer to uniform and we are much less certain.
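A small sketch of this metric (a generic Shannon entropy over the softmax of the logits, not Entropix's exact code): a peaked distribution scores near zero, while a uniform one scores the maximum, log2 of the vocabulary size.

```python
import numpy as np

def entropy(logits):
    """Shannon entropy (in bits) of the softmax distribution over logits."""
    logits = np.asarray(logits, dtype=np.float64)
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.sum(probs * np.log2(probs + 1e-12))  # epsilon avoids log(0)

print(entropy([10.0, 0.0, 0.0, 0.0]))  # peaked -> close to 0 bits
print(entropy([1.0, 1.0, 1.0, 1.0]))   # uniform over 4 tokens -> 2.0 bits
```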

Varentropy is a related metric: the variance of the per-token surprisal, whose mean is the entropy. It gives us an idea of the "shape" of the uncertainty. High varentropy indicates that the probability mass is unevenly distributed, with some tokens far more surprising than others.
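This can be sketched as follows (again a generic formulation, not Entropix's exact code): compute each token's surprisal, -log2(p), and take its variance under the distribution. A uniform distribution gives identical surprisals and thus zero varentropy, while a lopsided one mixes low- and high-surprisal tokens.

```python
import numpy as np

def varentropy(logits):
    """Variance of per-token surprisal -log2(p) under the softmax distribution."""
    logits = np.asarray(logits, dtype=np.float64)
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    surprisal = -np.log2(probs + 1e-12)
    mean = np.sum(probs * surprisal)  # the mean surprisal is the entropy
    return np.sum(probs * (surprisal - mean) ** 2)

print(varentropy([1.0, 1.0, 1.0, 1.0]))  # uniform -> ~0
print(varentropy([3.0, 0.0, 0.0, 0.0]))  # lopsided -> clearly positive
```

Together, entropy and varentropy distinguish "uniformly unsure" (high entropy, low varentropy) from "torn between a few distinct options" (high varentropy).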
