A big caveat, there have been no large scale evals yet for Entropix, so it’s not clear how much this helps in practice. But it does seem to introduce some promising techniques and mental models for reasoning.
Sampling is the process of choosing which token from the distribution of possible tokens (the logits) that a LLM chooses. You can tell how confident a model is in its predictions by looking at that distribution.
But in reality, models are not always so sure of their predictions. You will often run into cases where the next token prediction looks like this:
Entropy measures how different the predicted logits are from each other, i.e. how uncertain we are in the most probably outcome. In low entropy, we are pretty certain in a few of the logits. In high entropy, the distribution of the logits is more uniform and we are much less certain.
Varentropy is a different type of entropy metric. It gives us an idea of the “shape” of the uncertainty. High varentropy indicates that some of the values are highly different from others.