Sampling from a distribution shows up in a lot of places in the natural sciences, but lately it's mostly been used to power LLMs: a Transformer language model outputs a score (logit) for every token in its vocabulary, and the next token is sampled from the distribution those scores define. This blog post will talk about how to efficiently sample from a categorical distribution given logits.
Mathematically, the most general method to sample from a categorical distribution is to sample from its quantile function. A quantile function is the inverse of the cumulative distribution function (CDF), and the CDF for a particular value $x$ is defined as the sum of the PMF probabilities for all values less than or equal to $x$. Feeding a uniform random number from $[0, 1)$ into the quantile function then yields a sample from the distribution. Usually, though, LLMs output logits and not a PMF, so we first need to calculate the PMF from the logits using softmax. Lots of fancy words, so let's see an example:
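The same pipeline (logits, then softmax, then CDF, then quantile lookup) can be sketched in a few lines of plain Python. This is a minimal, unoptimized sketch rather than the efficient version this post is building toward; the function names `softmax` and `sample_categorical` and the logit values are made up for illustration.

```python
import bisect
import itertools
import math
import random


def softmax(logits):
    """Turn logits into PMF probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(logits, u):
    """Inverse-CDF sampling: map a uniform number u in [0, 1) to a category index."""
    pmf = softmax(logits)
    cdf = list(itertools.accumulate(pmf))  # running sum of the PMF
    # bisect_right returns the first index i with cdf[i] > u,
    # which is the quantile function evaluated at u.
    idx = bisect.bisect_right(cdf, u)
    # Guard against round-off leaving cdf[-1] slightly below 1.
    return min(idx, len(pmf) - 1)


logits = [2.0, 1.0, 0.1, -1.0]  # hypothetical example values
rng = random.Random(0)
n_draws = 10_000
counts = [0] * len(logits)
for _ in range(n_draws):
    counts[sample_categorical(logits, rng.random())] += 1

print("PMF:      ", [round(p, 3) for p in softmax(logits)])
print("Empirical:", [round(c / n_draws, 3) for c in counts])
```

With enough draws, the empirical frequencies should land close to the PMF values, which is a quick sanity check that the quantile lookup is doing the right thing.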
2. The PMF of the chosen distribution. This is the output of a softmax operation. We can see that the relative shape is unchanged, but high values were pushed up higher, and low values were pushed down lower.