Cross-entropy. Intuition and applications.

In pop science, entropy is considered a measure of disorder: a system has high entropy when it is disordered (e.g., my college bedroom), and low entropy when it is ordered (e.g., a chocolate box). This meaning probably has its roots in thermodynamics, where my college bedroom was evidence of the universe getting ever closer to its heat death.

But for us, entropy is not (only) about messy bedrooms, but about messy data. That’s why I will focus on Shannon’s entropy \(H(P)\), which is a property of a probability distribution \(P\). For a discrete random variable taking value \(x\) with probability \(P(x)\):

\[
H(P) = \sum_{x} P(x) \log_2 \frac{1}{P(x)} = -\sum_{x} P(x) \log_2 P(x)
\]

As stated, using the binary logarithm, entropy is measured in bits; when using the natural logarithm instead, the unit of measure is nats. From here on, \(\log\) means \(\log_2\).
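The two units differ only by a constant factor: since \(\ln x = \ln 2 \cdot \log_2 x\), an entropy of \(H\) bits is the same quantity expressed as \(\ln 2 \cdot H \approx 0.693\,H\) nats.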

In a nutshell, entropy is the average surprise we’ll experience when observing a realization of \(P\): if an outcome is rare (\(P(x)\) is small), observing it should be quite surprising (\(\log \frac{1}{P(x)}\) is large); if it is very common, the surprise should be low.
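To make the “average surprise” reading concrete, here is a minimal Python sketch; the three-outcome distribution in it is a made-up illustration, not real data. It computes the surprise \(\log \frac{1}{P(x)}\) of each outcome and their probability-weighted average, which is exactly \(H(P)\).

```python
import math

# A made-up three-outcome distribution (illustrative numbers only).
P = {"common": 0.90, "unusual": 0.09, "rare": 0.01}

def surprise(p):
    """Surprise (self-information) of an outcome with probability p, in bits."""
    return math.log2(1 / p)

# Entropy is the probability-weighted average of the surprises.
H = sum(p * surprise(p) for p in P.values())

for outcome, p in P.items():
    print(f"{outcome:>8}: P = {p:.2f}, surprise = {surprise(p):5.2f} bits")
print(f"H(P) = {H:.2f} bits")
```

With these numbers the rare outcome carries about 6.6 bits of surprise on its own, but it occurs so seldom that the average lands at roughly 0.5 bits.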

A more tangible interpretation of entropy links it to the encoding of a message. Imagine we want to encode the outcome of a probability distribution: we observe an outcome and want to communicate it unambiguously to a friend. For instance, let’s say the weather in my city follows this probability distribution:
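(As a purely illustrative assumption, take sunny = 1/2, cloudy = 1/4, rainy = 1/8, snowy = 1/8.) With such a distribution, a prefix code that spends \(\log \frac{1}{P(x)}\) bits on outcome \(x\) has an expected message length equal to the entropy; the minimal Python sketch below, using these assumed numbers, makes that concrete.

```python
import math

# Assumed toy weather distribution (illustrative, chosen to be powers of 1/2).
P = {"sunny": 0.5, "cloudy": 0.25, "rainy": 0.125, "snowy": 0.125}

# A prefix-free code that uses exactly log2(1/p) bits per outcome.
code = {"sunny": "0", "cloudy": "10", "rainy": "110", "snowy": "111"}

entropy = sum(p * math.log2(1 / p) for p in P.values())
expected_length = sum(P[x] * len(code[x]) for x in P)

print(f"entropy:         {entropy:.2f} bits")          # 1.75
print(f"expected length: {expected_length:.2f} bits")  # 1.75
```

Because the assumed probabilities are exact powers of 1/2, the code lengths \(\log \frac{1}{P(x)}\) are integers and the match is exact; for general distributions the entropy is a lower bound on the expected length, which optimal codes approach (for instance, by encoding blocks of outcomes).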
