Linear-log bucketing: fast, versatile, simple

submitted by
Style Pass
2023-01-24 09:30:05

What do memory allocation, histograms, and event scheduling have in common? They all benefit from rounding values to predetermined buckets, and the same bucketing strategy combines acceptable precision with reasonable space usage for a wide range of values. I don’t know if it has a real name; I had to come up with the (confusing) term “linear-log bucketing” for this post! I also used it twice last week, in otherwise unrelated contexts, so I figure it deserves more publicity.

I’m sure the idea is old, but I first came across this strategy in jemalloc’s binning scheme for allocation sizes. The general idea is to simplify allocation and reduce external fragmentation by rounding allocations up to one of a few bin sizes. The simplest scheme would round up to the next power of two, but experience shows that’s extremely wasteful: in the worst case, an allocation for \(k\) bytes can be rounded up to \(2k - 2\) bytes, for almost 100% space overhead! Jemalloc further divides each power-of-two range into 4 bins, reducing the worst-case space overhead to 25%.
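To make the arithmetic concrete, here is a minimal Python sketch of the two rounding schemes described above. The function names and the `subbin_bits` parameter are made up for illustration; this is not jemalloc's actual code, just one way to get 4 evenly spaced bins per power-of-two range.

```python
def round_up_pow2(size: int) -> int:
    """Round size (>= 1) up to the next power of two."""
    return 1 << (size - 1).bit_length()


def round_up_subbins(size: int, subbin_bits: int = 2) -> int:
    """Round size (>= 1) up to a bin boundary, with 2**subbin_bits
    evenly spaced bins in each power-of-two range (2**n, 2**(n+1)]."""
    if size <= (1 << subbin_bits):
        # Too small to subdivide: every integer is its own bin.
        return size
    # size lies in (2**n, 2**(n+1)]; bins in that range are
    # 2**(n - subbin_bits) apart.
    n = (size - 1).bit_length() - 1
    mask = (1 << (n - subbin_bits)) - 1
    return (size + mask) & ~mask


if __name__ == "__main__":
    for size in (1024, 1025, 1281):
        print(size, round_up_pow2(size), round_up_subbins(size))
```

A request for 1025 bytes rounds up to 2048 under the power-of-two scheme (that is \(2k - 2\)), but only to 1280 with 4 bins per range, so the wasted space stays just under 25% of the request.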

This sub-power-of-two binning covers medium and large allocations. We still have to deal with small ones: the ABI forces alignment on every allocation, regardless of its size, and we don’t want to have too many small bins (e.g., 1 byte, 2 bytes, 3 bytes, …, 8 bytes). Jemalloc adds another constraint: bin sizes are always multiples of the allocation quantum (usually 16 bytes).
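One way to combine the two constraints is to round small requests up to a multiple of the quantum, then switch to sub-power-of-two bins once those bins are at least one quantum wide. The sketch below does that; the name `size_class`, the cutoff between the two regimes, and the requirement that `quantum` be a power of two are assumptions made for illustration, not a description of jemalloc's real size-class table.

```python
def size_class(size: int, quantum: int = 16, subbin_bits: int = 2) -> int:
    """Round size up to a bin that is always a multiple of quantum.

    Assumes quantum is a power of two (hypothetical simplification).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Linear regime: below this limit, sub-power-of-two bins would be
    # narrower than one quantum, so use plain multiples of the quantum.
    linear_limit = quantum << subbin_bits
    if size <= linear_limit:
        return (size + quantum - 1) // quantum * quantum
    # Log regime: 2**subbin_bits bins per power-of-two range.
    n = (size - 1).bit_length() - 1
    mask = (1 << (n - subbin_bits)) - 1
    return (size + mask) & ~mask


if __name__ == "__main__":
    for size in (1, 17, 64, 65, 129, 1025):
        print(size, size_class(size))
```

With the default parameters, the bins run 16, 32, 48, 64 in the linear regime, then 80, 96, 112, 128, then 160, 192, 224, 256, and so on, with the spacing doubling at each power of two.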
