UNORM and SNORM to float, hardware edition

submited by
Style Pass
2024-12-24 16:00:05

I mentioned in a previous post that doing exact UNORM or SNORM conversions to float in hardware was not particularly expensive, but didn’t go into detail how. Let’s rectify that! (If you haven’t read that post yet, please start there if you need an explanation of what the UNORM and SNORM formats are, as well as the basic idea.)

using the specific case n=8, but the proof is identical for other n. For UNORM8, SNORM8, UNORM16 and SNORM16 we need the cases n=8, n=7, n=16 and n=15, respectively. This is already good news, because this boils down to just a number of bit shifts (for the divisions by powers of two) and adds. In the cases we care about for UNORM/SNORM conversions, we also have 0 ≤ x ≤ 2n – 1, meaning it’s actually even simpler than adds, because we have n-bit values shifted by multiples of n bits, making the bit ranges non-overlapping, so there’s not even real additions involved, we’re really just repeating a bit string over and over. Therefore, even though expanding our division by 2n – 1 into an infinite series looks scary at first sight, it gives us a straightforward way to get the exact quotient to as many significant digits as we want.

With the scary-looking division step turning into something fairly benign, this looks fairly doable, but there’s a few special cases to take care of first.

Leave a Comment