Compilers are (too) smart // .mischief.mayhem.soap.

submited by
Style Pass
2024-06-08 06:30:03

Last week I noticed some unexpected assembly code in a function calling std::unordered_map::find (yes, I know, friends don’t let friends use STL, but that is a topic for another discussion). I decided to investigate a bit and it turned out to be quite a journey. It was a Clang compiler, ARM-based platform, I will use x64 here as behavior is similar and it is easier to repro in the Compiler Explorer. What is important however, we should be using libcxx STL (-stdlib=libc++ switch). A minimal example might look like:

(ident is our version of std::identity, ie. our hash of ‘x’ is just ‘x’, key was pre-hashed, I don’t think it matter, but removes hashing from the equation) Simple enough, yet if we look at the generated assembly there is this giant blob at the beginning:

It bothered me because it was almost as many instructions as the rest of the function on the fast path! What is going on here? At first I thought it was not eliminating the hash function properly, but it was quickly disproved. Don’t kick yourself if you can’t tell what that code is doing, it was a bit more obvious in the ARM version as it uses hex constants. What are these weird numbers? Well, 6148914691236517205 = 0x5555555555555555 3689348814741910323 = 0x3333333333333333 1085102592571150095 = 0xf0f0f0f0f0f0f0f 72340172838076673 = 0x101010101010101 Still not clear? It was not for me either, but it was familiar enough and ‘bit-twiddly’ enough to find an answer with some quick web search - it basically counts the number of set bits (1s) in a word. I still didn’t understand where did it come from, it was all inlined in optimized builds and obviously looked very different in debug. I tried to step through the C++ code anyway, looking for clues and found this . find calls constrain_hash (multiple times) – it basically converts a potentially very big hash to 0->bucket count range…

Leave a Comment