Intro  In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed

Thoughts on Rust hashing | purplesyringa's blog

submited by
Style Pass
2024-12-11 20:30:07

Intro In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed-hash size is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:

How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out of performance.

How do you mix hashes? You can:Leave that to the users. Everyone will then invent their own terrible mixers, like x ^ y. Indeed, both arguments are pseudo-random, what could possibly go wrong?Provide a good-enough mixer for most use cases, like a * x + y. Cue CVEs because people used mix(x, mix(y, z)) instead of mix(mix(x, y), z).Provide a quality mixer, missing out on performance in common simple cases.

What guarantees do you provide regarding the hash values?Do you require the avalanche effect? Your hash is suboptimal even for simple power-of-two-sized hash tables.Do you require a half-avalanche effect instead? Congrats, you broke either those or prime-sized hash tables.Do you require the hash table to perform finalization manually? Using strings as keys is now suboptimal, because computing a non-finalized hash of a string is of good enough quality already.

Leave a Comment