Optimizing Open Addressing – Max Slater – Computer Graphics, Programming, and Math

submited by
Style Pass
2023-03-19 19:00:10

Your default hash table should be open-addressed, using Robin Hood linear probing with backward-shift deletion. When prioritizing deterministic performance over memory efficiency, two-way chaining is also a good choice.

Most people first encounter hash tables implemented using separate chaining, a model simple to understand and analyze mathematically. In separate chaining, a hash function is used to map each key to one of \(K\) buckets. Each bucket holds a linked list, so to retrieve a key, one simply traverses its corresponding bucket.

Given a hash function drawn from a universal family, inserting \(N\) keys into a table with \(K\) buckets results in an expected bucket size of \(\frac{N}{K}\), a quantity also known as the table’s load factor. Assuming (reasonably) that \(\frac{N}{K}\) is bounded by a constant, all operations on such a table can be performed in expected constant time.

Unfortunately, this basic analysis doesn’t consider the myriad factors that go into implementing an efficient hash table on a real computer. In practice, hash tables based on open addressing can provide superior performance, and their limitations can be worked around in nearly all cases.

Leave a Comment