Hashing Modulo Theories

submited by
Style Pass
2024-05-14 15:30:06

We anticipate that hash functions will have collisions. They have to. Good hash functions don’t have collisions for keys likely to appear together.

Consider for example (4,8) representing the fraction $\frac{4}{8}$. In our intended meaning this is “equal” to (1,2) aka $\frac{1}{2}$ , but if I use python’s default hashing function, they will map to different integers

More pragmatically, if I have some kind of AVL tree or red-black tree data structure representing a set or dictionary, the exact balancing is actually irrelevant internals. If I use an ordinary tree hashing algorithm on these, logically “identical” sets will map to different hashes, which is wrong. So what do you do?

Sets are useful in all sorts of ways, but in particular I the application I find most interesting is to hash sets that correspond to contexts in a automated theorem prover context (egglog, datalog, other).

Another example that I care about a lot is alpha invariant expressions. If the intended meaning of an expression is intended to be invariant to non-clashing renamings aka permutations of the variable labels. For example, foo(X,Y,X) = foo(Z,W,Z) != foo(X,Y,Z). This shows up in theorem proving contexts.