
Unnecessary Optimization in Rust: Hamming Distances, SIMD, and Auto-Vectorization | Evan Schwartz

Submitted by Style Pass, 2024-12-22 15:30:07

If you're developing an application and find yourself running a benchmark whose results are measured in nanoseconds... you should probably stop and get back to more important tasks. But here we are.

I'm using binary vector embeddings to build Scour, a service that scours noisy feeds for content related to your interests. Scour uses the Hamming Distance to calculate the similarity between users' interests and each piece of content. (As a refresher, the Hamming Distance between two bit vectors is simply the number of bit positions at which the two differ.) I got nerd-sniped into wondering which Hamming Distance implementation in Rust is fastest, learned more about SIMD and auto-vectorization, and ended up publishing a new (and extremely simple) implementation: hamming-bitwise-fast.
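To make the definition concrete, here is a minimal sketch that assumes the bit vectors are packed into equal-length byte slices. It XORs the two inputs so that a 1 marks every position where the bits differ, then counts those 1s with `count_ones` (a population count). The function name `hamming_distance` and the byte-slice layout are illustrative assumptions, not the API of hamming-bitwise-fast.

```rust
/// Hamming Distance between two equal-length bit vectors packed into bytes.
/// Hypothetical helper for illustration only; not the hamming-bitwise-fast API.
pub fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "bit vectors must have the same length");
    a.iter()
        .zip(b.iter())
        // XOR leaves a 1 exactly where the corresponding bits differ...
        .map(|(x, y)| (x ^ y).count_ones())
        // ...and summing the per-byte popcounts gives the total distance.
        .sum()
}

fn main() {
    let a = [0b1011_0000u8, 0xFF];
    let b = [0b0011_0001u8, 0x0F];
    // Byte 0: 1011_0000 ^ 0011_0001 = 1000_0001, so 2 bits differ.
    // Byte 1: 0xFF ^ 0x0F = 0xF0, so 4 bits differ.
    println!("{}", hamming_distance(&a, &b)); // prints 6
}
```

Processing wider words such as `u64` would mean fewer popcount operations per vector. Whether the compiler turns a simple loop like this into SIMD instructions (auto-vectorization) depends on the target and optimization settings, so it is worth checking with a benchmark rather than assuming.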

(Note that we are not comparing the distances, stringzilla, or triple_accel crates because those calculate the Hamming Distance between strings rather than bit vectors.)
