Accelerating Bloom Filters by 310x using Rust and WASM

submitted by
Style Pass
2024-11-26 13:30:15

I built Nuenki to translate everyday web content into your target language, but only the bits you can actually learn from. It processes webpages, picks out sentences at your knowledge level, then translates them as you browse.

The central technical problem is economics: high-quality translation is expensive, particularly when you're translating parts of every website you visit, every day. One solution is client-aware caching: the extension knows which translations are already in the server's cache, so it can be more liberal about translating sentences that won't incur additional cost.
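As a rough sketch of that client-side decision, assuming a hypothetical `should_translate` helper (the names and the budget check are illustrative, not Nuenki's actual logic; the cache predicate stands in for whatever structure the extension really queries):

```rust
/// Hypothetical client-side gating: cached translations are free to
/// request, so always take them; uncached ones cost money, so only
/// translate while there's budget left.
fn should_translate(
    sentence: &str,
    in_server_cache: impl Fn(&str) -> bool,
    budget_remaining: bool,
) -> bool {
    if in_server_cache(sentence) {
        return true; // already paid for: requesting it costs nothing extra
    }
    budget_remaining
}

fn main() {
    // A cached sentence is always worth requesting.
    assert!(should_translate("Guten Morgen", |_| true, false));
    // An uncached sentence is only translated when budget allows.
    assert!(!should_translate("Guten Morgen", |_| false, false));
    assert!(should_translate("Guten Morgen", |_| false, true));
}
```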

Nuenki uses Bloom filters, regularly downloaded from the server to the client, to track cached translations. They're probabilistic data structures that compress large sets into a compact form that can be queried for membership. They're space-efficient and fast, but can return false positives. That's a non-issue for caching, but worth keeping in mind, particularly because they tend to grow non-linearly as you reduce the false positive rate.
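To make the membership-query behaviour concrete, here is a minimal Bloom filter sketch in Rust. This is not Nuenki's implementation (a real one would use a packed bit vector and faster hashes); it derives its k hash functions by seeding the standard library's `DefaultHasher`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter: m boolean slots, k seeded hash functions.
struct BloomFilter {
    bits: Vec<bool>,
    k: u64,
}

impl BloomFilter {
    fn new(m: usize, k: u64) -> Self {
        BloomFilter { bits: vec![false; m], k }
    }

    /// Derive the slot for (item, seed) by hashing the seed then the item.
    fn index(&self, item: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        item.hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }

    fn insert(&mut self, item: &str) {
        for seed in 0..self.k {
            let i = self.index(item, seed);
            self.bits[i] = true;
        }
    }

    /// May return false positives, never false negatives.
    fn contains(&self, item: &str) -> bool {
        (0..self.k).all(|seed| self.bits[self.index(item, seed)])
    }
}

fn main() {
    let mut filter = BloomFilter::new(1 << 16, 4);
    filter.insert("Guten Morgen");
    // Inserted items are always reported present.
    assert!(filter.contains("Guten Morgen"));
}
```

Note the asymmetry that makes this safe for caching: `contains` can wrongly say "cached" (a false positive, costing one uncached translation), but it can never wrongly say "not cached".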

Optimising Bloom filters quickly became a key part of optimising Nuenki as a whole. On top of querying them for every sentence on every webpage, Nuenki also uses them extensively in its sentence-difficulty estimation.
