Do you worry about randomly generated strings containing not-so-savory language? We do! So, we had some fun over-engineering a solution to guard our

How we use Bloom filters to check for profanity

submited by

Style Pass

2022-01-12 22:30:02

Do you worry about randomly generated strings containing not-so-savory language? We do! So, we had some fun over-engineering a solution to guard our strings from profanity.

Imagine this: you’ve built a data model that relies on some form of randomly generated strings. Think: coupon codes or reservation IDs. At Offsyte, we have many of these types of strings, but among the most important are our booking IDs – a randomly generated 6-character string.

Everything goes smoothly until one of these strings contains a bad word. Now, your customer is sharing links containing this ID with others in reference to your product. As funny as this could be – depending on the gravity of the word – it has the potential to set a bad tone for customers and the brand.

We wanted to tackle this problem before anything offensive could sneak its way into the system, and we settled on a technically interesting and efficient solution.