I was talking to someone recently about how searchcode.com works with regards to its bloom filter. The question came up about why it stores the index in memory and not on disk considering how fast disks are these days.
Now the answer is firstly performance, but the second is that writing all the code to store the index on disk is a lot of work and I didn’t feel like working on it. However I was curious… since I have been using SQLite a lot recently, how about storing the bloom filter on disk using that? It comes with its own indexes for stepping though the filters and is likely out of the box to be better than anything I could come up with quickly.
So I tried it out. Pulling some existing code I had https://github.com/boyter/indexer I was able to modify it slightly to create a bloom filter backed by SQLite.
I then populated it with some random values in order to simulate a bloom filter and ran some queries over it using two different select approaches. The first being where each portion of the filter is pulled back by a single select, and another where all of the values needed to be checked were pulled back. In effect