I'm working on a database system that stores and queries chess games and positions. Right now, it contains 240 million unique positions1 from 3.8 million games. One of the things it needs to do is quickly find all the games where a particular position occurs. I'd also like it to do things like find games where this position occurs and it ends in a draw.
Bitmaps are really useful here, and with some care they can achieve unbelievable efficiency. They can also be really slow if you're not careful. It's a journey.
We'll start by looking at how my bitmaps are implemented, and then we'll see how an assumption punished me severely and how I fixed it to make things a lot faster.
A bitmap is a sequence of bits which indicate a boolean condition across a range of items. You can use them to store a true/false condition for any collection which you can assign sequential integer IDs to.
For example, if you have 8 cats, you could store for each one whether it is cute or not. The simple approach is to store a list of booleans, each which indicates this condition. This is wasteful: Each of those booleans takes at least one byte of memory, but you're only really using one bit from that byte! You would allocate 64 bits to use 8.