How bloom filters made SQLite 10x faster

submited by
Style Pass
2024-12-22 15:00:04

This is the fascinating story of how researchers used Bloom filters cleverly to make SQLite 10x faster for analytical queries. These are my five-minute notes on the paper SQLite: Past, Present, and Future (2022). I’ll also explain some database internals and how databases implement joins.

SQLite is a B-tree on disk, row-based storage. It internally uses a VM called VDBE to execute queries. It is cross-platform, single-threaded, and runs almost everywhere.

SQLite is a general-purpose database, but it excels at OLTP workloads. However, researchers at Buffalo University in 2015 found that most queries are simple key-value lookups and complicated OLAP queries. So, researchers at the University of Wisconsin-Madison set out to make it faster for analytical queries since the trend is changing. To set a baseline, they used DuckDB. They used the industry standard OLAP benchmark - Star Schema Benchmark (SSB).

SSB contains a bunch of analytical queries called Star Queries, where you have one large fact table and multiple smaller dimension tables. e.g., fact table = customer orders, dimension tables = customer, merchant info, delivery partners, etc.

Leave a Comment