Fine-grained caching strategies of dynamic queries

submitted by
Style Pass
2023-09-19 08:30:04

Today I would like to talk about caching strategies for aggregate queries over time-based data that is updated often. This is something I spent significant brain cycles on at my previous job, and I would love to share some of my findings.

Consider a transactions table with a date column and an amount column. The table has a secondary ordered index on the date column, so that one can quickly query a slice of the dates (i.e. ... WHERE date BETWEEN '2023-05-24' AND '2023-05-31'). Let’s assume that the table contains enough rows that SELECT SUM(amount) FROM transactions is slow.

It turns out that the above requirement is a surprisingly hard problem to solve! I like to relax engineering problems a bit to understand what is actually hard, so let’s build up a solution from scratch, relaxing some of the requirements as we go:

For now, let’s assume that the table is immutable (i.e. mutations are not allowed). So, what is wrong with simply running a SELECT query against immutable data? Well, it turns out that querying it is too slow (requirement 4). The classic way to solve this is to add a caching layer* in front of the database: use the SQL query text as the cache key and return the cached value if it exists; otherwise, run the expensive query against the database and cache the result.
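A minimal sketch of this query-keyed cache, using an in-memory SQLite database as a stand-in for the real transactions table (the schema and data here are illustrative assumptions, not the author's actual setup):

```python
import sqlite3

# Hypothetical stand-in for the transactions table described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (date TEXT, amount INTEGER)")
# Secondary ordered index on the date column, so date-range slices are fast.
conn.execute("CREATE INDEX idx_transactions_date ON transactions (date)")
conn.executemany(
    "INSERT INTO transactions (date, amount) VALUES (?, ?)",
    [("2023-05-24", 10), ("2023-05-25", 20), ("2023-05-31", 30)],
)

cache = {}  # SQL text -> cached scalar result

def cached_query(sql):
    # The SQL string itself is the cache key: a repeated query is served
    # from the cache; anything unseen falls through to the database.
    if sql not in cache:
        cache[sql] = conn.execute(sql).fetchone()[0]
    return cache[sql]

total = cached_query("SELECT SUM(amount) FROM transactions")  # hits the database
total = cached_query("SELECT SUM(amount) FROM transactions")  # served from cache
print(total)
```

This is exactly why immutability matters: nothing here ever invalidates `cache`, which is only correct because the table never changes. The moment mutations are allowed, a cached sum can go stale, which is where the harder parts of the problem begin.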
