The missing tier for query compilers

Database query engines used to be able to assume that disk latency was so high that the overhead of interpreting the query plan didn't matter. Unfortunately, these days a cheap NVMe SSD can supply data much faster than a query interpreter can process it.

The problem is that most of the CPU time is spent on the overhead of the interpreter itself - figuring out which operator to run next, moving data between operators, and so on - rather than on the actual computation of the query. Depending on the query, removing the interpreter bottleneck can yield a speedup of several orders of magnitude.
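To make that overhead concrete, here's a minimal sketch (in Rust, not taken from any particular engine) of a row-at-a-time expression interpreter. The names `Expr`, `eval`, and `sum_column_plus_one` are purely illustrative; the point is that the dispatch in `eval` and the pointer-chasing through the plan are repeated for every single row, even though the plan never changes.

```rust
// Illustrative row-at-a-time interpreter: the per-row cost is dominated by
// dispatching on the operator and shuffling values around, not by the add.
enum Expr {
    Column(usize),             // read a value from the current row
    Literal(i64),              // a constant
    Add(Box<Expr>, Box<Expr>), // add two sub-expressions
}

fn eval(expr: &Expr, row: &[i64]) -> i64 {
    match expr {
        Expr::Column(i) => row[*i],
        Expr::Literal(v) => *v,
        // Branch + recursion through heap-allocated nodes, once per row.
        Expr::Add(a, b) => eval(a, row) + eval(b, row),
    }
}

fn sum_column_plus_one(rows: &[Vec<i64>]) -> i64 {
    let expr = Expr::Add(Box::new(Expr::Column(0)), Box::new(Expr::Literal(1)));
    // The `match` in `eval` is re-executed for every row of the table.
    rows.iter().map(|row| eval(&expr, row)).sum()
}

fn main() {
    let rows: Vec<Vec<i64>> = (0..4).map(|i| vec![i, i * 10]).collect();
    println!("{}", sum_column_plus_one(&rows)); // 1 + 2 + 3 + 4 = 10
}
```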

For OLAP-ish queries, which tend to consist of summarizing data via full table scans over compressed column-oriented data, the standard solution is to 'vectorize' the interpreter - run each operator on a batch of rows instead of a single row. Then we only have to pay the interpreter overhead per batch rather than per row. This is a really well-understood design, and the industry is converging towards a standard set of data formats (eg Arrow) and libraries (eg DataFusion, Velox) that work more or less out of the box.
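As a rough sketch of what vectorization buys us (again illustrative Rust, not the actual API of Arrow, DataFusion, or Velox), the same expression tree can be evaluated one batch of column values at a time. The dispatch cost is then paid once per batch, and the hot loop becomes a tight add over contiguous memory that the compiler can unroll and auto-vectorize.

```rust
// Illustrative vectorized interpreter: operators consume and produce whole
// batches of values, so interpreter overhead is amortized over the batch.
enum Expr {
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// `batch` is column-oriented: batch[c] holds this batch's values for column c.
fn eval_batch(expr: &Expr, batch: &[Vec<i64>]) -> Vec<i64> {
    match expr {
        Expr::Column(i) => batch[*i].clone(),
        Expr::Literal(v) => vec![*v; batch[0].len()],
        Expr::Add(a, b) => {
            let (xs, ys) = (eval_batch(a, batch), eval_batch(b, batch));
            // The inner loop is a straight-line add over contiguous memory.
            xs.iter().zip(&ys).map(|(x, y)| x + y).collect()
        }
    }
}

fn main() {
    let batch = vec![vec![0, 1, 2, 3], vec![0, 10, 20, 30]];
    let expr = Expr::Add(Box::new(Expr::Column(0)), Box::new(Expr::Literal(1)));
    println!("{:?}", eval_batch(&expr, &batch)); // [1, 2, 3, 4]
}
```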

For OLTP-ish queries, which tend to consist of searching and connecting sparse data using index lookups and hash joins over row-oriented storage, vectorization doesn't work as well, because the inner loop of each operator still has to consult dynamic parameters specifying how long each row is and how to extract the value we're operating on. We can't pull that interpreter overhead out of the loop because there is an effectively infinite number of possible combinations of parameters.
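A hedged sketch of what that looks like in practice: with row-oriented storage, the byte offset and type of the field we need are runtime parameters of the operator, so even the probe side of a hash join keeps re-interpreting them for every row it touches. (The field layout and names below are made up for illustration, not any engine's real format.)

```rust
// Illustrative row format: a field is located by a runtime offset and type.
#[derive(Clone, Copy)]
enum FieldType { I32, I64 }

struct FieldRef {
    offset: usize,  // byte offset of the field within the row
    ty: FieldType,  // how to decode the bytes at that offset
}

fn read_field(row: &[u8], field: &FieldRef) -> i64 {
    // This decode depends on per-table, per-query parameters, so it can't be
    // hoisted out of the inner loop the way a columnar kernel can.
    match field.ty {
        FieldType::I32 => {
            i32::from_le_bytes(row[field.offset..field.offset + 4].try_into().unwrap()) as i64
        }
        FieldType::I64 => {
            i64::from_le_bytes(row[field.offset..field.offset + 8].try_into().unwrap())
        }
    }
}

/// Probe side of a hash join: the dynamic decode is repeated per candidate row.
fn probe(rows: &[&[u8]], key_field: &FieldRef, want: i64) -> usize {
    rows.iter()
        .filter(|row| read_field(row, key_field) == want)
        .count()
}

fn main() {
    let r1 = 7i64.to_le_bytes();
    let r2 = 9i64.to_le_bytes();
    let rows: Vec<&[u8]> = vec![&r1[..], &r2[..]];
    let key = FieldRef { offset: 0, ty: FieldType::I64 };
    println!("{}", probe(&rows, &key, 7)); // 1
}
```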
