DuckDB is a new SQL execution engine that has been getting a lot of attention in the last year. There are many projects like it, so what makes DuckDB special? First, it's exceptionally simple to operate. It's a standalone C++ library with no external dependencies, so it's easy to embed in larger projects. Second, it's fast. It implements the key idea behind modern analytical databases: columnar, vectorized execution.
But just how fast is DuckDB? Fans often compare it to systems like Postgres or SQLite, but these are operational databases. They are not especially fast at analytical workloads. We would like to know how DuckDB compares to the best analytical databases running on similar hardware. Benchmarking is something of a hobby of mine, so I decided to find out for myself.
I used DuckDB's built-in tpcds extension to generate the TPC-DS dataset and queries. To understand how performance varies with data size, I generated three datasets at increasing scale factors: a relatively small one, a medium one, and a large one.