In a previous article in this series, I explored the magic of INSERT...UNNEST for improving PostgreSQL batch INSERT performance. While it’s a fantastic technique, I know it’s not the fastest option available (although it is very flexible). Originally, I hadn't intended to loop back and benchmark all the batch ingest methods, but I saw a lot of confusion out there, so I'm back, and this time I'm looking at COPY too. As usual for this series, it’s not going to be a long post, but it is going to be an informative one.
I flipped my approach for this post, comparing not just the PostgreSQL database performance in isolation but the practical performance from an application. To do this, I built a custom benchmarking tool in Rust to measure the end-to-end performance of each method. In this article, I’ll walk you through the batch ingest options you’ve got, and how they stack up (spoiler alert, the spread is over 19x!).
I’m defining batch ingest as writing a dataset to PostgreSQL in batches or chunks. You’d usually do this because the data is being collected in (near) real time (think a flow of IoT data from sensors) before being persisted into PostgreSQL (hopefully with TimescaleDB, although that's out of scope for this post).