Let’s say you’re working with sensor data in PostgreSQL, with each reading containing a sensor ID, timestamp, and value. You want to power an appl

PostgreSQL DISTINCT: TimescaleDB’s SkipScan Under Load

submited by

Style Pass

2024-11-07 20:30:05

Let’s say you’re working with sensor data in PostgreSQL, with each reading containing a sensor ID, timestamp, and value. You want to power an application dashboard that needs to know the last known state of each sensor in your fleet. Your query might look like this:

The DISTINCT ON clause ensures only one record per sensor is selected, and because the query is ordered by descending timestamp, you’ll get the latest reading for each sensor. Simple enough, right?

In practice, this query pattern can be inefficient, even with proper indexing. In this post, I’ll explain why and walk through a benchmark demonstrating that TimescaleDB’s SkipScan can optimize this query by an astonishing 10,548x at p50 and 9,603x at p95.

SkipScan is one of those TimescaleDB features that flies under the radar but provides impressive performance improvements—especially given it works with both Timescale’s hypertables and standard PostgreSQL tables (although not currently on compressed hypertables).

As tables and indexes grow, DISTINCT queries slow down in PostgreSQL because it doesn’t natively pull unique values directly from ordered indexes. Even if you have a perfect index in place, PostgreSQL will still scan the full index, filtering out duplicates only after the fact. This approach leads to a significant slowdown as tables grow larger.