How filter pushdown works

Let’s imagine I have a database table — maybe a large collection of events, the sort of thing with a created_at timestamp and a few other columns. We’ll also imagine that I want fast, consistent queries as my data changes, so I’ve imported that table into Materialize.

Materialize splits the data in a durable collection like this into multiple bounded-size parts, and stores each of those parts in an object store like S3. It stores the metadata separately, in a serializable store like CockroachDB or Postgres; this includes pointers to all the individual parts in the blob store, along with other metadata that Materialize needs to manage that collection as parts are added and removed over time.
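
To make that layout a bit more concrete, here is a minimal Rust sketch of the split between blob data and metadata. The type and field names are hypothetical stand-ins, not Materialize's actual schema.

```rust
// Hypothetical shapes for the metadata kept in the consistent store
// (e.g. CockroachDB or Postgres); the blob bytes themselves live in S3.
struct PartMeta {
    blob_key: String, // pointer to this part's object in the blob store
    encoded_len: u64, // size of the encoded part, in bytes
}

struct CollectionMeta {
    parts: Vec<PartMeta>, // one entry per bounded-size part
}

fn main() {
    // As parts are added and removed over time, a record shaped like this
    // is what gets updated in the metadata store.
    let meta = CollectionMeta {
        parts: vec![PartMeta {
            blob_key: "s3://bucket/part-0".into(),
            encoded_len: 4096,
        }],
    };
    let total: u64 = meta.parts.iter().map(|p| p.encoded_len).sum();
    println!("{} part(s), {} encoded bytes", meta.parts.len(), total);
}
```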

Suppose I run a query against this collection that filters on created_at and then aggregates the matching rows. Materialize compiles that query down to a dataflow; in this case, you could think of it as a pipeline with roughly three stages: read the parts out of the blob store, filter out the rows that don't match the predicate, and reduce the remaining rows down to the aggregate.
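
As a rough illustration, those stages might look like a plain iterator pipeline. This is a minimal Rust sketch; Row, fetch_rows, and cutoff are hypothetical stand-ins, not Materialize APIs.

```rust
// Hypothetical row type and part reader; not Materialize's actual code.
struct Row {
    created_at: u64,
}

fn fetch_rows(blob_key: &str) -> Vec<Row> {
    // Stand-in for fetching a part from the object store and decoding it.
    let _ = blob_key;
    vec![Row { created_at: 42 }, Row { created_at: 7 }]
}

fn main() {
    let part_keys = ["part-0", "part-1"];
    let cutoff = 10; // assumed filter: created_at >= cutoff
    let count = part_keys
        .iter()
        .flat_map(|key| fetch_rows(key))        // read: load each part's rows
        .filter(|row| row.created_at >= cutoff) // filter: drop non-matching rows
        .count();                               // reduce: count what's left
    println!("{count} matching rows");
}
```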

Because of that filter, the reduce stage may only see a small fraction of the rows that are present in our collection. As it happens, it’s fairly common for all the rows that match a filter to be stored in just a small subset of the parts.
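
That observation is what filter pushdown exploits. As a sketch of the idea, suppose the metadata also recorded a per-part [min, max] range of created_at values (an assumption in this sketch, not something described above); then the reader could consult those ranges and skip fetching any part that can't contain a matching row.

```rust
// Hypothetical per-part statistics; field names are illustrative only.
struct PartMeta {
    blob_key: String,
    min_created_at: u64, // assumed lower bound on created_at in this part
    max_created_at: u64, // assumed upper bound on created_at in this part
}

/// Keep only the parts whose range can overlap `created_at >= cutoff`.
fn prune(parts: &[PartMeta], cutoff: u64) -> Vec<&PartMeta> {
    parts.iter().filter(|p| p.max_created_at >= cutoff).collect()
}

fn main() {
    let parts = vec![
        PartMeta { blob_key: "part-0".into(), min_created_at: 0, max_created_at: 99 },
        PartMeta { blob_key: "part-1".into(), min_created_at: 100, max_created_at: 199 },
        PartMeta { blob_key: "part-2".into(), min_created_at: 200, max_created_at: 299 },
    ];
    // With a filter like created_at >= 150, only two of the three parts
    // could possibly contain matches; the rest are never fetched.
    for p in prune(&parts, 150) {
        println!("would fetch {}", p.blob_key);
    }
}
```

The appeal of this arrangement is that the pruning decision runs against the small metadata, before any part is downloaded from the object store.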
