Table format comparisons - Change queries and CDC

submitted by
Style Pass
2024-09-19 18:30:05

I work for Confluent, where the table/stream duality has been written and spoken about for many years. Jay Kreps wrote about it back in 2013 while still at LinkedIn. The duality is the idea that the stream and the table are two sides of the same coin:

A stream is a log of events where each event represents some change in the world. For example, a user account was created, and then, in another event, it was deleted. OLTP databases use transaction logs (aka write-ahead logs and redo logs) where changes are written before being applied to a table.

A table is a point-in-time snapshot of the world. At point t1, user 1 existed in the table, and at point t2, the user did not exist. Some databases store the table state at multiple points in time; the table formats are one example, with their design of immutable data plus a log of deltas/snapshots.
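
To make the duality concrete, here is a minimal sketch of replaying a stream of events to obtain the table as of a given point in time. The `Event` type, the `table_at` function, and the sample data are all hypothetical names invented for illustration; this is not how any particular database or table format implements it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One change in the world (hypothetical shape, for illustration only)."""
    ts: int                     # logical time of the change
    op: str                     # "create" or "delete"
    user_id: int
    name: Optional[str] = None  # only set for "create"


def table_at(events: List[Event], ts: int) -> Dict[int, Optional[str]]:
    """Replay every event with a timestamp <= ts to build the table snapshot."""
    table: Dict[int, Optional[str]] = {}
    for event in sorted(events, key=lambda e: e.ts):
        if event.ts > ts:
            break
        if event.op == "create":
            table[event.user_id] = event.name
        elif event.op == "delete":
            table.pop(event.user_id, None)
    return table


# The stream: user 1 is created at t1 and deleted at t2.
events = [
    Event(ts=1, op="create", user_id=1, name="alice"),
    Event(ts=2, op="delete", user_id=1),
]

snapshot_t1 = table_at(events, 1)  # user 1 exists
snapshot_t2 = table_at(events, 2)  # user 1 no longer exists
```

The same stream yields a different table depending on how far it is replayed, which is the sense in which the table is a point-in-time view of the stream.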

This post, and its associated deep dives, will look at how changes made to an Iceberg/Delta/Hudi/Paimon table can be emitted as a stream of changes. In the context of the table formats, this is not a continuous stream but the ability to consume changes incrementally by running periodic change queries.
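
The idea of incremental consumption via periodic change queries can be sketched roughly as follows. This is a toy model under stated assumptions: snapshots are plain dictionaries keyed by a snapshot ID, and `change_query`, `poll`, and the change-type labels are hypothetical names, not the API of Iceberg, Delta, Hudi, or Paimon.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[int, str]          # primary key -> row value (toy model)
Change = Tuple[str, int, str]      # (change type, key, value)


def change_query(snapshots: Dict[int, Snapshot], from_id: int, to_id: int) -> List[Change]:
    """Return the row-level differences between two snapshots."""
    before, after = snapshots[from_id], snapshots[to_id]
    changes: List[Change] = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append(("insert", key, after[key]))
        elif key not in after:
            changes.append(("delete", key, before[key]))
        elif before[key] != after[key]:
            changes.append(("update_before", key, before[key]))
            changes.append(("update_after", key, after[key]))
    return changes


def poll(snapshots: Dict[int, Snapshot], last_seen: int) -> Tuple[List[Change], int]:
    """Run one periodic change query: emit changes for every snapshot
    committed after last_seen and return the new checkpoint."""
    changes: List[Change] = []
    for snapshot_id in sorted(s for s in snapshots if s > last_seen):
        changes.extend(change_query(snapshots, last_seen, snapshot_id))
        last_seen = snapshot_id
    return changes, last_seen


# Hypothetical table history: snapshot 0 is empty, then two commits follow.
snapshots = {
    0: {},
    1: {1: "alice", 2: "bob"},
    2: {2: "robert"},
}

changes, checkpoint = poll(snapshots, last_seen=0)
no_changes, same_checkpoint = poll(snapshots, last_seen=checkpoint)
```

The consumer keeps only a checkpoint (the last snapshot it has read) and asks for whatever was committed since then, so a second poll with no new commits returns nothing.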
