Around 60% of stream processing is spent on mundane transformation tasks. Format unification for machine learning (ML) workloads, filtering for privac

Understanding the Redpanda Data Transform architecture

submited by

Style Pass

2024-04-25 21:30:04

Around 60% of stream processing is spent on mundane transformation tasks. Format unification for machine learning (ML) workloads, filtering for privacy, simple enrichments like geo-IP translations — the list goes on.

To stand up something “simple” often involves three or four distributed systems, multiple nights of reading configurations, and a few too many espressos. Once you’re done, you end up ping-ponging the data back and forth between storage and compute, when all you had to do was remove a field from a JSON object.

To the data engineer, it feels like an endless game of system whack-a-mole before you can start working on the good part of actually understanding the data.

Redpanda solves this problem and eliminates the data ping-pong with Redpanda's Data Transforms. Powered by WebAssembly (Wasm), this feature allows engineers to read data, prepare messages, and make API calls without the “data ping-pong” for simpler and less expensive in-broker data transformations.

Redpanda Data Transforms is built on the Wasmtime engine, which powers many other modern serverless platforms. Embedding this virtual machine directly onto each shard in Redpanda's thread-per-core architecture provides an alternative to the classical data back-and-forth when trying to make sense of real-time streaming data.