Stop embedding RocksDB in your Stream Processor!

Submitted by
Style Pass
2024-10-15 22:30:06

For years, popular stream processing frameworks such as Kafka Streams have used embedded RocksDB for state storage, primarily due to the efficiency of local storage and RocksDB’s ability to handle high-throughput, low-latency operations. Embedded state eliminates network overhead by keeping state local to the processing code, which makes sense if you’re aiming to optimize for latency in a distributed setting. In fact, embedded state is an architectural pattern that has become almost synonymous with modern stateful stream processing.

But what if this default approach is suboptimal? What if we are unnecessarily sacrificing reliability, scalability, and maintainability for throughput? As stream processing use cases grow in complexity, and as demands for scalability, resilience, and operational simplicity increase, we’ve seen many developers trip over the inherent limitations of embedding RocksDB in their stream processor.

In this post, we’ll explore the underlying trade-offs of embedding RocksDB in your stream processor and why it’s time to rethink how we handle state in stream processing frameworks.
