Kafka is a popular streaming system that provides replicated, sharded, append-only logs. Bufstream is a drop-in replacement for Kafka, designed to prioritize data governance and cost efficiency in cloud environments.
Like Kafka, Bufstream provides a collection of named, partially ordered logs called topics. Each topic is divided into partitions.1 Each partition is a totally ordered, append-only list of records (also called messages or events). Within a partition, each record is uniquely identified by a monotonically advancing integer offset. Offsets may be sparse: some offsets are used for storing internal metadata and are invisible to clients.
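To make the offset model concrete, here is a minimal in-memory sketch of a single partition. It is a toy illustration, not Bufstream's or Kafka's implementation: the `Partition` class, its `append` and `visible` methods, and the `internal` flag are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: a totally ordered, append-only log."""

    records: list = field(default_factory=list)  # (offset, value, internal)
    next_offset: int = 0

    def append(self, value, internal=False):
        """Append a record and return the offset assigned to it."""
        offset = self.next_offset
        self.records.append((offset, value, internal))
        self.next_offset += 1  # offsets only ever advance
        return offset

    def visible(self):
        """Records a client can see; internal metadata is filtered out."""
        return [(o, v) for (o, v, internal) in self.records if not internal]


p = Partition()
p.append("a")                           # offset 0
p.append("txn-marker", internal=True)   # offset 1, hidden from clients
p.append("b")                           # offset 2
visible = p.visible()
print(visible)  # [(0, 'a'), (2, 'b')]
```

From the client's point of view the offsets jump from 0 to 2: the sequence is still strictly increasing, but it is sparse.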
Bufstream works with standard Kafka clients. There are two main types of clients in Kafka-compatible systems. Producers append records to partitions by calling producer.send(). Consumers read those records. A consumer is first bound to partitions via consumer.assign() or consumer.subscribe() operations.2 Once bound, it repeatedly calls consumer.poll() to receive records from any of those partitions. Each consumer can belong to a consumer group, whose members share responsibility for processing records from a set of topics.
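The send, assign, and poll cycle described above can be sketched with a small in-memory model. This is not the real Kafka client API: `ToyBroker`, `ToyProducer`, and `ToyConsumer` are hypothetical classes that only borrow the method names `send`, `assign`, and `poll`, and for simplicity the sketch uses dense offsets and omits consumer groups and `subscribe`.

```python
from collections import defaultdict


class ToyBroker:
    """Holds one append-only list per (topic, partition) pair."""

    def __init__(self):
        self.logs = defaultdict(list)


class ToyProducer:
    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, partition, value):
        """Append a record and return its offset within the partition."""
        log = self.broker.logs[(topic, partition)]
        log.append(value)
        return len(log) - 1


class ToyConsumer:
    def __init__(self, broker):
        self.broker = broker
        self.positions = {}  # (topic, partition) -> next offset to read

    def assign(self, topic_partitions):
        """Bind this consumer to an explicit set of partitions."""
        self.positions = {tp: 0 for tp in topic_partitions}

    def poll(self, max_records=100):
        """Return unread records from any assigned partition."""
        out = []
        for tp in self.positions:
            log = self.broker.logs[tp]
            while self.positions[tp] < len(log) and len(out) < max_records:
                pos = self.positions[tp]
                out.append((tp, pos, log[pos]))
                self.positions[tp] = pos + 1
        return out


broker = ToyBroker()
producer = ToyProducer(broker)
producer.send("orders", 0, "x")
producer.send("orders", 0, "y")
producer.send("orders", 1, "z")

consumer = ToyConsumer(broker)
consumer.assign([("orders", 0)])
first = consumer.poll()   # both records from partition 0
second = consumer.poll()  # nothing new, so an empty list
```

Because the consumer was assigned only partition 0 of the topic, the record sent to partition 1 never appears in its polls.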
Each partition has a last stable offset (LSO), which is the highest offset below which every transaction has completed. It also has a committed offset for each consumer group, which is the highest offset below which that consumer group has processed all records in the partition.3
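As a rough illustration of these two definitions, the sketch below computes both offsets for a toy partition. It rests on simplifying assumptions that the text does not spell out: the LSO is taken to be the first offset of the earliest still-open transaction (or the end of the log if none is open), and the committed offset is taken to be the first visible offset the group has not yet processed. The function names and parameters are hypothetical.

```python
def last_stable_offset(log_end_offset, open_txn_first_offsets):
    """Highest offset below which every transaction has completed.

    Assumption: each open transaction is described by the offset of its
    first record, so nothing at or past the earliest one is stable yet.
    """
    return min(open_txn_first_offsets, default=log_end_offset)


def committed_offset(visible_offsets, processed, log_end_offset):
    """Highest offset below which the group has processed every record.

    Offsets may be sparse, so only client-visible offsets are checked.
    """
    for offset in sorted(visible_offsets):
        if offset not in processed:
            return offset
    return log_end_offset


# A log whose next offset is 6; offset 1 is internal metadata.
visible = [0, 2, 3, 4, 5]

lso_open = last_stable_offset(6, [4, 2])  # transactions open at 2 and 4
lso_idle = last_stable_offset(6, [])      # no open transactions

partly_done = committed_offset(visible, {0, 2}, 6)
all_done = committed_offset(visible, {0, 2, 3, 4, 5}, 6)

print(lso_open, lso_idle, partly_done, all_done)  # 2 6 3 6
```

Both values are exclusive upper bounds: records strictly below the LSO belong to no open transaction, and records strictly below the committed offset have been processed by the group.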