Apache Kafka is a building block of many cloud architectures. It's heavily used to move large amounts of data from one place to another because it's performant and provides strong reliability and durability guarantees. Processing large amounts of data in the cloud can easily become one of the more expensive line items in your cloud provider bill. Apache Kafka offers many configuration options to reduce that cost, at both the cluster and the client level, but it can be overwhelming to find the right values and optimizations. New Relic is a heavy user of Apache Kafka and has previously published several blogs, such as 20 best practices for Apache Kafka at scale. In this article, we'll expand on those insights, using monitoring to highlight ways to tune Apache Kafka and reduce cloud costs.
The first step to optimizing costs is to understand them. Depending on how Apache Kafka is deployed and which cloud provider is used, cost data is exposed in different ways. Let's use Confluent Cloud as an example. Confluent Cloud is a managed service available on the major cloud providers, and it provides a billing API that returns costs aggregated by day. The API is a great first step, but once that data is imported into New Relic, you can track its evolution over time and even get anomaly detection through applied intelligence, with proper alerts and notifications. Nobody likes unexpected cloud costs.
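As a rough sketch of this pipeline, the script below fetches daily cost line items and forwards them as custom events to New Relic's Events API. The Confluent billing endpoint, its query parameters, the shape of the response records, and the `ConfluentCloudCost` event type are assumptions for illustration; check the current Confluent Cloud and New Relic documentation before relying on them.

```python
import base64
import json
import urllib.request

# Assumed endpoints -- verify against current Confluent Cloud and
# New Relic documentation before use.
CONFLUENT_COSTS_URL = "https://api.confluent.cloud/billing/v1/costs"
NEW_RELIC_EVENTS_URL = "https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"


def fetch_daily_costs(api_key: str, api_secret: str, start: str, end: str) -> list:
    """Fetch aggregated-by-day cost line items from the Confluent billing API."""
    url = f"{CONFLUENT_COSTS_URL}?start_date={start}&end_date={end}"
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("data", [])


def to_newrelic_events(cost_records: list) -> list:
    """Map cost line items to New Relic custom events (assumed record fields)."""
    return [
        {
            "eventType": "ConfluentCloudCost",  # custom event type chosen for this sketch
            "amount": record.get("amount"),
            "product": record.get("product"),
            "resourceName": record.get("resource", {}).get("display_name"),
            "windowStart": record.get("start_date"),
        }
        for record in cost_records
    ]


def push_to_newrelic(events: list, account_id: str, license_key: str) -> None:
    """Send the custom events to the New Relic Events API."""
    request = urllib.request.Request(
        NEW_RELIC_EVENTS_URL.format(account_id=account_id),
        data=json.dumps(events).encode(),
        headers={"Api-Key": license_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()
```

Run on a daily schedule, this keeps a `ConfluentCloudCost` event stream in New Relic that you can chart with NRQL and attach alert conditions to.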