Monitoring Kafka Streams applications

submitted by
Style Pass
2021-06-07 14:00:13

Kafka is a popular replicated log service that forms the backbone of event-driven systems at GoJEK. Kafka consumer apps are powered by Ziggurat, an open source tool that makes it easier to spawn new Kafka consumer apps.

At Gojek, Kafka serves as the backbone of all our microservices, and knowing how it is performing minute by minute enables us to improve the overall user experience for our end consumers. We use Ziggurat to consume events from Kafka.

Although Ziggurat publishes its own metrics, such as ingestion lag and throughput, these metrics are restricted to the application level. We lacked a view into the internals of the Kafka Streams threads and tasks running in each of the VMs and pods across GoJEK. To get visibility at the stream thread or stream task level, we had to read the metrics published by the Kafka Streams client itself and push them to our monitoring backend.
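As a rough illustration of that last step, here is a minimal Java sketch of turning one client metric into a tagged line that a monitoring backend could ingest. It assumes the values come from `KafkaStreams#metrics()`, where each `MetricName` exposes `group()`, `name()` and `tags()`, and each `Metric` exposes `metricValue()`. The class `StreamMetricFormatter`, the output line format, and the sample tag values are hypothetical and are not taken from Ziggurat. The group names and the `thread-id` / `task-id` tags are what recent Kafka Streams versions use, so treat them as assumptions for your version.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Ziggurat or Kafka: flattens one Kafka Streams
// metric (group, name, tags, value) into a single tagged line.
final class StreamMetricFormatter {

    // Assumed metric groups for thread-level and task-level client metrics.
    static final String THREAD_GROUP = "stream-thread-metrics";
    static final String TASK_GROUP = "stream-task-metrics";

    /** True when the metric group describes a stream thread or a stream task. */
    static boolean isThreadOrTaskMetric(String group) {
        return THREAD_GROUP.equals(group) || TASK_GROUP.equals(group);
    }

    /** Renders "group.name,tagKey=tagValue,... value" with tags sorted by key. */
    static String format(String group, String name, Map<String, String> tags, double value) {
        StringBuilder line = new StringBuilder(group).append('.').append(name);
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            line.append(',').append(tag.getKey()).append('=').append(tag.getValue());
        }
        return line.append(' ').append(value).toString();
    }

    public static void main(String[] args) {
        // Illustrative values only; a real app would iterate over KafkaStreams#metrics().
        Map<String, String> tags = Map.of("thread-id", "app-StreamThread-1", "task-id", "0_3");
        if (isThreadOrTaskMetric(TASK_GROUP)) {
            System.out.println(format(TASK_GROUP, "process-rate", tags, 12.5));
        }
    }
}
```

Keeping the thread and task identifiers as tags, rather than folding them into the metric name, is one way to let the backend group or filter by thread or task later.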

We have had numerous production issues where we were not fully aware of what was happening inside the Kafka Streams clients running on the VMs or pods. We would often observe lag on one particular partition and not know which stream thread or stream task it belonged to. Figuring this out from logs is tedious and time-consuming.
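To make the partition-to-thread question concrete, the sketch below shows how the lookup could work once thread and task identifiers are available as metric tags. It assumes task ids follow the `<subtopology>_<partition>` form (for example `0_3`) and that a map from thread id to its task ids has already been collected. The class `PartitionOwnerLookup` and the sample ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: finds which stream thread(s) and task(s) handle a partition.
final class PartitionOwnerLookup {

    /** Extracts the partition from a task id, assuming "<subtopology>_<partition>". */
    static int partitionOf(String taskId) {
        return Integer.parseInt(taskId.substring(taskId.lastIndexOf('_') + 1));
    }

    /**
     * Returns "threadId/taskId" for every task on the given partition.
     * Several entries are possible when the topology has more than one subtopology.
     */
    static List<String> ownersOf(int partition, Map<String, List<String>> tasksByThread) {
        List<String> owners = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order over thread ids.
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(tasksByThread).entrySet()) {
            for (String taskId : entry.getValue()) {
                if (partitionOf(taskId) == partition) {
                    owners.add(entry.getKey() + "/" + taskId);
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        // Illustrative assignment only.
        Map<String, List<String>> tasksByThread = Map.of(
                "app-StreamThread-1", List.of("0_0", "0_1"),
                "app-StreamThread-2", List.of("0_2", "1_2"));
        System.out.println(ownersOf(2, tasksByThread));
    }
}
```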
