Observability is the ability to understand and assess a system’s state from the data it produces. It is one of the most important properties for system reliability, because it lets you detect problems and take action. Observability is especially crucial in distributed systems, where end-to-end testing is difficult or even impossible. There are three pillars of observability: logs, metrics, and traces.
In this post, I cover only the metrics part and show how to implement web application monitoring using Prometheus. We discuss how to choose metrics as Service Level Indicators (SLIs) and how to ensure system reliability by setting proper targets as Service Level Objectives (SLOs).
The aim of Prometheus is quite simple: to provide a metrics server that gathers data from several systems or applications and aggregates it in one place. Prometheus collects metrics by scraping targets that expose them over HTTP endpoints.
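To make the scraping idea more concrete, here is a minimal sketch of what a scrape response can look like and how a simplified client might read it. The sample payload, the metric names, and the `parse_samples` helper are all hypothetical and for illustration only; the real Prometheus parser handles much more (timestamps, escaping, histograms, and so on).

```python
# Hypothetical payload in the Prometheus text exposition format,
# as a target might return it from its metrics endpoint.
SAMPLE_PAYLOAD = """\
# HELP http_requests_total Total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12
"""


def parse_samples(payload: str) -> list[tuple[str, float]]:
    """Return (series, value) pairs from a simplified exposition payload.

    Simplifying assumptions: no timestamps after the value and
    no spaces inside label values.
    """
    samples = []
    for line in payload.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, value = line.rsplit(" ", 1)
        samples.append((series, float(value)))
    return samples


if __name__ == "__main__":
    for series, value in parse_samples(SAMPLE_PAYLOAD):
        print(series, value)
```

Each non-comment line is one sample: a metric name, optional labels in braces, and a numeric value. Prometheus stores every such sample with the time of the scrape.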
Exporters are processes that run on Prometheus targets; they are responsible for serving metrics in a format that Prometheus can easily consume. Prometheus fetches the data from exporters in a pull-based manner (there is also a push-based approach using the Pushgateway, but we do not cover it here). Many ready-made exporters are available, e.g. Node exporter (Linux server metrics), MySQL exporter, etc. Any service that exposes an HTTP endpoint with properly formatted metric data can act as an exporter.
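As a rough illustration of that last point, the sketch below builds a toy exporter using only the Python standard library. The metric name `demo_scrapes_total`, the `MetricsHandler` class, and the `start_exporter` helper are hypothetical names chosen for this example; in practice you would more likely use an official client library rather than formatting the text by hand.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Toy exporter: serves one hypothetical counter at /metrics."""

    scrapes = 0  # how many times /metrics has been requested

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        MetricsHandler.scrapes += 1
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        # Content type of the Prometheus text exposition format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def render_metrics() -> str:
    """Format the counter in the text exposition format."""
    return (
        "# HELP demo_scrapes_total Times this demo endpoint was scraped.\n"
        "# TYPE demo_scrapes_total counter\n"
        f"demo_scrapes_total {MetricsHandler.scrapes}\n"
    )


def start_exporter(port: int = 0) -> HTTPServer:
    """Start the exporter in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    # Self-check: start the exporter, scrape it once, then shut it down.
    server = start_exporter()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    conn.request("GET", "/metrics")
    print(conn.getresponse().read().decode("utf-8"))
    conn.close()
    server.shutdown()
    server.server_close()
```

In a real deployment the server would keep running on a fixed port, and Prometheus would be configured to scrape that address at a regular interval.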