This blog post and linked pages contain information related to upcoming products, features, and functionality. It is important to note that the inform

How we reduced 502 errors by caring about PID 1 in Kubernetes

submited by
Style Pass
2022-05-18 07:30:07

This blog post and linked pages contain information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. As with all projects, the items mentioned in this blog post and linked pages are subject to change or delay. The development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.

Our SRE on call was getting paged daily that one of our SLIs was burning through our SLOs for the GitLab Pages service. It was intermittent and short-lived, but enough to cause user-facing impact which we weren't comfortable with. This turned into alert fatigue because there wasn't enough time for the SRE on call to investigate the issue and it wasn't actionable since it recovered on its own.

We decided to open up an investigation issue for these alerts. We had to find out what the issue was since we were showing 502 errors to our users and we needed a DRI that wasn't on call to investigate.

Leave a Comment