Chris's Wiki :: blog/sysadmin/MonitoringTooHard

submitted by
Style Pass
2022-05-16 02:30:09

Grumpy thesis: monitoring TLS certificate expiry is too hard (evidence: good people keep having certs expire on them). Why don't web servers ship with routine cron jobs that email you when any actively used TLS certificate is N days or less from expiring, for example?
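To make the cron job idea concrete, here is a minimal sketch of what such a check might look like in Python. This is an illustration under assumptions, not something any web server actually ships: the function names, the host list, and the 30-day default threshold are all hypothetical, and it leans on cron's usual behavior of mailing a job's output to its owner, so it simply returns the lines you would print.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". `now` must be timezone-aware.
    """
    expiry_ts = ssl.cert_time_to_seconds(not_after)
    return (expiry_ts - now.timestamp()) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to host:port and return the served certificate's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sets SNI, so each virtual host is checked
        # with the certificate it actually serves.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_hosts(hosts, threshold_days: float = 30):
    """Return one warning line per host that is close to expiry or unreachable."""
    now = datetime.now(timezone.utc)
    warnings = []
    for host in hosts:
        try:
            days = days_until_expiry(fetch_not_after(host), now)
        except OSError as exc:
            # ssl.SSLError is a subclass of OSError. An already expired
            # certificate fails verification and lands here too.
            warnings.append(f"{host}: could not check certificate: {exc}")
            continue
        if days <= threshold_days:
            warnings.append(f"{host}: certificate expires in {days:.1f} days")
    return warnings
```

A cron job would call check_hosts() with every TLS hostname the server answers to and print whatever comes back. The hard part is still generating that list of hostnames automatically from the server's configuration, which is exactly the step that people forget.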

Having a TLS certificate for a public web server unexpectedly expire on you is practically a rite of passage for a system administration team. And I'm not here to throw stones, because while we have a reasonably good system for monitoring our TLS certificates, it's critically reliant on us remembering to add monitoring for the actual TLS website. When the TLS website is a standalone web server, that's fairly easy (because we know we want to check if the site is actually up), but when it's yet another virtual host on our central web server, it's all too easy for it to slip through the cracks, because we know we're already monitoring the web server as a whole.

As a general rule, when people keep doing something wrong, they're actually right and your system is wrong. Put another way, "if your system depends on humans never making errors, you have a systems problem". If it takes extra steps and extra attention to add monitoring, people will keep forgetting to do so and then they will get burned by it. TLS certificates are an obvious case, but there are lots of other ones. How many systems ship with default monitoring that tries to let you know if the local disk space is getting alarmingly low, for example?
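The disk space case is similarly small in principle. As a rough sketch of the kind of check a system could ship by default (the function names, the 10% threshold, and the choice of paths are my assumptions for illustration, not anything a real system provides), using Python's standard shutil.disk_usage():

```python
import shutil


def is_low(total: int, free: int, min_free_fraction: float = 0.10) -> bool:
    """True if the free space is below the given fraction of the total."""
    if total <= 0:
        # A filesystem reporting no capacity is not treated as "low".
        return False
    return free / total < min_free_fraction


def check_path(path: str, min_free_fraction: float = 0.10) -> bool:
    """True if the filesystem containing `path` is low on free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return is_low(usage.total, usage.free, min_free_fraction)


def low_space_warnings(paths, min_free_fraction: float = 0.10):
    """Return one warning line per path whose filesystem is low on space."""
    warnings = []
    for path in paths:
        if check_path(path, min_free_fraction):
            usage = shutil.disk_usage(path)
            pct = 100 * usage.free / usage.total
            warnings.append(f"{path}: only {pct:.1f}% free")
    return warnings
```

As with the certificate check, the code is the easy part. What matters is that something like it runs without anyone having to remember to set it up.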
