Been a while since that was our 500 page, hasn’t it?  It was cute and fun.  We’ve now got our terribly overwhelmed Snoo being crushed by a pile of

You Broke Reddit: The Pi-Day Outage : RedditEng

submited by
Style Pass
2023-03-22 01:00:04

Been a while since that was our 500 page, hasn’t it? It was cute and fun. We’ve now got our terribly overwhelmed Snoo being crushed by a pile of upvotes. Unfortunately, if you were browsing the site, or at least trying, during the afternoon of March 14th during US hours, you may have seen our unfortunate Snoo during the 314-minute outage Reddit faced (on Pi day no less!) Or maybe you just saw the homepage with no posts. Or an error. One way or another, Reddit was definitely broken. But it wasn’t you, it was us.

Today we’re going to talk about the Pi day outage, but I want to make sure we give our team(s) credit where due. Over the last few years, we’ve put a major emphasis on improving availability. In fact, there’s a great blog post from our CTO talking about our improvements over time. In classic Reddit form, I’ll steal the image and repost it as my own.

As you can see, we’ve made some pretty strong progress in improving Reddit’s availability. As we’ve emphasized the improvements, we’ve worked to de-risk changes, but we’re not where we want to be in every area yet, so we know that some changes remain unreasonably risky. Kubernetes version and component upgrades remain a big footgun for us, and indeed, this was a major trigger for our 3/14 outage.

Leave a Comment