September 24, 2021 How to avoid bad assumptions during incidents

When responding to an incident, your priority is to build an understanding of what happened and why, so you can work out how to fix it. This understanding is built on the discoveries you make during your investigation and the data you capture along the way.

One of the most difficult situations to recover from is when the data you collected is either flat-out wrong, or is an unverified assumption posing as a fact.

I once worked on an incident where we hit this problem several times over, and suffered because of it. It’s a good example of why you should trust-but-verify past conclusions, and use incident gear-changes as prompts to raise the burden of proof required for your key assumptions.

They say their code is failing because it can no longer find the HTTP headers they expected. The headers are there, but things like Content-Type are now downcased to content-type, and their codepath doesn’t handle it.

We feel bad because we missed the notice, and there’s a certain karmic justice in us suffering this disruption as a result of our fumble. We open a ticket with Google asking them if they can opt us out, but assume they’ll say no and change our app to properly handle the header downcasing (as it should anyway, per the HTTP standard).
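To make "properly handle the header downcasing" concrete, here is a minimal sketch of a case-insensitive header lookup. The post doesn't show the application's code or name its language, so Python, the plain-dict representation of headers, and the `get_header` helper are all assumptions made for illustration only.

```python
def get_header(headers, name, default=None):
    """Return the value of header `name`, ignoring case.

    Hypothetical helper: `headers` is assumed to be a plain dict of
    header name -> value, which is not necessarily how the real app
    stored them.
    """
    wanted = name.lower()
    for key, value in headers.items():
        # HTTP header field names are case-insensitive, so compare
        # lowercased forms instead of the raw strings.
        if key.lower() == wanted:
            return value
    return default


# The same lookup works whether or not a proxy downcased the name.
before = {"Content-Type": "application/json"}
after = {"content-type": "application/json"}
assert get_header(before, "Content-Type") == get_header(after, "Content-Type")
```

A lookup like this keeps working whichever casing arrives on the wire, which is the behaviour the HTTP standard expects of a client or server.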
