This is the first in a series of posts which look at what happens when the incident is over and we're thinking about what to do next. We'll look at some guidance for deciding whether a debrief is worthwhile, how to prepare for a debrief meeting, and finally how to approach the debrief meeting itself.
The dust has settled after your efforts to get things back on track during your last incident, and everything's once again working as it should. Time to get back to work? Possibly, but you might want to pause and take the time to look more deeply at what happened, and to consider whether it's worth seeking out and socialising learnings more widely. We call this activity an incident debrief, but you might know it as a post-mortem or incident analysis.
We think about incidents as a cost of doing business – a byproduct of success – and since you can't avoid them, the best you can do is to make sure you get your money's worth. But what does that mean in practice? How do you get value from failure, and when is it worth investing time in actively seeking that value through post-incident activities?
Perhaps the most pertinent question is why wouldn’t you want to thoroughly analyse every incident? In an ideal world, we’d probably do just that, but for the vast majority of us there are time and cost trade-offs to be made. You could spend a day or two preparing for an incredible debrief, but what about the product feature you need to ship, or the reliability improvements you already know your system needs?