The Advanced Principles of Chaos Engineering

submitted by
Style Pass
2021-05-31 13:30:07

In our previous post, “What Chaos Engineering Is (and Isn’t),” we laid out the key principles behind the practice. Chaos Engineering is grounded in empiricism, experimentation over testing, and verification over validation. But not all experimentation is equally valuable. The principles of Chaos Engineering extend to a “gold standard” captured in a set of advanced principles:

Every experiment begins with a hypothesis. For availability experiments, the hypothesis usually takes the form: Under ______ circumstances, customers still have a good time.

For security experiments, by contrast, the hypothesis usually takes the form: Under ______ circumstances, the security team is notified.

In both cases, the blank space is filled in by the variables you determine. The advanced principles emphasize building your hypothesis around a steady-state definition. This means focusing on the way the system is expected to behave, and capturing that in a measurement. In the preceding examples, customers presumably have a good time by default, and security usually gets notified when something violates a security control.
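One way to make a steady-state hypothesis concrete is to write it down as a condition, a metric, and a tolerance, then compare a control group against an experiment group. The following is only a minimal sketch: the class name `SteadyStateHypothesis`, the metric, the 5% tolerance, and the sample numbers are all assumptions made for illustration, not anything prescribed by the advanced principles.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class SteadyStateHypothesis:
    """Hypothetical container for 'Under <condition>, <metric> stays near baseline.'"""

    condition: str    # fills in the blank in the hypothesis
    metric: str       # the measurement that captures steady state
    tolerance: float  # allowed relative deviation from the control group (assumed value)

    def holds(self, control: Sequence[float], experiment: Sequence[float]) -> bool:
        """Return True if the experiment group's mean stays within tolerance of the control's."""
        baseline = mean(control)
        observed = mean(experiment)
        if baseline == 0:
            # Avoid dividing by zero; with a zero baseline only a zero observation matches.
            return observed == 0
        return abs(observed - baseline) / abs(baseline) <= self.tolerance


# Illustrative values only; a real experiment would read these from monitoring.
hypothesis = SteadyStateHypothesis(
    condition="one cache node is unavailable",
    metric="successful checkouts per second",
    tolerance=0.05,
)
control_samples = [100.0, 102.0, 98.0]
experiment_samples = [97.0, 99.0, 98.0]
degraded_samples = [80.0, 82.0, 78.0]

steady = hypothesis.holds(control_samples, experiment_samples)
broken = hypothesis.holds(control_samples, degraded_samples)
```

The check looks only at the system's output, not at which component caused a deviation, which matches the emphasis on measuring expected behavior rather than inspecting internals.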

This focus on steady state forces engineers to step back from the code and focus on the holistic output. It captures Chaos Engineering’s bias toward verification over validation. We often have an urge to dive into a problem, find the “root cause” of a behavior, and try to understand a system via reductionism. Doing a deep dive can help with exploration, but it is a distraction from the best learning that Chaos Engineering can offer. At its best, Chaos Engineering is focused on key performance indicators (KPIs) or other metrics like SLOs that track with clear business priorities, as those make for the best steady-state definitions.
