
A Java Agent's Architecture for Causing Chaos

submitted by
Style Pass
2021-05-19 14:31:17

Chaos testing at HubSpot grew out of the needs of our site reliability engineering (SRE) team. We needed to test fault tolerance in the face of service-to-service call failures. Our main concern was the upgrade of our core databases from legacy MySQL to Vitess: we were skeptical about Vitess's failure mechanics and wanted to ensure they would be compatible with those of legacy MySQL. To test this, we needed to inject failures into many of the calls between services in our stack.

Though we had undertaken failure testing efforts before, we recognized an opportunity to provide reusable tooling for future engineering efforts. Our SRE team could not step in for every team that wanted to set up failure injection. We decided to write a custom Java agent. Several HubSpot-specific environmental conditions led to this solution.
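For readers who have not written a Java agent before, the general shape looks roughly like the sketch below. It is a minimal illustration built on the standard java.lang.instrument API, not HubSpot's actual agent: the class name ChaosAgentSketch, the "rate=..." option format, and the helper methods are all hypothetical, and the transformer deliberately leaves every class unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class ChaosAgentSketch {

    // Hypothetical option format: "rate=0.25" means fail roughly 25% of targeted calls.
    public static double parseFailureRate(String agentArgs) {
        if (agentArgs == null || !agentArgs.startsWith("rate=")) {
            return 0.0; // no injection unless explicitly configured
        }
        try {
            double rate = Double.parseDouble(agentArgs.substring("rate=".length()));
            return Math.max(0.0, Math.min(1.0, rate)); // clamp to [0, 1]
        } catch (NumberFormatException e) {
            return 0.0; // malformed value: fall back to no injection
        }
    }

    // Decide whether one call should fail; 'roll' is a uniform sample in [0, 1).
    public static boolean shouldFail(double failureRate, double roll) {
        return roll < failureRate;
    }

    // The JVM invokes premain before the application's main method when the
    // process is started with -javaagent:<agent jar>=<options>.
    public static void premain(String agentArgs, Instrumentation inst) {
        double failureRate = parseFailureRate(agentArgs);
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                // A real agent would rewrite the bytecode of the targeted
                // client classes here so that calls consult shouldFail().
                // Returning null tells the JVM to leave the class unchanged.
                return null;
            }
        });
        System.out.println("chaos agent loaded, failure rate = " + failureRate);
    }
}
```

To load a class like this, it would be packaged in a jar whose manifest names it in the Premain-Class attribute, and the target service would be started with something like -javaagent:chaos-agent.jar=rate=0.25 (the jar name is illustrative). The appeal of this approach is that the service's own code does not have to change to opt in.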

One quick point: one might well conclude that we could have skipped the Java agent route and instead added some code to our existing shared HTTP client implementation. We did consider this route, but two things stopped us from proceeding further.
