How To Manage Flaky Tests

submitted by
Style Pass
2025-01-01 01:30:03

Many projects suffer from the problem of flaky tests: tests that pass or fail non-deterministically. These cause confusion, slow development cycles, and endless arguments between individuals and teams in an organization.

This article dives deep into working with flaky tests, from the perspective of someone who built the first flaky test management systems at Dropbox and Databricks and maintained the related build and CI workflows over the past decade. The issue of flaky tests can be surprisingly unintuitive, with many "obvious" approaches being ineffective or counterproductive. But there are right and wrong answers to many of these questions, and we will discuss both so you can better understand what managing flaky tests involves.

One common cause of flakiness is a race condition within the test itself. Often this manifests as sleep, time.sleep, or Thread.sleep calls in your test that expect some concurrent code path to complete. Such sleeps may or may not wait long enough, depending on how much CPU contention slows down your test. More generally, any multi-threaded or multi-process code has the potential for race conditions or concurrency bugs, and these days most systems make use of multiple cores.
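The sleep-based race above can be sketched as follows. A hypothetical background worker (start_worker, with its delay parameter standing in for work whose duration varies with CPU contention) finishes after an unpredictable amount of time. flaky_check guesses with a fixed time.sleep. robust_check instead waits on an explicit completion signal (threading.Event) and uses the timeout only to bound a hang. All function names here are illustrative assumptions, not code from the article.

```python
import threading
import time


def start_worker(results: list, done: threading.Event, delay: float) -> threading.Thread:
    """Hypothetical background job: appends a value after `delay` seconds,
    then signals completion by setting `done`."""
    def work() -> None:
        time.sleep(delay)  # stands in for work whose duration varies under load
        results.append("done")
        done.set()

    t = threading.Thread(target=work)
    t.start()
    return t


def flaky_check(delay: float) -> bool:
    # Anti-pattern: guess how long the worker needs and hope it has finished.
    results: list = []
    done = threading.Event()
    t = start_worker(results, done, delay)
    time.sleep(0.05)  # fixed guess; too short whenever the worker runs slowly
    ok = results == ["done"]
    t.join()
    return ok


def robust_check(delay: float, timeout: float = 5.0) -> bool:
    # Wait for the explicit completion signal; the timeout only bounds a hang.
    results: list = []
    done = threading.Event()
    t = start_worker(results, done, delay)
    finished = done.wait(timeout)  # True once set, False only if the timeout expires
    t.join()
    return finished and results == ["done"]


print(robust_check(delay=0.2))  # True
```

Whether flaky_check passes depends on whether the worker beats the fixed 0.05-second sleep, so a short delay usually passes and a slow machine or long delay fails. robust_check returns as soon as the work is done and fails only if it takes longer than the generous timeout.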
