What's Wrong With My Benchmark Results?


Years ago, I briefly worked with a team that cared a lot about software performance. They had a comprehensive set of unit tests to check the correctness of their software, but unlike any other team I'd ever worked with, they automatically recorded the runtime of each of those tests so that they could spot performance regressions right away. It was a clever idea, and they'd put a lot of work into it, but unfortunately it didn't save them from releasing a new version of their code that ran half as fast for their users as the previous version. Their unit tests weren't representative of actual workloads, so they didn't catch interaction effects that turned out to be common in the real world.

The moral of the story is that benchmarking is hard; speaking as a former professor, I think we make it harder by not teaching people how to do it properly. Research like that reported in Costa2019 is therefore very welcome. The authors start by identifying five bad practices associated with use of Java's Microbenchmark Harness (JMH), each of which can distort what a benchmark actually measures.

They then look at over a hundred open source Java projects and find that (a) all five bad practices are common in the real world and (b) they have a significant (and misleading) impact on benchmark results. All of them are fixable (the authors submitted patches to several projects), but their findings leave little doubt that there's a lot of room for improvement in most projects' performance measurements.
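To make the kind of pitfall concrete, here is a minimal JMH sketch of one common mistake: a benchmark that discards the value it computes, which leaves the JIT compiler free to eliminate the work entirely so that the "measurement" tells you nothing. The class and method names and the data size are illustrative and not taken from the paper; the annotations and Blackhole are JMH's standard API.

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Thread)
    public class SumBenchmark {

        int[] data = new int[10_000];

        @Setup
        public void fill() {
            for (int i = 0; i < data.length; i++) {
                data[i] = i;
            }
        }

        // Bad practice: the sum is computed but never used, so the JIT may
        // treat the loop as dead code and the timing becomes meaningless.
        @Benchmark
        public void sumDiscarded() {
            int sum = 0;
            for (int value : data) {
                sum += value;
            }
        }

        // Better: return the result so JMH keeps the computation alive.
        @Benchmark
        public int sumReturned() {
            int sum = 0;
            for (int value : data) {
                sum += value;
            }
            return sum;
        }

        // Alternative: hand values to JMH's Blackhole explicitly, which helps
        // when a benchmark produces more than one intermediate result.
        @Benchmark
        public void sumConsumed(Blackhole blackhole) {
            int sum = 0;
            for (int value : data) {
                sum += value;
            }
            blackhole.consume(sum);
        }
    }

In a sketch like this, the "discarded" variant tends to look dramatically faster precisely because it measures nothing, which is exactly the sort of misleading result the authors set out to quantify.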
