Sometimes you just want to know how fast your code can go, without benchmarking it. Sometimes you have benchmarked it and want to know how close you are to the maximum speed. Often you just need to know what the current limiting factor is, to guide your optimization decisions.
Well this post is about that determining that speed limit1. It’s not a comprehensive performance evaluation methodology, but for many small pieces of code it will work very well.
This post is intended to be read from top to bottom, but if it’s not your first time here or you just want to skip to a part you find interesting, here you go:
There are many possible limits that apply to code executing on a CPU, and in principle the achieved speed will simply be determined by the lowest of all the limits that apply to the code in question. That is, the code will execute only as fast as its narrowest bottleneck.
So I’ll just list some of the known bottlenecks here, starting with common factors first, down through some fairly obscure and rarely discussed ones. The real-world numbers come mostly from Intel x86 CPUs, because that’s what I know off of top of my head, but the concepts mostly apply in general as well, although often different limit values.