It’s Not Always iCache


This is a follow-up to the previous post about #[inline] in Rust specifically. This post is a bit more general, and a bit more ranty. Reader, beware!

When inlining optimization is discussed, the following is almost always mentioned: “inlining can also make code slower, because inlining increases the code size, blowing through the instruction cache and causing cache misses”.
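For readers who haven’t run into the attribute, here is a minimal sketch of what it looks like in Rust (the function names are made up for illustration). #[inline] is only a hint that nudges the compiler toward inlining a call site; #[inline(always)] and #[inline(never)] are the stronger variants:

```rust
// A hint: the compiler may copy the body of `add` into each call site.
#[inline]
fn add(a: u64, b: u64) -> u64 {
    a + b
}

// The stronger variant: ask the compiler to keep this as an out-of-line call.
#[inline(never)]
fn add_outlined(a: u64, b: u64) -> u64 {
    a + b
}

fn main() {
    println!("{}", add(2, 2) + add_outlined(3, 3));
}
```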

I myself have seen this repeated in various forums many times. I have also seen a lot of benchmarks where judicious removal of inlining annotations did increase performance. However, not once have I seen the performance improvement traced to iCache specifically. To me at least, this explanation doesn’t seem to be grounded — people know that iCache is to blame because other people say so, not because there’s a benchmark everyone points to. That doesn’t mean the iCache explanation is wrong — just that I personally don’t have evidence to believe it over any other explanation.

Anyway, I’ve decided to look at a specific case where I know #[inline] to cause an observable slowdown, and understand why it happens. Note that the goal here is not to explain the real-world impact of #[inline]; the benchmark is artificial. The goal is, first and foremost, to learn more about the tools to use for explaining results. The secondary goal is to either observe iCache effects in practice, or else to provide an alternative hypothesis for why removing inlining can speed things up.
