
Hardware Store Elimination

submitted by
Style Pass
2022-09-22 20:00:23

I had no plans to write another post about zeros, but when life throws you a zero, make zeroaid, or something like that. Here we go!

When writing simple memory benchmarks, I have always taken the position that the value written to memory didn't matter. Recently, while running a straightforward benchmark[1] probing the interaction between AVX-512 stores and read-for-ownership, I ran into a weird performance deviation. This is that story[2].
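For concreteness, here is a minimal C sketch of the kind of store benchmark described above, where the stored value is just a parameter. It is not the benchmark from this post: it uses a plain scalar loop rather than AVX-512 stores, and the names `time_stores` and `store_clock_ns`, as well as the GCC/Clang-style compiler barrier, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock time in nanoseconds via C11 timespec_get. TIME_UTC is not
 * monotonic; a real benchmark would prefer a monotonic clock or rdtsc. */
static double store_clock_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical helper: store `value` into all `n` elements of `buf`,
 * `passes` times over, and return the elapsed time in ns. The value is
 * only a parameter; the assumption under test is that it does not
 * affect the timing. */
double time_stores(uint64_t *buf, size_t n, uint64_t value, int passes) {
    assert(buf != NULL && passes > 0);
    double start = store_clock_ns();
    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < n; i++)
            buf[i] = value;
        /* GCC/Clang compiler barrier: the optimizer must assume memory is
         * observed here, so it cannot merge all passes into one. */
        __asm__ volatile("" : : "r"(buf) : "memory");
    }
    return store_clock_ns() - start;
}
```

Calling `time_stores(buf, n, 0, passes)` and then the same call with a non-zero pattern, over a buffer much larger than the last-level cache, is the sort of comparison where, under the "value doesn't matter" assumption, the two timings should match.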

On current mainstream CPUs, the timing of most instructions isn't data-dependent: their performance is the same regardless of the values of their inputs. Unlike you[3] or me, your CPU takes the same time to add 1 + 2 as it does to add 68040486 + 80866502.
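One way to check the claim about addition is to time a dependent chain of adds with small and with large operands. The sketch below is a hypothetical harness (`add_chain`, `compare_add_timing` and `add_clock_s` are illustrative names, not from any library) and assumes a 64-bit x86 build with GCC or Clang; the empty `asm` statement is a compiler extension used only to stop the optimizer from folding the loop into a single multiply.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Wall-clock seconds via C11 timespec_get (not monotonic; fine for a sketch). */
static double add_clock_s(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* A dependent chain of `iters` additions: start + addend + addend + ...
 * Unsigned overflow wraps, which is well defined in C. The empty asm
 * hides `acc` from the optimizer (GCC/Clang extension) so each add is
 * actually executed instead of being collapsed into a multiply. */
uint64_t add_chain(uint64_t start, uint64_t addend, long iters) {
    assert(iters >= 0);
    uint64_t acc = start;
    for (long i = 0; i < iters; i++) {
        acc += addend;
        __asm__ volatile("" : "+r"(acc));
    }
    return acc;
}

/* Time the same chain with small and large operands and print both. */
void compare_add_timing(long iters) {
    double t0 = add_clock_s();
    uint64_t r_small = add_chain(1, 2, iters);
    double t1 = add_clock_s();
    uint64_t r_large = add_chain(68040486, 80866502, iters);
    double t2 = add_clock_s();
    printf("1 + 2 chain:               %.6f s (result %llu)\n",
           t1 - t0, (unsigned long long)r_small);
    printf("68040486 + 80866502 chain: %.6f s (result %llu)\n",
           t2 - t1, (unsigned long long)r_large);
}
```

With a large iteration count (say `compare_add_timing(1000000000L)`), the two printed times should be indistinguishable within run-to-run noise on a mainstream core, since each add takes the same number of cycles regardless of operand values.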

That list is not exhaustive: there are other cases of data-dependent performance, especially when you start digging into complex microcoded instructions such as cpuid. Still, it isn’t unreasonable to assume that most simple instructions not listed above execute in constant time.

Certainly, the address matters. After all, the address determines the caching behavior, and caching can easily account for a two-orders-of-magnitude difference in performance[5]. On the other hand, I wouldn't expect the data values loaded or stored to matter. There is little reason to expect the memory or caching subsystem to care about the values of the bits loaded or stored, outside of scenarios such as hardware-compressed caches, which are not widely deployed[6] on x86.
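The address effect is easy to see with a pointer chase, a standard latency-measurement technique swapped in here for illustration rather than taken from this post. The sketch below links all slots into one random cycle with Sattolo's algorithm, so every load depends on the previous one and the prefetchers can't guess the next address. The names `build_cycle`, `chase`, `ns_per_hop` and `chase_clock_s` are hypothetical, and the compiler barrier is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds via C11 timespec_get (not monotonic; fine for a sketch). */
static double chase_clock_s(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Sattolo's algorithm: turn next[] into a single random cycle over n slots,
 * so following next[] from any slot visits every slot before returning. */
void build_cycle(size_t *next, size_t n, unsigned seed) {
    assert(next != NULL && n > 0);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i; /* j < i (not <= i) yields one cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow `steps` dependent loads; each address comes from the previous load. */
size_t chase(const size_t *next, size_t start, size_t steps) {
    size_t p = start;
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    return p;
}

/* Average nanoseconds per dependent load over a working set of `bytes`.
 * Returns -1.0 on bad arguments or allocation failure. */
double ns_per_hop(size_t bytes, size_t steps) {
    size_t n = bytes / sizeof(size_t);
    if (n == 0 || steps == 0)
        return -1.0;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return -1.0;
    build_cycle(next, n, 12345u);
    /* Barrier: make the array visible to "unknown" code so the chase
     * cannot be moved across the clock reads. */
    __asm__ volatile("" : : "r"(next) : "memory");
    double t0 = chase_clock_s();
    volatile size_t end = chase(next, 0, steps); /* volatile: keep the chase */
    double t1 = chase_clock_s();
    (void)end;
    free(next);
    return (t1 - t0) * 1e9 / (double)steps;
}
```

Comparing `ns_per_hop` for a working set that fits in L1 (for example 16 KiB) with one far larger than the last-level cache (for example 256 MiB) is where a large, address-driven gap should show up; the exact ratio will depend on the machine.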
