Disabling Zen 5’s Op Cache and Exploring its Clustered Decoder

Submitted by
Style Pass
2025-01-23 23:30:09

Zen 5 has an interesting frontend setup with a pair of fetch and decode clusters. Each cluster serves one of the core’s two SMT threads. That creates parallels to AMD’s Steamroller architecture from the pre-Zen days. Zen 5 and Steamroller can both decode up to eight instructions per cycle with two threads active, or up to four per cycle for a single thread.
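
As a back-of-the-envelope illustration of those peak numbers, here is a minimal sketch of a clustered decoder where each 4-wide cluster is dedicated to one SMT thread. The function name and parameters are hypothetical. The model only reproduces the peak widths quoted above and deliberately ignores fetch bandwidth, instruction lengths, branches, and every other detail where Zen 5 and Steamroller actually differ.

```python
def peak_decode_per_cycle(active_threads: int, clusters: int = 2, cluster_width: int = 4) -> int:
    """Upper bound on instructions decoded per cycle for a simplified clustered
    decoder in which each cluster serves exactly one SMT thread (hypothetical model)."""
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    # A thread can only use its own cluster, so clusters without a thread sit idle.
    return cluster_width * min(active_threads, clusters)


print(peak_decode_per_cycle(1))  # single thread: one 4-wide cluster
print(peak_decode_per_cycle(2))  # two threads: both clusters busy
```

With one thread the model yields 4 instructions per cycle and with two threads it yields 8, matching the "up to four" and "up to eight" figures above. The point of the `min()` is that a lone thread cannot borrow the idle cluster.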

Despite these decoder layout similarities, Zen 5’s frontend operates nothing like Steamroller. That’s because Zen 5 mostly feeds itself off a 6K-entry op cache, which is often large enough to cover the vast majority of the instruction stream. Steamroller used its decoders for everything, but Zen 5’s decoders only kick in on the occasional op cache miss. Normally that’d make it hard to evaluate the strength of Zen 5’s decoders, which is a pity because I’m curious about how a clustered decoder could feed a modern high performance core.

Thankfully, Zen 5’s op cache can be turned off by setting bit 5 in MSR 0xC0011021. Setting that bit forces the decoders to handle everything. Of course, testing with the op cache off is completely irrelevant to Zen 5’s real world performance. And if AMD wanted to serve the core entirely from the decoders, there’s a good chance they would have gone with a conventional 8-wide setup like Intel’s Lion Cove or Qualcomm’s Oryon. Still, this is a cool chance to see how well Zen 5 can do with just a 2×4-wide frontend.
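
For anyone wanting to try this on Linux, a minimal sketch follows, assuming the kernel `msr` driver is loaded (it exposes `/dev/cpu/N/msr`, where the file offset is the MSR address and each access is one 8-byte value) and the script runs as root. The MSR address and bit 5 come from the text above; the helper names are hypothetical. It does a read-modify-write so only bit 5 changes, since the meaning of the other bits in this MSR isn't covered here. Whether the bit has to be set on every logical CPU is an assumption, so the sketch handles one CPU at a time.

```python
import os
import struct

# From the article: bit 5 of MSR 0xC0011021 disables Zen 5's op cache.
MSR_ADDR = 0xC0011021
OP_CACHE_DISABLE_BIT = 5


def with_op_cache_disabled(msr_value: int) -> int:
    """Return msr_value with bit 5 set, leaving every other bit untouched."""
    return msr_value | (1 << OP_CACHE_DISABLE_BIT)


def with_op_cache_enabled(msr_value: int) -> int:
    """Return msr_value with bit 5 cleared, leaving every other bit untouched."""
    return msr_value & ~(1 << OP_CACHE_DISABLE_BIT)


def op_cache_disabled(msr_value: int) -> bool:
    """True if bit 5 is set in msr_value."""
    return bool(msr_value & (1 << OP_CACHE_DISABLE_BIT))


def read_msr(cpu: int, addr: int = MSR_ADDR) -> int:
    # The msr driver uses the file offset as the MSR address; values are 64-bit little-endian on x86.
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, addr)
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


def write_msr(cpu: int, value: int, addr: int = MSR_ADDR) -> None:
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), addr)
    finally:
        os.close(fd)


def disable_op_cache(cpu: int) -> None:
    """Hypothetical helper: read-modify-write so only bit 5 changes on one logical CPU."""
    write_msr(cpu, with_op_cache_disabled(read_msr(cpu)))
```

A run would then call something like `disable_op_cache(cpu)` for each online CPU, and `write_msr(cpu, with_op_cache_enabled(read_msr(cpu)))` to turn the op cache back on afterward.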
