Programmability has fueled the growth of most semiconductor products, but how much does it actually cost? And is that cost worth it?
The answer is more complicated than a simple efficiency formula. It varies by application, by the maturity of the technology in a particular market, and by the larger system in which a design sits. What’s considered important for one design may be very different for another.
In his 2021 DAC keynote, Bill Dally, chief scientist and senior VP of research at Nvidia, compared some of the processors his company has developed with custom accelerators for AI. “The overhead of fetching and decoding, all the overhead of programming, of having a programmable engine, is on the order of 10% to 20% — small enough that there’s really no gain to a specialized accelerator. You get at best 20% more performance and lose all the advantages and flexibility that you get by having a programmable engine,” he said.
Later in his talk he broke this down into a little more detail. “If you are doing a single half-precision floating-point multiply/add (HFMA), which is where we started with Volta, your energy per operation is about 1.5 picojoules, and your overhead is 30 picojoules [see figure 2]. You’ve got a 20X overhead. You’re spending 20 times as much energy on the general administration than you are in the engineering department. But if you start amortizing (using more complex instructions), you get to only 5X with the dot product instruction, 20% with the half-precision matrix multiply accumulate (HMMA), and 16% for the integer multiply accumulate (IMMA). At that point, the advantages of programmability are so large, there’s no point making a dedicated accelerator. You’re much better off building a general-purpose programmable engine, like a GPU, and having some instructions you accelerate.”
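The arithmetic behind those figures is easy to reproduce. The sketch below is illustrative only. It takes the two numbers Dally quotes for HFMA (1.5 picojoules of useful work, 30 picojoules of overhead) and assumes, purely for illustration, that the per-instruction overhead stays fixed at 30 picojoules as instructions become more complex. Under that assumption it back-calculates how much useful work each instruction would need to do to reach the quoted overhead ratios. The function names and the implied payload energies are not from the talk.

```python
# Illustrative sketch: how amortizing a fixed per-instruction overhead
# over more useful work shrinks the relative cost of programmability.
# ASSUMPTION (not from the talk): overhead stays at 30 pJ per instruction
# regardless of instruction complexity.

OVERHEAD_PJ = 30.0  # fetch/decode/etc. energy per instruction, as quoted for HFMA
HFMA_PJ = 1.5       # useful energy of one half-precision multiply/add, as quoted


def overhead_ratio(payload_pj, overhead_pj=OVERHEAD_PJ):
    """Overhead energy expressed as a multiple of the useful (payload) energy."""
    return overhead_pj / payload_pj


def payload_for_ratio(ratio, overhead_pj=OVERHEAD_PJ):
    """Useful energy per instruction needed to hit a given overhead ratio."""
    return overhead_pj / ratio


# Overhead ratios as quoted in the talk (20X, 5X, 20%, 16%).
quoted_ratios = {"HFMA": 20.0, "dot product": 5.0, "HMMA": 0.20, "IMMA": 0.16}

# Hypothetical payload energies implied by the fixed-overhead assumption.
implied_payload_pj = {
    name: payload_for_ratio(ratio) for name, ratio in quoted_ratios.items()
}

if __name__ == "__main__":
    print(f"HFMA overhead ratio: {overhead_ratio(HFMA_PJ):.0f}X")
    for name, payload in implied_payload_pj.items():
        print(f"{name}: ~{payload:.1f} pJ of useful work per instruction")
```

The point of the exercise is the shape of the curve rather than the exact values. The overhead ratio falls in direct proportion to the amount of useful work packed into each instruction, which is why matrix instructions bring it down from 20X to a fraction of the payload energy.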