MLPerf Inference v3.1 is out. On the data center side, the benchmark suite is still mostly an NVIDIA affair, with a few Qualcomm Cloud AI 100 and Intel results mixed in. Still, the two most interesting results are the NVIDIA GH200 Grace Hopper and the Google TPU v5e.
By result count, MLPerf Inference v3.1 is mostly an edge affair, with thousands of results from software companies making small tuning changes on edge platforms. We instead generally focus on the data center results.
Of the 65 data center closed results and two preview results, there are eight Intel Xeon Platinum 8480+ results, one Intel Xeon Max 9480 result, and one Habana Gaudi2 result. Qualcomm was back with five Cloud AI 100 results. Google had a TPU v5e result that is interesting for more than just the accelerator. Still, none of these submitters covered every benchmark, so one way to look at this is that fewer than 25% of the configurations were non-NVIDIA, yet well over 85% of the total benchmark results were NVIDIA, making the other submissions almost rounding errors.
NVIDIA submitted single H100 80GB results as well as NVIDIA GH200 Grace Hopper results. The GH200 results were roughly 2-17% faster, with an average of just over 9%. There are, of course, some major differences, with a CPU directly connected to the GPU instead of an SXM setup. Still, NVIDIA is setting up a case to say “NVIDIA GPUs work best with NVIDIA CPUs” in the future so it can push Intel and AMD out of AI servers.
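For readers who want to reproduce this kind of comparison from the raw MLPerf result tables, here is a minimal sketch of how per-benchmark speedups and their average are computed. The throughput numbers below are hypothetical placeholders chosen only to land in the article's ~2-17% range, not actual MLPerf submissions:

```python
# Hypothetical per-benchmark throughputs (samples/s); NOT real MLPerf numbers.
h100 = {"resnet": 1000.0, "bert": 500.0, "gptj": 100.0}
gh200 = {"resnet": 1020.0, "bert": 545.0, "gptj": 117.0}

# Per-benchmark speedup of GH200 over H100, as a percentage.
speedups = {k: round((gh200[k] / h100[k] - 1.0) * 100.0, 1) for k in h100}

# Simple arithmetic mean across benchmarks.
avg = sum(speedups.values()) / len(speedups)

print(speedups)       # {'resnet': 2.0, 'bert': 9.0, 'gptj': 17.0}
print(f"{avg:.1f}%")  # 9.3%
```

Note that an arithmetic mean of per-benchmark speedups weights every benchmark equally regardless of absolute throughput; the actual MLPerf result spreadsheets report each scenario separately.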