A Look At Qualcomm’s Data Center Inference Accelerator

About four years ago, Qualcomm attempted to enter the data center market with the launch of the Centriq 2400 server family, Arm-based server CPUs built on its custom Falkor microarchitecture. Less than a year after launch, the company canned its data center effort. Now, Qualcomm is heading back into the data center. But instead of fighting over CPUs, the company is going after the rapidly growing AI market. The company hopes its expertise in AI inference acceleration in the mobile market will allow it to scale efficiently up to the data center.

Qualcomm’s first series of inference accelerators is called the Cloud AI 100. “We are currently actively collaborating with several industry leaders for deploying this solution,” said Karam Chata, who leads the Cloud AI 100 engineering team at Qualcomm. The product is designed to span from the data center down to edge devices. Three form factors have been announced: a PCIe card for data center servers with a TDP of up to 75 W, a dual M.2 module with mid-range performance and TDPs of 15-25 W, and a dual M.2e module with no heat sink for embedded devices. Each product offers roughly half the peak performance (OPs) of the next form factor up, with the data center PCIe card maxing out at 400 TOPS (INT8) and 200 TFLOPS (FP16). We will delve a bit deeper into those products later in this article.
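To make that scaling concrete, here is a quick back-of-the-envelope sketch in Python. Only the PCIe card's 400 TOPS / 200 TFLOPS peaks are published figures; the numbers for the two M.2 variants simply apply the "roughly half" rule described above and should be read as approximations, not official specs.

```python
# Back-of-the-envelope sketch of the Cloud AI 100 performance ladder.
# Only the PCIe card's peaks are published; the smaller form factors are
# derived from the article's "roughly half" scaling and are approximations.
pcie_int8_tops = 400
pcie_fp16_tflops = 200

form_factors = ["PCIe card (up to 75 W)",
                "dual M.2 (15-25 W)",
                "dual M.2e (embedded)"]
for step, name in enumerate(form_factors):
    scale = 0.5 ** step  # roughly half per step down the product line
    print(f"{name}: ~{pcie_int8_tops * scale:.0f} TOPS INT8, "
          f"~{pcie_fp16_tflops * scale:.0f} TFLOPS FP16")
```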

One thing to point out with those three products is that the dual M.2 variant with the heat sink actually has twice as much memory as the PCIe card, despite offering half the peak theoretical compute of the PCIe card. Note that the memory bandwidth remains identical for both devices at 136.5 GB/s.
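That identical bandwidth figure invites a quick roofline-style calculation. The sketch below is our own arithmetic using only the numbers quoted above, not a Qualcomm figure: it estimates how much work a model must do per byte fetched before the PCIe card's compute, rather than its memory bandwidth, becomes the bottleneck.

```python
# Rough roofline arithmetic from the published peaks (our own estimate).
peak_int8_ops = 400e12     # 400 TOPS (INT8), PCIe card
mem_bandwidth = 136.5e9    # 136.5 GB/s, identical on PCIe and dual M.2

break_even_intensity = peak_int8_ops / mem_bandwidth
print(f"~{break_even_intensity:.0f} INT8 ops per byte to be compute-bound")
# ~2930 ops/byte: workloads below that arithmetic intensity are
# bandwidth-bound, which is one reason generous on-card memory (keeping
# weights local instead of streaming them) matters for inference.
```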
