Towards Tomorrow’s AI Networking: RDMA and TCP/IP over CXL Fabric and More

Submitted by Style Pass
2024-12-25 13:30:02

Hello everyone. Today, we will share how our company views AI networking and the progress we are making with CXL technology. Our presentation is titled "RDMA and TCP/IP over CXL Fabric and More." We will cover the progress of running the RDMA and TCP/IP protocols over CXL fabrics. We will also cover how these protocols are applied in high-performance GPU clusters and high-performance storage clusters for artificial intelligence.

First, let's look at current trends in computing. Over the past eight years, computing power has increased roughly 1,000-fold. This is an astonishing improvement, and it far exceeds what Moore's Law predicts for general-purpose computing. A doubling roughly every two years yields only about a 16-fold increase over the same period. The main driver of this advance has been NVIDIA, which has achieved both technical and commercial success. However, if we look closely at what made this improvement possible, the underlying driving force is a transformation of computing architectures.
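The gap between the observed growth and Moore's Law can be checked with simple arithmetic. The sketch below takes the "1,000x over eight years" figure from the talk. It assumes that Moore's Law means a doubling every two years. The constant and function names are illustrative only.

```python
# Figures from the talk: ~1,000x growth in computing power over ~8 years.
YEARS = 8
OBSERVED_GROWTH = 1_000

# Assumption: Moore's Law modeled as a doubling every 2 years.
DOUBLING_PERIOD_YEARS = 2


def moore_growth(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Total growth factor predicted by a fixed doubling period."""
    return 2 ** (years / doubling_period)


def annual_rate(total_growth: float, years: float) -> float:
    """Compound per-year growth factor that yields total_growth over `years`."""
    return total_growth ** (1 / years)


moore = moore_growth(YEARS)                             # 2**4 = 16x
gap = OBSERVED_GROWTH / moore                           # how far ahead of Moore's Law
observed_annual = annual_rate(OBSERVED_GROWTH, YEARS)   # ~2.37x per year
moore_annual = annual_rate(moore, YEARS)                # sqrt(2), ~1.41x per year

print(f"Moore's Law over {YEARS} years: {moore:.0f}x")
print(f"Observed / Moore's Law: {gap:.1f}x")
print(f"Annual growth: observed {observed_annual:.2f}x vs Moore {moore_annual:.2f}x")
```

Under these assumptions, 1,000x in eight years works out to roughly 2.4x per year. A two-year doubling gives about 1.4x per year, so the observed trend ends up about 60 times ahead of Moore's Law.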

As shown in the image, in 2018 computing power was represented by a single GPU card. It was a device that could be held in one hand, as Jensen Huang demonstrated on stage. By around 2020, the handheld card had given way to a much larger "GPU": an 8-GPU board fully connected by NVLink, which Jensen Huang pulled out of an oven. Today, Jensen Huang can no longer hold the most powerful GPU in his hands. The definition of the most powerful GPU has evolved into a cabinet of 36 or 72 GPUs fully connected by NVLink. As Jensen Huang summarized, we need bigger and stronger GPUs! Parallel computing architectures have thus evolved from intra-chip parallelism (many CUDA cores on a single chip) to massive multi-GPU computing modules built on NVLink interconnects.
