However, building a GPU server is not that easy, and failing to design an appropriate infrastructure can noticeably increase training time.
If you use GPUs, you should know that there are two ways to connect them to the motherboard so they can communicate with the other components (network, CPU, storage): through PCI Express or through SXM2. We will cover SXM2 in a future post; today we focus on PCI Express, because it strongly constrains the choice of adjacent hardware such as the PCIe bus and the CPU.
This is a major element to consider for deep learning: every second spent loading data is a second the GPUs are not computing, so the bandwidth between the GPUs and the other components is a key bottleneck in most training setups.
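As a rough illustration, here is a minimal sketch (assuming PyTorch and a CUDA-capable GPU; the function name and buffer size are arbitrary choices, not from the original post) that times pinned-memory host-to-device copies to estimate the effective transfer bandwidth your link delivers in practice:

```python
import time
import torch

def measure_h2d_bandwidth(size_mb=256, repeats=10):
    """Estimate host-to-device copy bandwidth in GB/s using a pinned buffer."""
    x = torch.empty(size_mb * 1024 * 1024, dtype=torch.uint8).pin_memory()
    # Warm-up copy so CUDA context creation is not included in the timing.
    x.to("cuda", non_blocking=True)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        x.to("cuda", non_blocking=True)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return size_mb * repeats / elapsed / 1024  # GB/s

if __name__ == "__main__":
    print(f"Host-to-device bandwidth: {measure_h2d_bandwidth():.1f} GB/s")
```

On a healthy PCIe 3.0 x16 link this typically reports on the order of 10-12 GB/s with pinned memory; a much lower figure usually means the GPU is sitting on fewer lanes than expected or the data is not pinned.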
PCIe lanes carry the traffic between PCIe devices, or between a PCIe device and the CPU. A lane is composed of two differential signalling pairs: one pair for receiving data and one for transmitting, so each lane can send and receive at full speed simultaneously.
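To check which link each GPU has actually negotiated, here is a short sketch using the pynvml bindings (assuming the nvidia-ml-py package and an NVIDIA driver are installed; querying GPU 0 is just an example) that reads the current and maximum PCIe generation and lane count:

```python
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Current vs. maximum PCIe generation and lane width for GPU 0.
gen_cur = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
gen_max = pynvml.nvmlDeviceGetMaxPcieLinkGeneration(handle)
width_cur = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
width_max = pynvml.nvmlDeviceGetMaxPcieLinkWidth(handle)

print(f"PCIe link: gen {gen_cur}/{gen_max}, x{width_cur}/x{width_max} lanes")

pynvml.nvmlShutdown()
```

If the current width is lower than the maximum (for example x8 instead of x16), the GPU is not getting all the bandwidth the slot could provide, which directly slows down the data loading phase discussed above.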