Within each AI server, GPUs are linked by ultra-high-bandwidth, short-reach interconnects such as NVLink. Instead of sending data over the standard connection that normally links a GPU to the rest of the server, NVLink creates a direct 'fast lane' between GPUs, letting them exchange model data without going through the CPU, as is the case with the standard connection. 61 This high-speed fabric effectively pools the memory and compute of multiple GPUs inside a rack, but it also raises local power density, adding extra heat that must be removed at rack level.
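To make the bandwidth gap concrete, here is a back-of-the-envelope comparison. The figures are illustrative assumptions, not specs for any particular system: roughly 64 GB/s per direction for a PCIe Gen5 x16 link (the "standard connection"), versus roughly 900 GB/s of aggregate NVLink bandwidth on an H100-class GPU.

```python
# Illustrative, order-of-magnitude figures (assumptions, not vendor specs
# for any specific server): PCIe Gen5 x16 vs. NVLink on an H100-class GPU.
PCIE_GBPS = 64      # ~64 GB/s per direction, PCIe Gen5 x16 (assumed)
NVLINK_GBPS = 900   # ~900 GB/s aggregate NVLink bandwidth (assumed)

payload_gb = 10     # e.g. a 10 GB slice of model weights or activations

# Time to move the payload over each path, in milliseconds
pcie_ms = payload_gb / PCIE_GBPS * 1000
nvlink_ms = payload_gb / NVLINK_GBPS * 1000

print(f"PCIe:    {pcie_ms:.1f} ms")
print(f"NVLink:  {nvlink_ms:.1f} ms")
print(f"Speedup: {NVLINK_GBPS / PCIE_GBPS:.1f}x")
```

Even with these rough numbers, the order-of-magnitude difference explains why frequent GPU-to-GPU exchanges (gradient synchronisation, tensor-parallel activations) are routed over the NVLink fabric rather than the standard connection.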
So these are essentially dedicated network links lit up between cards inside the same server, rather than the cards relying on the existing links that all the other traffic uses?