When it comes to network interconnection technologies in high-performance computing and data centers, InfiniBand and NVLink are two of the most widely discussed options. In this article, we delve into the design principles, performance characteristics, and application scenarios of InfiniBand and NVLink.
Introduction to InfiniBand
InfiniBand (IB) is a high-speed communication network technology designed to connect computing nodes and storage devices for high-performance data transmission and processing. Its channel-based architecture facilitates fast communication between interconnected nodes.
Components of InfiniBand
- Subnet
A subnet is the smallest complete unit in the InfiniBand architecture. Each subnet consists of end nodes, switches, links, and a subnet manager. The subnet manager is responsible for managing all devices and resources within the subnet to ensure the network’s proper operation and performance optimization.
- Routers and Switches
InfiniBand networks connect multiple subnets through routers and switches, constructing a large network topology. Routers are responsible for data routing and forwarding between different subnets, while switches handle data exchange and forwarding within a subnet.
Main Features
- High Bandwidth and Low Latency
InfiniBand provides bidirectional bandwidth of up to hundreds of Gb/s and microsecond-level transmission latency. This combination of high bandwidth and low latency enables efficient execution of large-scale data transmission and computational tasks, making InfiniBand a key technology in high-performance computing, data centers, and cloud computing.
- Point-to-Point Connection
InfiniBand uses a point-to-point connection architecture, where each node communicates directly with other nodes through dedicated channels, avoiding network congestion and performance bottlenecks. This connection method maximizes data transmission efficiency and supports large-scale parallel computing and data exchange.
- Remote Direct Memory Access
InfiniBand supports RDMA technology, allowing data to be transmitted directly between memory spaces without the involvement of the host CPU. This technology can significantly reduce data transmission latency and system load, thereby improving transmission efficiency. It is particularly suitable for large-scale data exchange and distributed computing environments.
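To make the RDMA idea concrete, below is a minimal sketch of a one-sided RDMA write using the libibverbs API. It assumes a queue pair that has already been created and connected, a registered local memory region, and a remote address and rkey exchanged out of band (for example, over a TCP socket); the function name and parameters are illustrative only, not part of any specific product.

```c
/* Minimal sketch of a one-sided RDMA write with libibverbs.
 * Assumes: a connected queue pair (qp), its completion queue (cq),
 * a locally registered memory region (mr), and the peer's virtual
 * address and rkey obtained out of band. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int rdma_write_example(struct ibv_qp *qp, struct ibv_cq *cq,
                       struct ibv_mr *mr, void *local_buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Describe the local buffer that will be pushed to the remote node. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    /* Build an RDMA WRITE work request: the adapter moves the data
     * directly into the peer's registered memory without involving
     * the remote CPU. */
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* request a completion */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Busy-poll the completion queue until the write has finished. */
    struct ibv_wc wc;
    int n;
    do {
        n = ibv_poll_cq(cq, 1, &wc);
    } while (n == 0);

    return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -1 : 0;
}
```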
Application Scenario
As discussed above, InfiniBand’s low latency and high bandwidth make it significant in HPC and data center environments, RDMA offloads data movement from the host CPU, and its point-to-point architecture supports a wide range of complex application scenarios, providing users with efficient and reliable data transmission and computing services. As a result, InfiniBand is widely deployed in switches, network adapters, and module products. As a partner of NVIDIA, FS offers a variety of high-performance InfiniBand switches and adapters to meet different needs.
- InfiniBand Switches
Essential for managing data flow in InfiniBand networks, these switches forward traffic between nodes at the link layer, enabling high-speed data transmission.
| Product | MQM9790-NS2F | MQM9700-NS2F | MQM8790-HS2F | MQM8700-HS2F |
| --- | --- | --- | --- | --- |
| Link Speed | 400Gb/s | 400Gb/s | 200Gb/s | 200Gb/s |
| Ports | 32 | 32 | 40 | 40 |
| Switching Capacity | 51.2Tb/s | 51.2Tb/s | 16Tb/s | 16Tb/s |
| Subnet Manager | No | Yes | No | Yes |
- InfiniBand Adapters
Acting as network interface cards (NICs), InfiniBand adapters allow devices to interface with InfiniBand networks.
| Product | MCX653106A-HDAT | MCX653105A-ECAT | MCX75510AAS-NEAT | MCX715105AS-WEAT |
| --- | --- | --- | --- | --- |
| ConnectX Type | ConnectX®-6 | ConnectX®-6 | ConnectX®-7 | ConnectX®-7 |
| Ports | Dual | Single | Single | Single |
| Max Ethernet Data Rate | 200Gb/s | 100Gb/s | 400Gb/s | 400Gb/s |
| Supported InfiniBand Data Rates | SDR/DDR/QDR/FDR/EDR/HDR | SDR/DDR/QDR/FDR/EDR/HDR100 | SDR/FDR/EDR/HDR/NDR/NDR200 | SDR/FDR/EDR/HDR/HDR100/NDR/NDR200 |
Overview of NVLink
NVLink is a high-speed communication protocol developed by NVIDIA for connecting GPUs to one another and GPUs to CPUs. It links GPUs directly through dedicated high-speed channels, enabling more efficient data sharing and communication between them.
Main Features
- High Bandwidth
NVLink provides higher bandwidth than traditional PCIe buses, enabling faster data transfer. This allows for quicker data and parameter transmission during large-scale parallel computing and deep learning tasks in multi-GPU systems.
- Low Latency
NVLink features low transmission latency, enabling faster communication between GPUs and quicker responses to the demands of computing tasks. Low latency is crucial for applications that require high computation speed and rapid response times.
- Memory Sharing
NVLink allows multiple GPUs to directly share memory without exchanging data through the host memory. This memory-sharing mechanism significantly reduces the complexity and latency of data transfer, improving the system’s overall efficiency (see the sketch after this feature list).
- Flexibility
NVLink supports flexible topologies, allowing the configuration of GPU connections based on system requirements. This enables targeted optimization of system performance and throughput for different application scenarios.
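As a concrete illustration of the memory-sharing point above, the following is a minimal sketch that enables CUDA peer-to-peer access between two GPUs and copies a buffer directly from one GPU’s memory to the other’s. The device IDs (0 and 1) and buffer size are assumptions for illustration; when the GPUs are connected by NVLink, the peer copy travels over NVLink rather than staging through host memory, and otherwise it falls back to PCIe.

```c
/* Minimal sketch of GPU-to-GPU memory sharing via the CUDA runtime API.
 * Device IDs 0 and 1 are assumptions; error checking is kept minimal.
 * Build with nvcc (or a host compiler linked against cudart). */
#include <cuda_runtime.h>
#include <stdio.h>

#define N (1 << 20)   /* 1M floats, ~4 MB */

int main(void)
{
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("Peer access between GPU 0 and GPU 1 is not available.\n");
        return 1;
    }

    /* Enable direct peer access in both directions. */
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    /* Allocate a buffer on each GPU. */
    float *buf0 = NULL, *buf1 = NULL;
    cudaSetDevice(0);
    cudaMalloc((void **)&buf0, N * sizeof(float));
    cudaSetDevice(1);
    cudaMalloc((void **)&buf1, N * sizeof(float));

    /* Copy directly from GPU 0 memory to GPU 1 memory; with peer access
     * enabled the data does not bounce through host (CPU) memory. */
    cudaMemcpyPeer(buf1, 1, buf0, 0, N * sizeof(float));
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    printf("Peer-to-peer copy of %zu bytes completed.\n",
           (size_t)N * sizeof(float));
    return 0;
}
```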
Application Scenario
NVLink, as a high-speed communication protocol, opens up new possibilities for direct communication between GPUs. Its high bandwidth, low latency, and memory-sharing features enable faster and more efficient data transfer and processing in large-scale parallel computing and deep learning applications. Now, NVLink-based chips and servers are also available.
The NVSwitch chip is a physical chip similar to a switch ASIC. It connects multiple GPUs through high-speed NVLink interfaces to improve communication and bandwidth within a server. The third-generation NVIDIA NVSwitch allows any pair of GPUs in the system to communicate at up to 900GB/s.
NVLink servers use NVLink and NVSwitch technology to connect GPUs. They are commonly found in NVIDIA’s DGX series servers and OEM HGX servers with similar architectures. These servers leverage NVLink technology to offer superior GPU interconnectivity, scalability, and HPC capabilities.
Comparison between NVLink and InfiniBand
NVLink and InfiniBand are two interconnect technologies widely used in high-performance computing and data centers, each with significant differences in design and application.
NVLink provides higher data transfer speeds and lower latency, particularly for direct GPU-to-GPU communication, making it ideal for compute-intensive and deep learning tasks. However, because it is proprietary to NVIDIA GPUs and platforms, it often requires a higher investment.
InfiniBand, on the other hand, offers high bandwidth and low latency with excellent scalability, making it suitable for large-scale clusters. It provides more pricing options and configuration flexibility, making it cost-effective for various scales and budgets. InfiniBand is extensively used in scientific research and supercomputing, where its support for complex simulations and data-intensive tasks is crucial.
In many data centers and supercomputing systems, a hybrid approach is adopted, using NVLink to connect GPU nodes for enhanced performance and InfiniBand to link server nodes and storage devices, ensuring efficient system operation. This combination leverages the strengths of both technologies, delivering a high-performance, reliable network solution.
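One common way this hybrid layout is exercised in practice (an illustration, not something prescribed by either technology) is through a collective-communication library such as NCCL launched under MPI: intra-node GPU traffic then rides NVLink/NVSwitch, while inter-node traffic uses the InfiniBand fabric via RDMA. The sketch below assumes one GPU per MPI rank and omits error checking.

```c
/* Illustrative sketch of an all-reduce over a hybrid NVLink + InfiniBand
 * cluster using NCCL and MPI. One GPU per MPI rank is assumed; the
 * "8 GPUs per server" figure is an assumption for illustration. */
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank = 0, nranks = 1;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Assumption: one GPU per rank, selected by position within the node. */
    int local_rank = rank % 8;
    cudaSetDevice(local_rank);

    /* Rank 0 creates the NCCL ID; all ranks receive it over MPI. */
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    ncclComm_t comm;
    ncclCommInitRank(&comm, nranks, id, rank);

    /* All-reduce a buffer of gradients: intra-node traffic rides NVLink,
     * inter-node traffic rides the InfiniBand fabric. */
    const size_t count = 1 << 20;
    float *buf;
    cudaMalloc((void **)&buf, count * sizeof(float));
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    ncclAllReduce(buf, buf, count, ncclFloat, ncclSum, comm, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(buf);
    ncclCommDestroy(comm);
    MPI_Finalize();
    return 0;
}
```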
Summary
To summarize, this article has explored two prominent network interconnection technologies in high-performance computing and data centers, InfiniBand and NVLink, and compared their distinct advantages and applications. In practice, the two technologies are often used together to achieve better overall network connectivity.