Microsoft announced the general availability of the Azure NDV5 VM series. Each NDV5 VM comes with 8 NVIDIA H100 GPUs, each providing 3,958 teraFLOPS (8-bit FP8), 80GB of GPU memory, and 3.35TB/s of GPU memory bandwidth. The 8 GPUs are interconnected through NVLink4 enabling them to communicate with each other at 900 GB/s. Each GPU connects to the CPU through PCle5 at 64 GB/s.
Across different VMs, GPUs are interconnected through ConnectX7 InfiniBand, enabling them to communicate with each other at 400Gb/s per GPU (i.e., 3.2 Tb/s per VM). These best-in-class GPU connectivity options substantially improve the training and inferencing performance of large language models (LLMs), which require heavy communication between GPUs. Azure allows easily scaling from 8 GPUs to a few tens/hundreds to many thousands of interconnected GPUs (referred as “super computers”) depending on the compute needs of a particular inferencing or training workload.