Saturday, November 23, 2024

Microsoft launches new Azure virtual machines optimized for AI supercomputing, the ND H200 v5 series



The need for scalable, high-performance infrastructure continues to grow exponentially as the AI landscape advances. Our customers rely on Azure AI infrastructure to develop innovative AI-driven solutions, which is why we’re delivering new cloud-based AI-supercomputing clusters built with Azure ND H200 v5 series virtual machines (VMs) today. These VMs are now generally available and have been tailored to handle the growing complexity of advanced AI workloads, from foundational model training to generative inferencing. The scale, efficiency and enhanced performance of our ND H200 v5 VMs are already driving adoption from customers and Microsoft AI services such as Azure Machine Learning and Azure OpenAI Service.

“We’re excited to adopt Azure’s new H200 VMs. We’ve seen that H200 offers improved performance with minimal porting effort, and we’re looking forward to using these VMs to accelerate our research, improve the ChatGPT experience, and further our mission.” - Trevor Cai, head of infrastructure, OpenAI.

The Azure ND H200 v5 VMs are architected with Microsoft’s systems approach to enhance efficiency and performance, and feature eight NVIDIA H200 Tensor Core GPUs. Specifically, they address the gap caused by GPUs growing in raw computational capability at a much faster rate than the attached memory and memory bandwidth. The Azure ND H200 v5 series VMs deliver a 76% increase in High Bandwidth Memory (HBM) to 141 GB and a 43% increase in HBM bandwidth to 4.8 TB/s over the previous generation of Azure ND H100 v5 VMs. This increase in HBM bandwidth enables GPUs to access model parameters faster, helping reduce overall application latency, which is a critical metric for real-time applications such as interactive agents. The ND H200 v5 VMs can also accommodate more complex Large Language Models (LLMs) within the memory of a single VM, improving performance by helping users avoid the overhead of running distributed jobs over multiple VMs.
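
The arithmetic behind these figures can be checked with a short sketch. The H200 numbers come from the paragraph above; the H100 baseline of 80 GB and 3.35 TB/s per GPU is an assumption taken from NVIDIA's public specifications, not from this article, and the helper names are purely illustrative. The "fits" check counts model weights only and ignores activations, key-value cache and framework overhead.

```python
# Back-of-the-envelope check of the memory figures quoted above.
# Assumption (not from the article): each H100 GPU in an ND H100 v5 VM
# has 80 GB of HBM and 3.35 TB/s of HBM bandwidth.
GPUS_PER_VM = 8

H100 = {"hbm_gb": 80, "bw_tbs": 3.35}   # assumed baseline
H200 = {"hbm_gb": 141, "bw_tbs": 4.8}   # figures from the article


def pct_increase(new, old):
    """Percentage increase from old to new, rounded to a whole percent."""
    return round((new / old - 1) * 100)


def weights_gb(params_billion, bytes_per_param):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def fits_in_vm(params_billion, bytes_per_param, gpu):
    """True if the weights alone fit in the combined HBM of one VM."""
    total_hbm_gb = gpu["hbm_gb"] * GPUS_PER_VM
    return weights_gb(params_billion, bytes_per_param) <= total_hbm_gb


if __name__ == "__main__":
    print("HBM capacity increase (%):", pct_increase(H200["hbm_gb"], H100["hbm_gb"]))
    print("HBM bandwidth increase (%):", pct_increase(H200["bw_tbs"], H100["bw_tbs"]))
    # A 405B-parameter model stored at 2 bytes per parameter (16-bit weights).
    print("405B @ 16-bit fits on one H100 VM:", fits_in_vm(405, 2, H100))
    print("405B @ 16-bit fits on one H200 VM:", fits_in_vm(405, 2, H200))
```

Under these assumptions, a 405B-parameter model at 16-bit precision needs roughly 810 GB for weights alone, which exceeds the 640 GB of an eight-GPU H100 VM but fits within the 1,128 GB of an eight-GPU H200 VM.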

The design of our H200 supercomputing clusters also enables more efficient management of GPU memory for model weights, key-value cache, and batch sizes, all of which directly impact throughput, latency and cost-efficiency in LLM-based generative AI inference workloads. With its larger HBM capacity, the ND H200 v5 VM can support larger batch sizes, driving better GPU utilization and throughput compared to the ND H100 v5 series for inference workloads on both small language models (SLMs) and LLMs. In early tests, we observed up to a 35% throughput increase with ND H200 v5 VMs compared to the ND H100 v5 series for inference workloads running the Llama 3.1 405B model (with world size 8, input length 128, output length 8, and maximum batch sizes of 32 for H100 and 96 for H200). For more details on Azure’s high performance computing benchmarks, please read more here or visit our AI Benchmarking Guide on the Azure GitHub repository.
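
To illustrate why extra HBM translates into larger batch sizes, the sketch below estimates how many sequences' key-value cache fit in the memory left over after the weights are loaded. Everything here is an assumption for illustration rather than the benchmark's actual setup: the model shape (126 layers, 8 key-value heads, head dimension 128) follows the published Llama 3.1 405B configuration, weights are assumed to be stored at 1 byte per parameter, the cache at 2 bytes per value, and the 8,192-token sequence length is arbitrary. Activations, fragmentation and runtime overhead are ignored.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key-value cache stored per token (factor 2 = keys + values)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_batch(total_hbm_gb, weights_gb, seq_len, per_token_bytes):
    """Largest number of sequences whose KV cache fits beside the weights."""
    free_bytes = (total_hbm_gb - weights_gb) * 1e9
    if free_bytes <= 0:
        return 0
    return int(free_bytes // (seq_len * per_token_bytes))


# Assumed model shape (Llama 3.1 405B) and 8-bit weights: about 405 GB.
PER_TOKEN = kv_bytes_per_token(layers=126, kv_heads=8, head_dim=128)
WEIGHTS_GB = 405
SEQ_LEN = 8192  # hypothetical context length, not the benchmark setting

if __name__ == "__main__":
    h100_vm_gb = 8 * 80    # assumed 80 GB per H100 GPU
    h200_vm_gb = 8 * 141   # 141 GB per H200 GPU, from the article
    print("H100 VM max batch:", max_batch(h100_vm_gb, WEIGHTS_GB, SEQ_LEN, PER_TOKEN))
    print("H200 VM max batch:", max_batch(h200_vm_gb, WEIGHTS_GB, SEQ_LEN, PER_TOKEN))
```

With these assumptions the H200 VM holds roughly three times as many concurrent sequences as the H100 VM, because the weights consume a fixed amount of memory and all additional capacity goes to the cache. Real limits depend on the serving framework and are better measured than estimated.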

The ND H200 v5 VMs come pre-integrated with Azure Batch, Azure Kubernetes Service, Azure OpenAI Service and Azure Machine Learning to help businesses get started right away. Please visit here for more detailed technical documentation of the new Azure ND H200 v5 VMs.


