Covering Scientific & Technical AI | Friday, September 20, 2024

Nvidia Introduces Arm-based Grace Server Designs for AI, HPC, Cloud 

Nvidia is lining up Arm-based server platforms for a range of HPC, AI and cloud applications. The new systems employ Nvidia’s custom Grace Arm CPUs in four different configurations, including a Grace Hopper HGX baseboard powered by Nvidia's forthcoming Grace CPU and its new Hopper GPU.

Nvidia announced that its first Grace CPU-powered system designs will be coming to market in the first half of next year. The new details come on the heels of Grace and Grace Hopper Superchip disclosures made at the company’s March GPU Technology Conference.

The new HGX Grace and HGX Grace Hopper platforms are designed for 2U server chassis from Nvidia's OEM partners, the company said. Four HGX Grace blades fit in a single 2U footprint, with each blade providing up to a terabyte of LPDDR5X memory and up to a terabyte per second of memory bandwidth.

The Grace Hopper Superchip inside the Grace Hopper HGX blade pairs an Nvidia Hopper GPU with a Grace CPU over NVLink-C2C in an integrated module. Graphic credit: Nvidia.

The HGX Grace Hopper 2U chassis implements two Grace Hopper blades, with each providing 512 gigabytes of LPDDR5X as well as 80 gigabytes of HBM3 memory and memory bandwidth up to 3.5 terabytes per second.
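Scaling the per-blade figures above to the chassis level gives a sense of each platform's 2U density. A minimal back-of-the-envelope sketch, using only the blade counts and per-blade specs quoted in the article (the dictionary and function names are illustrative, not Nvidia terminology):

```python
# Per-blade specs as stated: HGX Grace packs four blades per 2U, each with
# up to 1 TB LPDDR5X and 1 TB/s of bandwidth; HGX Grace Hopper packs two
# blades per 2U, each with 512 GB LPDDR5X + 80 GB HBM3 and up to 3.5 TB/s.
hgx_grace = {"blades_per_2u": 4, "lpddr5x_tb": 1.0, "mem_bw_tbps": 1.0}
hgx_grace_hopper = {"blades_per_2u": 2, "lpddr5x_tb": 0.512,
                    "hbm3_tb": 0.080, "mem_bw_tbps": 3.5}

def chassis_totals(cfg):
    """Aggregate per-blade memory (TB) and bandwidth (TB/s) to the 2U chassis."""
    n = cfg["blades_per_2u"]
    mem = n * (cfg.get("lpddr5x_tb", 0.0) + cfg.get("hbm3_tb", 0.0))
    bw = n * cfg["mem_bw_tbps"]
    return mem, bw

print(chassis_totals(hgx_grace))         # (4.0, 4.0) -> up to 4 TB, 4 TB/s per 2U
print(chassis_totals(hgx_grace_hopper))  # (1.184, 7.0) -> ~1.2 TB, 7 TB/s per 2U
```

In other words, the CPU-only chassis maximizes memory capacity per rack unit, while the Grace Hopper chassis trades capacity for roughly 7 TB/s of aggregate memory bandwidth, reflecting the HBM3 attached to the Hopper GPUs.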

The Grace and Grace Hopper 2U chassis can be air or liquid-cooled and consume up to 2,000 watts.

“For these HGX references, Nvidia will of course provide the Grace Hopper and the Grace CPU Superchip modules as well as the corresponding PCB reference designs,” said Paresh Kharya, senior director of product management and marketing for accelerated computing at Nvidia, during a media briefing held in conjunction with Computex (which is taking place in Taipei this week). “Our partners can modify these reference designs and quickly spin up the motherboard, leveraging their existing system architectures.”

Connectivity between the Grace blades will be handled by Nvidia’s BlueField-3 DPU, and OEM partners will also have the option of employing a different interconnect in the servers, Kharya said. HGX Grace Hopper adds the option to connect nodes using NVLink.

Nvidia also announced Arm-based OVX and CGX blueprint systems. While HGX and DGX invoke Hyperscale and Datacenter respectively, OVX and CGX are a nod to Omniverse and Cloud. OVX Grace follows the x86-based OVX platform that Nvidia announced in March targeting digital twins and Omniverse applications. The Arm-based version employs one Grace CPU Superchip, one BlueField-3 DPU and four Nvidia GPUs for visual computing (the specific GPU has not been disclosed at this time).

CGX systems are intended for cloud graphics and gaming. The Arm-based platform features the Grace CPU Superchip, one BlueField-3 DPU and two Nvidia A16 GPUs.

While Nvidia is going all-in on Arm, Kharya emphasized that the company has no plans to stop supporting x86 CPUs on its platforms. “x86 is a very important CPU that is pretty much all of the market of Nvidia’s GPUs today. We’ll continue to support x86 and we’ll continue to support Arm-based CPUs, offering our customers and market the choice for wherever they want to deploy accelerated computing,” he said.

Kharya added that all the server form factors and reference designs being launched on Grace next year are already available on x86.

“Together [with our partners] we offer hundreds of configurations of x86 and Arm systems to power the world’s need for HPC and AI, and we are preparing new systems for Hopper and BlueField,” said Nvidia Director of Product Ying Yin Shih, during Nvidia’s Monday night Computex keynote. “These systems are open for all partners to expand their markets by leveraging our ecosystem.”

OEM hardware partners at launch include Asus, Gigabyte, QCT, Supermicro, Foxconn and Wiwynn. These Taiwanese computer makers will be bringing dozens of server models to market, according to Nvidia. Broader OEM support is expected.

When the Grace CPU was unveiled in April 2021, early customer announcements included the Swiss National Supercomputing Centre (CSCS) and the U.S. Department of Energy's Los Alamos National Laboratory. The centers said they were working with HPE and Nvidia to deploy Grace systems based on the HPE Cray XE architecture in 2023.

Nvidia also revisited its retooled datacenter roadmap and cadence, announced by CEO Jensen Huang at GTC 2021. The company’s roadmap now includes three chips: the GPU, the CPU and the DPU. “Each chip architecture will have a two-year rhythm,” said Brian Kelleher, senior vice president, hardware engineering at Nvidia, during the company’s Computex keynote. “One year we’ll focus on x86 platforms; one year we’ll focus on Arm platforms.”

About the author: Tiffany Trader

With over a decade’s experience covering the HPC space, Tiffany Trader is one of the preeminent voices reporting on advanced scale computing today.
