
TACC’s Vista: AI-Focused Supercomputer in Production for Open Science Community 

Sept. 4, 2024 -- The Texas Advanced Computing Center (TACC) at The University of Texas today announced that Vista, a new artificial intelligence (AI)-centric system, is in full production for the open science community.

Vista, a new AI-centric system at TACC, is in full production for the open science community. Credit: TACC.

“Vista serves as a bridge between Frontera, the current NSF leadership-class computing system, and the forthcoming Horizon, which will be the primary system of the newly announced U.S. NSF Leadership-Class Computing Facility (LCCF), planned for 2026,” said Dan Stanzione, TACC executive director and associate vice president of research at UT Austin. “Vista expands TACC’s capacity for AI and will ensure that the broad science, engineering, and education research communities have access to the most advanced computing and AI technologies.”

Funded by the National Science Foundation (NSF), Vista marks a shift away from the x86-based architecture TACC used in Frontera, the Stampede systems, and others, moving instead to central processing units (CPUs) based on the Advanced RISC Machines (Arm) architecture. The Arm-based NVIDIA Grace CPU Superchip is specifically designed for the rapidly expanding needs of AI and scientific computing.

Vista also includes the NVIDIA GH200 Grace Hopper Superchip, which tightly integrates the Grace CPU with an NVIDIA graphics processing unit (GPU) for the heavy computational lifting required by scientific and AI workloads.

“It is very satisfying for our team to be able to train extremely large neural networks on Vista during the early user period, especially recent foundation models from computer vision, natural language processing, and computational biology,” said Adam Klivans, computer science professor and member of the scientific board for the Center for Generative AI. “The speeds are well beyond what we have experienced on other advanced systems at TACC. This cluster will be a game-changer for the AI community at UT Austin.”

Vista will help users begin porting to future generations of these technologies.

“Now with Vista, along with Lonestar6, our AMD-based system, and Stampede3, our Intel-based system, our users will gain experience with and insight into three major architectural paths for what Horizon might look like,” Stanzione said.

The NVIDIA GH200 Grace Hopper Superchip is the processor for the majority of Vista’s 860 compute nodes. It combines the NVIDIA Grace CPU with an NVIDIA Hopper architecture-based GPU so that the GPU can seamlessly access CPU memory to enable bigger AI models. The NVIDIA Grace CPU Superchip, which contains two Grace CPUs in a single module, fills out the remainder of Vista’s nodes for conventional applications.

Memory is implemented in a new way with the superchips. Instead of traditional DDR DRAM, the Grace CPU Superchip uses LPDDR5 technology, similar to the memory used in laptops but optimized for the needs of the data center. In addition to delivering higher bandwidth, this memory is more power-efficient than traditional dual inline memory modules (DIMMs). When combined with a high-performance, energy-efficient CPU design, the energy savings can be over 200 watts per node.
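
To put the per-node figure in perspective, the short Python sketch below converts a steady power saving into energy saved per year. It is a back-of-the-envelope illustration, not a TACC calculation: the 200-watt value comes from the paragraph above, while the function name, the utilization parameter, and the node count of 100 are hypothetical and chosen only for the example.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_savings_kwh(watts_saved_per_node, node_count, utilization=1.0):
    """Return the energy saved per year, in kilowatt-hours.

    utilization is the assumed fraction of the year the nodes are powered
    and busy (1.0 means running continuously).
    """
    watt_hours = watts_saved_per_node * node_count * HOURS_PER_YEAR * utilization
    return watt_hours / 1000.0


# 200 W per node is the figure quoted in the article.
print(annual_savings_kwh(200, 1))    # a single node running all year
# The node count of 100 is purely illustrative, not a Vista specification.
print(annual_savings_kwh(200, 100))
```

Under these assumptions, a node that saves 200 watts around the clock avoids roughly 1,750 kilowatt-hours per year, and the total grows linearly with the number of nodes and the fraction of time they run.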

In addition, the NVIDIA Quantum-2 400 Gb/s InfiniBand networking platform will help advance Vista’s performance with its advanced acceleration engines and in-network computing.

“AI is bringing unimaginable benefits to every aspect of society, including scientific exploration,” said Dion Harris, director of Accelerated Computing at NVIDIA. “Vista will inspire innovation and elevate the user experience to new heights, helping unlock the full potential of AI for the scientific community.”

On the storage side, TACC has partnered with VAST Data to supply Vista’s file system with all-flash, high-performance storage linked to its Stampede3 supercomputer. The compute nodes are manufactured by Gigabyte, and Dell Technologies provided the integration.

Vista allocations will be available to the broad open science community through the Frontera project and the National AI Research Resource (NAIRR) pilot project.


Source: Faith Singer, TACC

AIwire