Covering Scientific & Technical AI | Thursday, January 30, 2025

Argonne Releases Aurora Exascale Supercomputer to Researchers 

Jan. 28, 2025 -- The U.S. Department of Energy’s (DOE) Argonne National Laboratory has released its Aurora exascale supercomputer to researchers across the world, heralding a new era of computing-driven discoveries. With powerful capabilities for simulation, artificial intelligence (AI), and data analysis, Aurora will drive breakthroughs in a range of fields including airplane design, cosmology, drug discovery, and nuclear energy research.

The ALCF's Aurora exascale supercomputer is now open to the research community. Image credit: Argonne National Laboratory.

“We’re ecstatic to officially deploy Aurora for open scientific research,” said Michael Papka, director of the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility. “Early users have given us a glimpse of Aurora’s vast potential. We’re eager to see how the broader scientific community will use the system to transform their research.”

Exascale and AI: Boosting the Speed of Science

Aurora is one of the world’s first exascale supercomputers, along with Frontier at DOE’s Oak Ridge National Laboratory and El Capitan at DOE’s Lawrence Livermore National Laboratory. Exascale refers to systems capable of performing at least an exaflop—a quintillion (or a billion billion) calculations per second. The DOE machines are not only the first to reach exascale but are also currently the three fastest systems in the world.

Aurora has already established itself as one of the world’s leading systems in AI performance, earning the top spot on the HPL-MxP benchmark in November 2024. Scientists are using its advanced AI capabilities to discover new battery materials, design new drugs, and accelerate fusion energy research. Before its deployment, an Argonne-led team demonstrated Aurora’s potential by using it to train AI models for an innovative protein design framework.

“A big target for Aurora is training large language models for science,” said Rick Stevens, Argonne associate laboratory director for Computing, Environment and Life Sciences. “With the AuroraGPT project, for example, we are building a science-oriented foundation model that can distill knowledge across many domains from biology to chemistry. One of the goals with Aurora is to enable researchers to create new AI tools that help them make progress as fast as they can think—not just as fast as their computations.”

Among the initial projects on Aurora, researchers are working to develop high-fidelity models of complex systems, such as the human circulatory system, nuclear reactors, and supernovae, to gain new insights into their behavior. Additionally, Aurora’s capacity to process massive datasets is critical for analyzing the growing data streams from large-scale research facilities such as Argonne’s Advanced Photon Source (APS), a DOE Office of Science user facility, and CERN’s Large Hadron Collider.

“The projects running on Aurora represent some of the most ambitious and innovative science happening today,” said Katherine Riley, ALCF director of science. “From modeling extremely complex physical systems to processing huge amounts of data, Aurora will accelerate discoveries that deepen our understanding of the world around us.”

Collaborative Development: Building and Preparing Aurora for Science

Aurora’s deployment marks the culmination of years of collaboration. Built in partnership with Intel and Hewlett Packard Enterprise (HPE), Aurora is equipped with 63,744 GPUs (graphics processing units) and 84,992 network endpoints, making it one of the largest supercomputer installations to date. Spanning eight rows of refrigerator-sized cabinets, the machine weighs 600 tons, covers 10,000 square feet—the size of two professional basketball courts—and is interconnected by 300 miles of networking cables.

“Bringing a system of this scale to life comes with a unique set of challenges,” said Susan Coghlan, ALCF project director for Aurora. “It required working with entirely new technologies at an unprecedented scale. Seeing the machine fully operational and ready to support science speaks to the hard work and expertise of everyone involved.”

To ensure Aurora was ready for science on day one of its deployment, the system was built through a collaborative process called co-design. Using this approach, the Aurora team developed the system hardware and scientific software in tandem to optimize performance and usability. This required years of collaboration between the ALCF, Intel, HPE, and researchers across the nation participating in DOE’s Exascale Computing Project (ECP) and the ALCF’s Aurora Early Science Program (ESP).

The ALCF team provides expertise and support to the research community, helping maximize the use of Aurora for scientific discovery. Image credit: Argonne National Laboratory.

While Aurora was being installed, ECP and ESP teams ran applications to stress-test the hardware while simultaneously optimizing their codes to run as efficiently as possible on the system. This resulted in dozens of scientific applications, along with a wide range of software and programming tools, being ready for Aurora before it entered production.

“Part of the process of bringing a new supercomputer online involves putting it through its paces with real codes running real science problems,” said Kalyan Kumaran, ALCF director of technology. “This is key to achieving our goal of enabling science on day one of a new supercomputer’s launch.”

Now that Aurora is in production, it has begun supporting over 70 diverse science and engineering projects. These include projects from the Early Science Program, as well as those awarded computing time through DOE’s two primary allocation programs: the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program and the ASCR Leadership Computing Challenge (ALCC). The ALCF is also accepting Director’s Discretionary allocation requests from researchers with computationally intensive problems to solve.

To learn more about Aurora, visit: https://www.alcf.anl.gov/aurora.

The ALCF is hosting Aurora Bootcamp training sessions on February 12 and February 25, 2025.


Source: Jim Collins, Argonne Lab
