New MLPerf Training and HPC Benchmark Results Showcase 49X Performance Gains in 5 Years
SAN FRANCISCO, Nov. 8, 2023 -- Today, MLCommons announced new results from two industry-standard MLPerf benchmark suites:
- The MLPerf Training v3.1 suite, which measures the performance of training machine learning models.
- The MLPerf HPC (High Performance Computing) v3.0 benchmark suite, which is targeted at supercomputers and measures the performance of training machine learning models for scientific applications and data.
MLPerf Training v3.1
The MLPerf Training benchmark suite comprises full system tests that stress machine learning models, software, and hardware for a broad range of applications. The open-source and peer-reviewed benchmark suite provides a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry.
MLPerf Training v3.1 includes over 200 performance results from 19 submitters: Ailiverse, ASUSTek, Azure, Azure+NVIDIA, Clemson University Research Computing and Data, CTuning, Dell, Fujitsu, GigaComputing, Google, Intel+Habana Labs, Krai, Lenovo, NVIDIA, NVIDIA+CoreWeave, Quanta Cloud Technology, Supermicro, Supermicro+Red Hat, and xFusion. MLCommons would like to especially congratulate first-time MLPerf Training submitters Ailiverse, Clemson University Research Computing and Data, CTuning Foundation, and Red Hat.
The results demonstrate broad industry participation and highlight performance gains of up to 2.8X compared to just 5 months ago and 49X over the first results, reflecting the tremendous rate of innovation in systems for machine learning.
Significant to this round is the largest system ever submitted to MLPerf Training. Comprising over 10,000 accelerators, it demonstrates the extraordinary progress by the machine learning community in scaling system size to advance the training of neural networks.
MLPerf Training v3.1 introduces the new Stable Diffusion generative AI benchmark model to the suite. Based on Stability AI’s Stable Diffusion v2 latent diffusion model, Stable Diffusion takes text prompts as inputs and generates photorealistic images as output. It is the core technology behind an emerging and exciting class of tools and applications such as Midjourney and Lensa.
“Adding Stable Diffusion to the benchmark suite is timely, given how image generation has exploded in popularity,” said Eric Han, MLPerf Training co-chair. “This is a critical new area, extending generative AI to the visual domain.”
MLCommons added the GPT-3 benchmark to MLPerf Training v3.0 last June. In just five months, the large language model (LLM) benchmark has shown performance gains of over 2.8X. Eleven submissions in this round include the LLM benchmark using the GPT-3 reference model, reflecting the tremendous popularity of generative AI.
“GPT-3 is among the fastest growing benchmarks we’ve launched,” said David Kanter, Executive Director, MLCommons. “It’s one of our goals to ensure that our benchmarks are representative of real-world workloads and it’s exciting to see 2.8X better performance in mere months.”
MLPerf HPC v3.0 Benchmarks
The MLPerf HPC benchmark is similar to MLPerf Training, but is specifically intended for high-performance computing systems that are commonly employed in leading-edge scientific research. It emphasizes training machine learning models for scientific applications and data, such as quantum molecular dynamics, and also incorporates an optional throughput metric for large systems that commonly support multiple users.
MLCommons added a new protein-folding benchmark in the HPC v3.0 benchmark suite: the OpenFold generative AI model, which predicts the 3D structure of a protein given a 1D amino acid sequence. Developed by Columbia University, OpenFold is an open-source reproduction of the AlphaFold 2 foundation model and has been the cornerstone of a large number of research projects since its creation.
MLPerf HPC v3.0 includes over 30 results, a 50% increase in participation over last year, with submissions from 8 organizations operating some of the world’s largest supercomputers: Clemson University Research Computing and Data, Dell, Fujitsu-RIKEN, HPE+Lawrence Berkeley National Laboratory, NVIDIA, and Texas Advanced Computing Center. MLCommons congratulates first-time MLPerf HPC submitters Clemson University Research Computing and Data and HPE+Lawrence Berkeley National Laboratory.
The new OpenFold benchmark includes submissions from 5 organizations: Clemson University Research Computing and Data, HPE+Lawrence Berkeley National Laboratory, NVIDIA, and Texas Advanced Computing Center.
HPC v3.0 Performance Gains
The MLPerf HPC benchmark suite demonstrates considerable progress in AI for science that will help unlock new discoveries. For example, the DeepCAM weather modeling benchmark is 14X faster than when it debuted, illustrating how rapid innovations in machine learning systems can empower scientists with better tools to address critical research areas and advance our understanding of the world.
“The addition of OpenFold follows the spirit of the MLPerf HPC benchmark suite: Accelerating workloads with potential for global-scale contribution. We are excited for the new addition as well as the increased participation in the latest submission round,” said Andreas Prodromou, MLCommons HPC co-chair.
View the Results
To view the results for MLPerf Training v3.1 and MLPerf HPC v3.0 and find additional information about the benchmarks, please visit the Training and HPC benchmark pages.
About MLCommons
MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make machine learning better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members, including global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire machine learning industry through benchmarks and metrics, public datasets, and best practices.
Source: MLCommons