NCSA Helps Found Trillion Parameter Consortium
Nov. 27, 2023 -- The National Center for Supercomputing Applications (NCSA) was recently announced as a founding member of the Trillion Parameter Consortium, a global gathering of scientists from federal laboratories, research institutes, academia and industry to address the challenges of building large-scale artificial intelligence (AI) systems and advancing trustworthy and reliable AI for scientific discovery.
The Trillion Parameter Consortium (TPC) brings together teams of researchers engaged in creating large-scale generative AI models to address key challenges in advancing AI for science. These challenges include developing scalable model architectures and training strategies; organizing and curating scientific data for training models; optimizing AI libraries for current and future exascale computing platforms; and developing deep evaluation platforms to assess progress on scientific task learning, reliability and trust.
NCSA Director Bill Gropp said: “NCSA is proud to be a founding member of the Trillion Parameter Consortium. Expanding the use of AI and machine learning in scientific research – in responsible and principled ways – is a very public priority for the United States government, and this group of scientific, academic and industry leaders across the world are aligned in that mission. NCSA and I are thrilled to work alongside such a robust collection of AI pioneers pushing the field into the future.”
DeltaAI, NCSA’s AI-focused advanced computing and data resource that will be a companion system to the Center’s Delta, will play an instrumental role in the efforts undertaken by TPC. Set to come online in 2024, DeltaAI will triple NCSA’s AI-focused computing capacity and greatly expand the capacity available within the NSF-funded advanced computing ecosystem.
“DeltaAI will provide powerful capabilities for simulation and data science, with a strong emphasis on support for AI, which is in growing demand across many fields of science and engineering,” said Gropp in the July announcement for DeltaAI. “This project seeks to expand the use of AI methods in research by providing easier access, training offerings and other support to promote a wider demographic of researchers.”
TPC aims to:
- Build an open community of researchers interested in creating state-of-the-art large-scale generative AI models aimed broadly at advancing progress on scientific and engineering problems by sharing methods, approaches, tools, insights and workflows.
- Incubate, launch and coordinate projects voluntarily to avoid duplication of effort and to maximize the impact of the projects in the broader AI and scientific community.
- Create a global network of resources and expertise to facilitate the next generation of AI and bring together researchers interested in developing and using large-scale AI for science and engineering.
“At our laboratory and at a growing number of partner institutions around the world, teams are beginning to develop frontier AI models for scientific use and are preparing enormous collections of previously untapped scientific data for training,” said Rick Stevens, Argonne associate laboratory director for computing, environment and life sciences.
The consortium has formed a dynamic set of foundational work areas addressing three facets of the complexities of building large-scale AI models:
- Identifying and preparing high-quality training data with teams organized around the unique complexities of various scientific domains and data sources.
- Designing and evaluating model architectures, performance, training and downstream applications.
- Developing crosscutting and foundational capabilities, such as innovations in model evaluation strategies with respect to bias, trustworthiness and goal alignment.
TPC aims to provide the community with a venue in which multiple large model-building initiatives can collaborate to leverage global efforts, with flexibility to accommodate the diverse goals of individual initiatives. TPC includes teams that are using emerging exascale computing platforms to train large language models (LLMs) – or alternative model architectures – on scientific research, including papers, scientific codes and observational and experimental data, to advance innovation and discoveries.
Read more about the start of TPC in Argonne's announcement.
Source: Andrew Helregel, NCSA