
SC19: AI and Machine Learning Sessions Pepper Conference Agenda 

AI and HPC are increasingly intertwined – machine learning workloads demand ever-increasing compute power – so it’s no surprise the annual supercomputing industry shindig, SC19 at the Colorado Convention Center in Denver next week, has taken on a strong AI cast. As we noted recently (“Machine Learning Fuels a Booming HPC Market”), based on findings by industry watcher Intersect360 Research, “enterprise infrastructure investments for training machine learning models have grown more than 50 percent annually over the past two years, and are expected to shortly surpass $10 billion, according to a new market forecast,” and much of that training calls for HPC-class systems.

With that in mind, here’s a rundown of AI-related sessions and activities coming up at SC19 (all event locations are in the Convention Center unless otherwise specified):

Sunday, Nov. 17:

Deep Learning on Supercomputers, 9am-5:30pm, room 502-503-504: This workshop, led by Zhao Zhang of the University of Texas, Valeriu Codreanu of SURFsara and Ian Foster of Argonne National Laboratory and the University of Chicago, is designed as a forum for practitioners working on all aspects of DL for science and engineering in HPC to present their latest research results and their development, deployment, and application experiences.

Tools and Best Practices for Distributed Deep Learning on Supercomputers, 1:30-5pm, room 201: This tutorial, led by Weijia Xu and Zhao Zhang of the Texas Advanced Computing Center and David Walling of the University of Texas, is intended as a practical guide to running distributed deep learning across multiple compute nodes.

Monday, Nov. 18:

Deep Learning at Scale, 8:30am-5pm, room 207: Led by seven experts from Lawrence Berkeley National Lab, Intel and Cray, this tutorial will focus on the impact deep learning is having on the way science and industry use data to solve problems and on the need for scalable methods and software to train DL models.

Latest Advances in Scalable Algorithms for Large-Scale Systems, 9am-5:30pm, room 607: Led by experts from the University of Tennessee and Oak Ridge National Laboratory, this workshop will examine scalable algorithms needed to enable science applications to exploit the computational power of large-scale systems.

Machine Learning in HPC Environments, 9am-5:30pm, room 502-503-504: Led by experts from Oak Ridge National Lab, Nvidia, North Carolina State University and the Fraunhofer Institute for Industrial Mathematics, this workshop is intended to bring together researchers, practitioners, and scientific communities to discuss methods that utilize extreme scale systems for machine learning.

“HPC Is Now” Plenary – When Technology Kills, 5:30-6:30pm, Mile High Ballroom: Moderated by Keri Savoca of Tapad, Inc., with panelists Eric Hunter of Spherical Models and Bradford & Barthel LLP, Erin Kenneally of Elchemy, Inc., and Ben Rothke of Tapad, this panel will discuss how much autonomy ML, AI, IoT and smart data should be given, what regulations should be in place to govern software development, what additional training people should have to manage these new software systems, and ultimately, who is responsible when software fails and causes property damage, injury, or loss of life.

Tuesday, Nov. 19:

Machine Learning Training, 10:30am-noon, room 401-402-403-404: Led by Janis Keuper, senior scientist at the Fraunhofer Institute, this event will feature presentations of three papers: on large-batch training for LSTMs, on channel and filter parallelism for large-scale CNN training, and on high-performance sparse communication for ML.

Machine Learning and HPC in Pharma Research and Development, 12:15-1:15pm, room 210-212: Led by Mohammad Shaikh, director of scientific computing at Bristol-Myers Squibb, this birds-of-a-feather (BoF) panel discussion will focus on the challenges and approaches in applying ML and DL to build insightful models, including the increasing scale and velocity of data, model accuracy and refinement, and computational scale.

US Administration Activities in Artificial Intelligence and HPC, 1:15-2:15pm, Mile High Ballroom: Lynne Parker, assistant director for AI at the White House Office of Science and Technology Policy, will discuss the American AI Initiative, announced earlier this year, a “whole-of-government” strategy to advance AI in collaboration and engagement with the private sector, academia, the public and international allies.

Exhibitor Forum: Hardware for AI, 1:30-3pm, room 501-502: Led by Kazutomo Yoshii of Argonne National Lab, this session will feature presentations on adding low-latency capability to Panasas PanFS for AI and HPC; building a wafer-scale DL system; and a new block floating point arithmetic unit for AI/ML workloads.

HPC Impact Showcase, 1:30-3pm, room 503-504: Chaired by Lori Diachin of Lawrence Livermore National Laboratory, this will feature three half-hour sessions: HPC for numerical weather forecasting at The Weather Company; Lawrence Livermore and ExxonMobil’s use of HPC in reservoir simulation; and how TotalSim LLC has partnered with the Ohio Supercomputer Center to conduct HPC-based simulations for NASCAR R&D, Honda Racing and other auto racing clients.

HPC Impact Showcase, 3:30-5pm, room 503-504: Chaired by David Martin of Argonne National Lab, this will include sessions on the use of HPC and AI to accelerate design of clean engines and for the simulation of earthquakes.

AIOps: Bringing Artificial Intelligence to the Data Center, 4-4:30pm, room 501-502: David Sickinger of the National Renewable Energy Laboratory and Sergey Serebryakov and Tahir Cader of Hewlett Packard Enterprise will discuss using AIOps to prevent downtime incidents, which cost an average of $260,000 per hour, according to Uptime Institute.

AIBench: Toward a Comprehensive AI Benchmark Suite for HPC, Datacenter, Edge and IoT, 5:15-6:45pm, room 503-504: This BoF, led by Jianfeng Zhan and Wanling Gao of the Institute of Computing Technology, Chinese Academy of Sciences, and Xiaoyi Lu of Ohio State University, will discuss the growing interest in AI benchmarking and strategies for building a comprehensive AI benchmark suite across different communities, with an emphasis on data and workload distributions among HPC, data center, edge, and IoT.

Designing and Building Next-Generation Computer Systems for Deep Learning, 5:15-6:45pm, room 505: This BoF, led by Volodymyr Kindratenko of the National Center for Supercomputing Applications at the University of Illinois, Morris Riedel of Forschungszentrum Juelich and the University of Iceland, and Yangang Wang of the Chinese Academy of Sciences, will bring together researchers and developers working on the design of next-generation computer systems for DL and on parallel DL algorithms designed to exploit the potential of new systems.

Machine-Learning Hardware: Architecture, System Interfaces, and Programming Models, 5:15-6:45pm, room 704-706: Led by Pete Beckman and three colleagues from Argonne National Laboratory, this BoF will discuss the scientific community’s concerns about how it can influence the design of new, specialized technology for ML, including programming models, system interfaces and architecture trade-offs.

Wednesday, Nov. 20:

Machine Learning Optimization, 10:30am-noon, room 401-402-403-404: This session will include presentations of three research papers: on fast neural network training via dynamic sparse model reconfiguration, scalable reinforcement learning-based neural architecture search for cancer research, and a Tensor Core design for accelerating bit-based approximated neural nets.

HPC Impact Showcase, 3:30-4:30pm, room 503-504: Chaired by Melyssa Fratkin of the Texas Advanced Computing Center, these sessions will focus on financial services; the first will examine the partnership between the University of Texas at Dallas and the Federal Reserve Bank of Dallas to launch BigTex, an OpenHPC-based system for Federal Reserve economists and their co-authors to use for research purposes. The second session will discuss JP Morgan’s use of HPC for graph pattern matching to look for known-bad cyber patterns on a shared-memory supercomputer.

Thursday, Nov. 21:

“Simulate First” and the Role of HPC – A Caterpillar Perspective, 11:15am-noon, Mile High Ballroom: In this invited talk, Larry Seitzman of Caterpillar Inc. will discuss the spread of design simulation across the design and manufacturing communities and why it is imperative for keeping the cost of developing new products as low as possible while exploring as much of the design space as possible.

MLPerf: A Benchmark for Machine Learning, 12:15-1:15pm, room 201-203: In this BoF, Tom St. John of Tesla Inc. and Peter Mattson of Google Brain will lead a discussion of MLPerf, a community-driven system performance benchmark covering a range of individual machine learning tasks. The session is designed to introduce MLPerf to the broader HPC community and solicit input from interested parties to drive further adoption of the benchmark.

Benchmarking Machine Learning Ecosystem on HPC Systems, 12:15-1:15pm, room 708: This BoF, led by Murali Emani of Argonne National Lab, will examine the upsurge in HPC workloads that require data analysis, the use of ML and deep learning for image detection, segmentation, synthetic data generation, in-situ data analysis and other tasks in science domains, and how to utilize benchmarking to understand the performance of ML/DL models on HPC systems when executing these tasks.

HPC Impact Showcase, 1:30-3pm, room 503-504: Chaired by Alan Chalker of the Ohio Supercomputer Center, this showcase will feature three sessions: on harnessing HPC to realize the potential of genomics in the U.K. National Health System in Wales, Total’s PANGEA III HPC system for oil and gas exploration, and supercharging digital pathology AI with unified memory and HPC.

HPC Impact Showcase, 3:30-5pm, room 503-504: Led by Suzy Tichenor of Oak Ridge National Laboratory, these three sessions will examine paint application in the auto industry, use of HPC and computational fluid dynamics for offshore engineering in the energy industry and recreating an HPC cluster in a cloud during an HPC systems relocation.

Friday, Nov. 22:

Enabling Machine Learning-Based HPC Performance Diagnostics in Production Environments, 8:30-10am, room 205-207: Moderated by Ann Gentile of Sandia National Laboratories, this panel discussion will focus on the problem of system and application data now being collected at rates beyond the capacity of human consumption, and on recent machine learning techniques and tools for developing system and application behavioral models to improve operational efficiency and application performance.

HPC Big Data and AI: Computing under Constraints, 10:30am-noon, room 205-207: Led by Daniel Reed of the University of Utah, this panel will discuss the shift of big data and AI from rare and expensive to ubiquitous and inexpensive, driven by powerful networks and inexpensive accelerators, which are bringing new data-driven approaches to technical computing.
