
Nvidia’s ‘GPU Cloud’ Adopts Containers for AI Development 

GPU powerhouse Nvidia's entry into the cloud market differs from the offerings of public cloud leaders in its focus on delivering development tools for training artificial intelligence models and running AI workloads in application containers.

Nvidia CEO Jensen Huang unveiled the "GPU-accelerated cloud platform optimized for deep learning" during the company's annual technology conference on Wednesday (May 10). Its AI development stack runs on the company's distribution of Docker containers and is touted as "purpose built" for developing deep learning models on GPUs.

Among the goals is giving AI developers easier access to the growing suite of deep learning software available for AI applications. The on-ramp approach to GPU-based cloud computing addresses the growing need to gather into a single stack the proliferating deep learning frameworks, drivers, libraries, operating systems and processors used for AI development.

"We took this incredibly complicated [software] stack and containerized it," Huang stressed. Once these frameworks and other software building blocks were bundled, Nvidia (NASDAQ: NVDA) created a cloud registry for the development stack to speed development of deep learning machines. "You download the container of your choice," Huang added.

Nvidia CEO Jensen Huang unveils the components of the chip maker's GPU Cloud.

The software components within Nvidia's AI supercomputer are bundled into a Docker container the company calls the Nvidia GPU Cloud software stack. The idea is to make the up-to-date stack more readily available while optimizing performance.

The GPU cloud approach also addresses the computing resources needed to train neural networks, the company stressed. Developers could run the stack on GPU-powered machines, on Nvidia's DGX systems or "the ten thousand GPUs that are in the cloud," Huang said.

In one click, a single instance is created in the GPU cloud, the desired container is downloaded and "we burst your workload into the cloud," the Nvidia CEO explained. "This is really the first hybrid, deep learning cloud computing platform."

The graphics processor vendor based in Santa Clara, Calif., also announced the first GPU built on its new Volta chip architecture, a high-end part dubbed Tesla V100 designed to power emerging AI development.

The combination of cutting edge graphics processing and scalable cloud computing resources is intended to attract a growing legion of AI developers who could leverage the service to build models of varying sizes, and then move them from prototyping to deployment in production via Docker containers.

Market watchers see the new graphics processor, the GPU cloud and AI software delivered in containers as, together, a new way to boost AI development.

"The majority of deep learning applications will be run in cloud and hyper-scale environments, and the [Docker container] implementation lets users design on their own GPU systems then migrate to the cloud," explained Addison Snell of Intersect360 Research.

The deep learning software stack is the "most interesting use of containers I've seen," Snell added.

Huang said the GPU cloud platform would be available for beta testing in July. Pricing details would follow later, the company added.

--Tiffany Trader contributed to this report.

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).

AIwire