Integrated and automated management is a key component to handling data-centric growth
Sponsored Content by DDN

December 30, 2019

NVIDIA DGX SuperPOD and DDN AI400 storage speed AI deployment and workflows

Supercomputing systems and tools such as artificial intelligence (AI) and deep learning (DL) have recently gained wider recognition as an opportunity for enterprises to create more value from their data. However, AI and DL pose exceptional infrastructure and data management challenges when processing and storing data at scale. This article describes how the NVIDIA DGX SuperPOD™ supercomputing cluster, paired with the DDN A³I (Accelerated, Any-Scale AI) storage solution, makes it easy to deploy a supercomputing infrastructure with minimal complexity and reduced timelines. In addition, the combined solution meets the most challenging AI and DL workload needs and, according to DDN, speeds data science workflows by up to 20 times.

Solving the challenges in system deployment

“Historically, the deployment of the largest supercomputing systems was a months-long process which involved extensive customization and tuning to extract the maximum performance for available resources. NVIDIA’s introduction of their NVIDIA DGX SuperPOD infrastructure is a game-changer for the world of complex AI modeling and other high performance computing (HPC)-like workloads that require extreme multi-node scale. Additionally, DDN and NVIDIA have committed an extensive effort to create an end-to-end deployment that pairs the power of the NVIDIA DGX-2 and the parallel data-delivery system of DDN’s A³I appliances for high-performance environments, which are easy to deploy and manage. In our testing, DDN successfully deployed ten AI400 appliances in only four hours,” states Kurt Kuckein, Vice President, Marketing.

Increasing performance for DL training model datasets

DL models used in areas such as classification, object detection and natural language processing require large amounts of training data. Datasets in automotive and other computer vision tasks can exceed 30 terabytes (TB) in size and may require 1 GB/s of read performance per graphics processing unit (GPU). During DL training, the data may be read repeatedly as the model is iterated to find the most accurate version. Processing this data requires a system that can sustain massive throughput across many I/O patterns, including large blocks (greater than 1 megabyte), smaller blocks (less than 1 megabyte, and even less than 32 kilobytes), and memory-mapped files. NVIDIA’s DGX SuperPOD is specifically designed for this level of data processing. For a storage solution to meet the needs of the DGX SuperPOD, it must handle these I/O patterns and scale to tens of gigabytes per second of read performance to all nodes simultaneously. DDN developed the DDN® AI400™ appliance to meet demanding mixed-I/O patterns and to support intensive DL workloads when connected to the DGX SuperPOD.
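To make the three access patterns named above concrete, here is a minimal Python sketch that reads the same file with large blocks, with small blocks, and through a memory map. It is an illustration only, not a benchmark and not part of the DDN or NVIDIA reference architecture; the function names, the 4 MiB sample file, and the 4 KiB stride are all assumptions chosen for the example.

```python
import mmap
import os
import tempfile


def make_sample_file(size_bytes):
    """Create a temporary file of random bytes and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def read_in_blocks(path, block_size):
    """Sequentially read a file in fixed-size blocks; return the total bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_memory_mapped(path, stride=4096):
    """Touch one byte every `stride` bytes through a read-only memory map.

    Returns the number of offsets touched. The file must not be empty.
    """
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), stride):
                _ = mm[offset]  # the access pulls the page in on demand
                touched += 1
    return touched


if __name__ == "__main__":
    sample = make_sample_file(4 * 1024 * 1024)
    try:
        print("large blocks (2 MiB):", read_in_blocks(sample, 2 * 1024 * 1024), "bytes")
        print("small blocks (16 KiB):", read_in_blocks(sample, 16 * 1024), "bytes")
        print("memory-mapped offsets touched:", read_memory_mapped(sample))
    finally:
        os.remove(sample)
```

All three calls cover the same data; what changes is the number and size of the requests the storage layer sees, which is why a mixed-I/O workload is harder to serve than a purely sequential one.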

Introducing the NVIDIA SuperPOD – DDN A³I AI400 solution

The NVIDIA DGX SuperPOD is a first-of-its-kind AI supercomputing infrastructure that delivers groundbreaking performance, deploys quickly as a fully integrated system, and is designed to solve the world's most challenging AI problems. The DGX SuperPOD implements a reference architecture that integrates 64 NVIDIA DGX-2 systems with Mellanox InfiniBand™ networking and the DDN AI400 to create a shared supercomputing infrastructure designed not just for the lab, but for businesses exploring data science at scale.

The DDN AI400 appliance is a compact, low-power storage solution that delivers high raw performance by using Non-Volatile Memory Express (NVMe) drives for storage and InfiniBand as its network transport. The AI400 appliance leverages the EXAScaler® EXA5 file system, which provides a high-performance enterprise parallel file system with expanded data management capabilities.

The AI400 appliance communicates with DGX SuperPOD clients using multiple EDR InfiniBand or 100 GbE network connections for performance, load balancing, and resiliency. The DDN parallel protocol allows each storage appliance to be accessed at over 48 GB/s, supplying plenty of headroom to feed multiple GPUs at full speed simultaneously. This performance is necessary for training image-based networks as image sizes grow to 1080p, 4K, and beyond. In addition, the all-NVMe architecture of the DDN AI400 appliance provides excellent random read performance, often as fast as sequential read patterns.
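The figures quoted in this article (up to 1 GB/s of reads per GPU, datasets of 30 TB, and over 48 GB/s per appliance) lend themselves to a quick back-of-the-envelope check. The Python sketch below is a rough sizing aid under stated assumptions, not a DDN or NVIDIA sizing tool: the function names are hypothetical, the 16-GPU node is an illustrative input, decimal units are assumed (1 TB = 1000 GB), and real throughput depends on the network, the file system, and the workload.

```python
import math


def required_read_gbps(num_gpus, per_gpu_gbps=1.0):
    """Aggregate read bandwidth (GB/s) if every GPU reads at per_gpu_gbps."""
    return num_gpus * per_gpu_gbps


def appliances_needed(num_gpus, per_gpu_gbps=1.0, per_appliance_gbps=48.0):
    """Smallest appliance count whose combined bandwidth covers the demand.

    per_appliance_gbps defaults to the 48 GB/s figure quoted in the article;
    treating it as a sustained rate is an assumption of this sketch.
    """
    demand = required_read_gbps(num_gpus, per_gpu_gbps)
    return math.ceil(demand / per_appliance_gbps)


def seconds_per_pass(dataset_tb, aggregate_gbps):
    """Time to read the whole dataset once, assuming 1 TB = 1000 GB."""
    return dataset_tb * 1000.0 / aggregate_gbps


if __name__ == "__main__":
    gpus = 16  # illustrative: one 16-GPU node
    print("demand (GB/s):", required_read_gbps(gpus))
    print("appliances needed:", appliances_needed(gpus))
    print("seconds per 30 TB pass at 48 GB/s:", seconds_per_pass(30, 48.0))
```

With these inputs, 16 GPUs at 1 GB/s each need 16 GB/s, which fits within a single appliance's quoted rate, and one full pass over a 30 TB dataset at 48 GB/s takes 625 seconds.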

Figure 1. DDN AI400 and NVIDIA SuperPOD – Transformative AI Infrastructure. Courtesy of DDN

Summary

Enterprises are gaining business insights from their data using supercomputing systems and tools such as artificial intelligence (AI) and deep learning (DL), but they face challenges in deploying these systems and in processing and storing data at scale. A collaboration between NVIDIA and DDN combines the power of the NVIDIA DGX SuperPOD system with DDN’s A³I data management system and the DDN AI400 storage appliance, making it possible to deploy a supercomputing infrastructure with minimal configuration and reduced timelines. Enterprises now have access to leadership-class supercomputing resources without the complexity historically associated with this level of infrastructure. Both companies are leveraging decades of experience with data-intensive computing to provide this integrated solution.

“Now, commercial customers that are struggling to deploy their AI models at scale with massive data sets have a readily-available recipe that requires little to no customization to drive business innovation. IT can consolidate silos of data science within their organization. In addition, by leveraging the combined capabilities of NVIDIA and DDN, enterprises can speed up data workflows up to 20 times,” states Kurt Kuckein, Vice President, Marketing.

References

To get your copy of this reference architecture, please visit the DDN website.

About DDN

DDN is the world’s leading data management supplier to data-intensive, global organizations. The rapidly evolving competitive landscape makes it essential to ensure projects like AI initiatives can move quickly from investigation to production. For more than 20 years, DDN has focused on designing, deploying and optimizing solutions for production-level AI, HPC and Big Data. DDN enables businesses to generate more value and accelerate time to insight from their data, on premises and in multicloud environments. Organizations leverage the power of DDN technology and technical expertise to capture, store, process, analyze, collaborate on and distribute information and content in the most efficient, reliable and cost-effective manner. DDN customers include many of the world’s leading financial services firms, banks, healthcare and life science organizations, manufacturing and energy companies, government and research facilities, and service providers who use their data to develop everything from innovative treatments for disease to new paths to revenue.

Contact DDN

DDN has long been a partner of choice for organizations pursuing data-intensive projects at any scale. DDN provides significant technical expertise through its global research and development and field technical organizations. A worldwide team with hundreds of engineers and technical experts can be called upon to optimize every phase of a project: initial inception, solution architecture, systems deployment, customer support and future scaling needs. DDN laboratories are also equipped with leading GPU compute platforms to provide unique benchmarking and testing capabilities for AI and DL applications.

Strong customer focus coupled with technical excellence and deep field experience ensures that DDN delivers the best possible solution for any challenge. Taking a consultative approach, DDN experts perform an in-depth evaluation of requirements and provide application-level optimization of data workflows for a project. They will then design and propose an optimized, highly reliable and easy-to-use solution that accelerates the customer’s effort.

Contact DDN today and engage our team of experts to unleash the power of your AI projects.
