Half Time of the CAE Experiment 

Report on an ongoing project (a.k.a. the Uber-Cloud Experiment) in which 170 industry and research organizations and individuals from 22 countries are working together to jointly explore the end-to-end process of remotely accessing technical computing resources sitting in Data Centers and in the Cloud.

Since its first announcement on July 13 here on the Digital Manufacturing Report, and its official start on July 20, the CAE Experiment (also called the Uber-Cloud Experiment) has attracted over 170 industry and research organizations and individuals from 22 countries who share one goal: to jointly explore the end-to-end process of remotely accessing technical computing resources sitting in Data Centers and in the Cloud.

The focus of this experiment is on engineering FEM and CFD simulations performed by small and medium enterprises that expect a quantum leap in innovation and competitiveness from using cluster computing resources more powerful than their desktop workstations.

While the benefits of remote access to computing resources are widely recognized, and we have developed and mastered most of the technology needed to access and run engineering workloads on remote resources, other challenges remain: trusting the resource provider; giving up some control over applications, data, and resources; security; provider lock-in; software licensing; the unfamiliar pay-per-use computing model of the cloud; and a general lack of clarity in distinguishing hype from reality. To explore these hurdles in detail and to learn more about this end-to-end process, we built 25 teams, each consisting of an end user and his or her application, the software provider, the computational resource provider, and a computing and/or CAE expert who manages the team process. Thanks to our participants, the following teams have been set up:

Team: Project Description

Anchor Bolt: Simulating steel-to-concrete fastening capacity for an anchor bolt
Resonance: Electromagnetic simulations of NMR probe heads
Radiofrequency: Radiofrequency field distribution inside a heterogeneous human body
Supersonic: Simulation of jet mixing in supersonic flow with shock
Liquid-Gas: Two-phase flow simulation of separation columns
Wing-Flow: Flow around an aerospace wing
Ship-Hull: Simulation of water flow around a ship hull
Cement-Flows: Burner simulation with different solid fuels in the mining industry
Sprinkler: Simulating water flow through an irrigation water sprinkler
Space Capsule: Aerothermodynamics and stability analysis of a space capsule
Car Acoustics: Low-frequency car acoustics
Dosimetry: Numerical EMC and dosimetry with high-res models
Weathermen: Large-scale and high-resolution weather and climate prediction
Wind Turbine: CFD simulations of vertical and horizontal wind turbines
Combustion: Simulating combustion in an IC engine
Blood Flow: Simulation of water/blood flow inside rotating micro channels
ChinaCFD: CFD using a homegrown C/C++ application
Gas Bubbles: Simulation of gas bubbles in a liquid mixing vessel
Side impact: Optimization of side-door intrusion bars under a crash
ColombiaBio: Analysis of the biological diversity in a geography using R scripts
Telecom: Hadoop-based simulations with data from telecommunication
Acoustics: Ultrasonic therapy simulation in medical equipment

In the meantime, almost all of the 25 teams are underway: four of them are busy defining their end-user project; 15 teams are in contact with the assigned computing resources and are setting up the project environment; three are working on initiating and monitoring the end-user project execution; two are reviewing the results with the end user; and one team is already documenting the findings of the HPC-as-a-Service process.

To illustrate the team process in more detail, below we present two teams and their current status:

Example: Simulating a new probe design for a medical device

Team Expert: Chris Dagdigian from BioTeam

Our team's end-user (who wants to remain anonymous) is faced with a common problem: a periodic need for large compute capacity in order to simulate and refine potential product changes and improvements. The periodic nature of the HPC requirements means that it is not possible to have the desired amount of capacity internally as the company finds it difficult to justify capital expenditure for complex assets that may end up sitting idle for long periods of time. To date the company has invested in a modest amount of internal High Performance Computing (HPC) capacity sufficient to meet base requirements. Additional HPC resources would allow the end user to greatly expand the sensitivity of current simulations and may enable new product and design initiatives previously written off as "untestable."

Our HPC software is CST Studio (http://www.cst.com), a popular commercial application for many types of electromagnetic simulation. We are currently operating in the Amazon cloud and have successfully completed a series of architecture refinements and scaling benchmarks. Our hybrid cloud-bursting architecture allows local HPC resources residing at the end-user site to be utilized along with our Amazon cloud-based resources.

At this point in the project we are still exploring the scaling limits of the Amazon GPU-equipped EC2 (Elastic Compute Cloud) instance types and are beginning new tests and scaling runs designed to test HPC task distribution via MPI. The use of MPI will allow us to leverage different EC2 instance type configurations and scale beyond some technical limits imposed by the amount of memory residing within the NVIDIA GPU cards. We are now close to routinely running simulations that would not be technically possible using only the local resources of our end user. We also intend to begin testing the Amazon EC2 Spot Market, in which cloud-based assets can be obtained from an auction-like marketplace offering significant cost savings over traditional on-demand hourly prices.
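As a rough illustration of why distributing work via MPI helps with the GPU memory limit mentioned above, the sketch below estimates how many GPU-equipped workers a job would need if its memory footprint were split evenly across them. The function name `min_gpu_workers`, the even-split assumption, the `usable_fraction` headroom parameter, and all numbers are hypothetical; they are not taken from the team's CST Studio setup or from any EC2 instance specification.

```python
import math


def min_gpu_workers(problem_gib: float, gpu_memory_gib: float,
                    usable_fraction: float = 0.9) -> int:
    """Smallest number of GPU workers whose combined usable memory holds the problem.

    Assumes the problem splits evenly across workers and ignores the extra
    memory needed for communication buffers (simplifications for illustration).
    """
    if problem_gib <= 0 or gpu_memory_gib <= 0:
        raise ValueError("sizes must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_per_gpu = gpu_memory_gib * usable_fraction
    return max(1, math.ceil(problem_gib / usable_per_gpu))


# Hypothetical example: a 20 GiB model on GPUs with 6 GiB of memory each.
print(min_gpu_workers(20.0, 6.0))
```

A real decomposition would also have to account for data exchanged between workers, so an estimate like this is best read as a lower bound.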

Example: Multiphase flows within the cement and mineral industry

Team Expert: Ingo Seipp from science + computing ag

In this project, ANSYS CFX is used to simulate a flash dryer in which hot gas is used to evaporate water from a solid. The team consists of FLSmidth as the end user, Bull as the computing resource provider with its extreme factory (XF) HPC on demand Cloud Service, ANSYS as the software provider and science + computing ag as team experts.

FLSmidth is the leading supplier of complete plants, equipment and services to the global minerals and cement industries. The end user needs about four to five days to complete a simulation run on the local IT infrastructure. Without investing in hardware, which may not always be utilized full-time, he would like to reduce the total throughput time of the project and, in a second step, increase the mesh size to refine the results. For this, the simulation must run on more cores and with more memory, spread across more nodes connected by a high-speed network.
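How much the four-to-five-day run time could shrink depends on how much of the simulation parallelizes. The following is a minimal sketch using Amdahl's law for a fixed problem size. The 4.5-day baseline is the midpoint of the range quoted above; the parallel fraction of 0.95, the core multipliers, and the function name `estimated_runtime_days` are illustrative assumptions, not measurements from this project.

```python
def estimated_runtime_days(baseline_days: float, parallel_fraction: float,
                           core_multiplier: float) -> float:
    """Amdahl's law estimate of run time after scaling the core count.

    baseline_days: run time on the current infrastructure.
    parallel_fraction: share of the run time that parallelizes (assumed).
    core_multiplier: new core count divided by the current core count.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if core_multiplier <= 0:
        raise ValueError("core_multiplier must be positive")
    serial_part = baseline_days * (1.0 - parallel_fraction)
    parallel_part = baseline_days * parallel_fraction / core_multiplier
    return serial_part + parallel_part


# Hypothetical scenario: 4.5-day baseline, 95% of the work parallelizes.
for multiplier in (1, 2, 4, 8, 16):
    days = estimated_runtime_days(4.5, 0.95, multiplier)
    print(f"{multiplier:>2}x cores: about {days:.2f} days")
```

The serial share sets a floor on the run time no matter how many cores are added, which is one reason the network between nodes matters as much as the core count.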

XF provides 150 teraflops of computing power with InfiniBand, GPUs and currently about 30 installed applications. Others are added on demand. Users can access XF through an easy-to-use web portal or direct logon.

In this project, XF has enabled access for the end user and integrated ANSYS CFX into a web interface through which the end user can submit jobs. For the course of this project, licenses have been granted by ANSYS. The end user can manage his ANSYS licenses easily through the portal. The preparations to run the jobs are almost complete, and the first test runs should start shortly.

Announcing Round 2 of the Uber-Cloud Experiment

We consider Round 1 a proof of concept: YES, access to remote computing resources works, and there is a real need for it! YES, there are hurdles along the way, but by now we know how to overcome them.

During the half-time webinar we asked the attendees: Would you participate in a CAE Experiment Round 2? 97% answered “Yes.” Therefore, we decided to start a new round of the CAE Experiment right after the end of the current round, running from mid-November to mid-February.

Round 2 of the experiment will be even more professional: the end-to-end process of identifying, accessing and using remote resources (hardware, software, expertise) will become more structured, standardized, and tools-based; we will handle more teams and more applications beyond CAE; and we will offer a list of additional professional services, e.g., measuring the overall team effort. Existing teams will be encouraged to use other resources, existing participants can work in new teams, and new participants can join and form new teams.

For more information and the registration for Round 2, please go to the CAE Experiment website.
