Feeding the Virtuous Cycle of Discovery: HPC, Big Data, and AI Acceleration
![](https://www.aiwire.net/wp-content/uploads/2025/02/cycle_shutterstock_Inkoly_HPC_AI_BD-685x320.jpg)
(Inkoly/Shutterstock)
GenAI hit the scene fast and furious when ChatGPT was released on November 30, 2022. The quest for bigger and better models has changed the hardware, data center, and power landscape, and foundation models are still under rapid development. One of the challenges in HPC and technical computing is finding where GenAI “fits in” and, more importantly, “what it all means” in terms of future discoveries.
Indeed, the resource-straining market effects have mostly been due to creating and training large AI models. The inference market (deploying the trained models) may require different hardware and is expected to be much larger than the training market.
What about HPC?
Aside from making GPUs scarce and expensive (even in the cloud), these rapid changes have raised many questions in the HPC community. For instance:
- How can HPC leverage GenAI? (Can it?)
- How does it fit with traditional HPC tools and applications?
- Can GenAI write code for HPC applications?
- Can GenAI reason about Science and Technology?
Answers to these and other questions are forthcoming. Many organizations are working on these issues, including the Trillion Parameter Consortium (TPC) — Generative AI for Science and Engineering.
What has been reported, however, is that despite all the improvements, LLMs continue, from time to time, to provide inaccurate or wrong answers (euphemistically called “hallucinations”). Consider the following search prompt and the subsequent AI-generated answer. Someone asked an elementary-school-level chemistry question, “Will Water Freeze at 27 degrees F?”, and the answer was comically wrong and seemed to follow faulty reasoning. If GenAI is to work in science and technology, the models must be improved.
Maybe more data will help
The “intelligence” of the initial LLMs was improved by including more data. As a result, models became bigger, requiring more resources and computation time. As measured by some emerging benchmarks, the “smartness” of the models did improve, but there is an issue with this approach. Scaling models means finding more data, and in a simple sense, the model makers have already scraped a large amount of the internet into their models. The success of LLMs has also created more internet content in the form of automated news articles, summaries, social media posts, creative writing, etc.
There are no exact figures, but estimates suggest that 10–15% of the internet’s textual content today has been created by AI. Predictions indicate that by 2030, AI-generated content could comprise over 50% of the internet’s textual data.
However, there are concerns about LLMs eating their own data. It is generally known that training LLMs on data generated by other AI models leads to a degradation in performance over successive generations — a condition called model collapse. Indeed, models can hallucinate web content (“No, water will not freeze at 27F”), which may become input to a new model, and so on.
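The dynamic behind model collapse can be illustrated with a toy experiment (a sketch, not any production training pipeline): fit a simple statistical model to data, sample synthetic data from the fit, refit on that output, and repeat. With small samples, the fitted spread tends to drift downward over successive generations.

```python
import random
import statistics

def fit_gaussian(samples):
    """'Train' a model: estimate mean and standard deviation from data."""
    return statistics.mean(samples), statistics.stdev(samples)

def generate(mu, sigma, n):
    """'Deploy' the model: draw n synthetic samples from the fit."""
    return [random.gauss(mu, sigma) for _ in range(n)]

random.seed(42)
data = generate(0.0, 1.0, 20)   # generation 0: "real" data
for gen in range(1, 41):
    mu, sigma = fit_gaussian(data)
    # Each generation trains only on the previous generation's output.
    data = generate(mu, sigma, 20)
    if gen % 10 == 0:
        print(f"generation {gen:2d}: sigma = {sigma:.3f}")
# With small samples, sigma tends to drift toward collapse over generations.
```

The Gaussian here stands in for a far more complex model; the point is only that each generation inherits, and compounds, the sampling errors of the last.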
In addition, the recent release of report-generating tools like OpenAI Deep Research and Google’s Gemini Deep Research make it easy for researchers to create papers and documents by suggesting topics to research tools. Agents such as Deep Research are designed to conduct extensive research, synthesize information from various web sources, and generate comprehensive reports that inevitably will find their way into training data for the next generation of LLMs.
Wait, don’t we create our own data?
HPC creates piles of data. Traditional HPC crunches numbers to evaluate mathematical models using input data and parameters. In one sense, these data are unique and original, and they offer the following advantages:
- Clean and complete – no hallucinations, no missing data
- Tunable – we can determine the shape of the data
- Accurate – often tested against experiment
- Almost limitless – generate many scenarios
There seems to be no tail to eat with science and technical data. A good example is the Microsoft Aurora (not to be confused with Argonne’s Aurora exascale system) data-based weather model (covered on HPCwire).
Using this model, Microsoft asserts that Aurora’s training on more than a million hours of meteorological and climatic data has resulted in a 5,000-fold increase in computational speed compared to numerical forecasting. The AI methods are agnostic to the data sources used to train them: scientists can train them on traditional simulation data, on real observation data, or on a combination of both. According to the researchers, the Aurora results indicate that increasing both data-set diversity and model size improves accuracy. Training data sizes range from a few hundred terabytes up to a petabyte.
Large Quantitative Models: LQMs
The key to creating LLMs is converting words or tokens into vectors and training, using lots of matrix math (GPUs), to create models that represent the relationships between tokens. At inference time, the models answer questions by predicting the next token.
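The shape of that computation can be sketched in a few lines. The toy example below uses random stand-in vectors rather than trained weights and a five-word vocabulary, but it shows the essential steps: embed tokens as vectors, score candidates with dot products, and turn scores into next-token probabilities with a softmax.

```python
import math
import random

random.seed(0)
vocab = ["water", "freezes", "at", "32", "F"]
dim = 4

# In a real LLM these vectors are learned; here they are random stand-ins.
embed = {tok: [random.uniform(-1, 1) for _ in range(dim)] for tok in vocab}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(context):
    """Average the context vectors, then score every vocabulary token."""
    ctx = [sum(embed[t][i] for t in context) / len(context)
           for i in range(dim)]
    scores = [sum(c * e for c, e in zip(ctx, embed[tok])) for tok in vocab]
    probs = softmax(scores)
    return max(zip(vocab, probs), key=lambda pair: pair[1])

tok, p = next_token(["water", "freezes", "at"])
print(f"predicted next token: {tok!r} with probability {p:.2f}")
```

Real models replace the averaged context with deep transformer layers and billions of learned parameters, but the embed-score-softmax loop is the same.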
We already have numbers, vectors, and matrices in science and engineering! We don’t want to predict the next word, as Large Language Models do; we want to predict numbers using Large Quantitative Models, or LQMs.
Building an LQM is more difficult than building an LLM and requires a deep understanding of the system being modeled (AI), access to large amounts of data (Big Data), and sophisticated computational tools (HPC). LQMs are built by interdisciplinary teams of scientists, engineers, and data analysts who work together on models. Once complete, LQMs can be used in various ways. They can be run on supercomputers to simulate different scenarios (i.e., HPC acceleration), allowing users to explore “what if” questions and predict outcomes under various conditions faster than with traditional numeric-based models.
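The acceleration idea can be sketched with a deliberately tiny surrogate model (illustrative only; real LQMs are vastly more sophisticated): run an expensive simulation (faked here by a simple function) to generate training data, fit a cheap quantitative model to it, and then answer “what if” queries from the fit instead of rerunning the simulation.

```python
def simulate(x):
    """Stand-in for an expensive HPC simulation run."""
    return 3.0 * x + 2.0  # assumed ground-truth relationship

# Step 1: run the "simulation" to generate training data.
xs = [float(i) for i in range(10)]
ys = [simulate(x) for x in xs]

# Step 2: fit a least-squares line as the surrogate quantitative model.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# Step 3: answer "what if" questions without rerunning the simulation.
prediction = slope * 42.0 + intercept
print(f"surrogate prediction at x=42: {prediction:.1f}")  # prints 128.0
```

Swap the linear fit for a deep network and the toy function for a climate or chemistry code, and this is the surrogate-model workflow behind results like Microsoft’s Aurora.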
An example of an LQM-based company is SandboxAQ, covered in AIwire, which was spun out of Google in March 2022.
Its total funding is reported as $800 million, and it plans to focus on cryptography, quantum sensors, and LQMs. Its LQM efforts focus on life sciences, energy, chemicals, and financial services.
But … data management
Remember Big Data? It never went away, it is getting bigger, and it can be one of the biggest challenges to AI model generation. As reported in BigDATAwire, “The most frequently cited technological inhibitors to AI/ML deployments are storage and data management (35%)—significantly greater than computing (26%),” according to a recent S&P Global Market Intelligence report.
In addition, it is computationally feasible to perform AI and ML processing without GPUs; however, it is nearly impossible to do so without proper high-performance and scalable storage. A little-known fact about data science is that 70%–80% of the time spent on data science projects is in what is commonly known as Data Engineering or Data Analytics (the time not spent running models).
To fully understand model storage needs, Glen Lockwood provides an excellent description of AI model storage and data management processes in a recent blog post.
Andrew Ng’s AI Virtuous Cycle
If one considers Andrew Ng’s Virtuous Cycle of AI, which describes how companies use AI to build better products, the advantage of using AI becomes clear.
The cycle, as illustrated in the figure, has the following steps:
- Starts with user activity, which generates data on user behavior
- Data must be managed — curated, tagged, archived, stored, moved
- Data is run through AI, which defines user habits and propensities
- Allows organizations to build better products
- Attracts more users, which generates more data
- and the cycle continues.
The framework of the AI Virtuous Cycle illustrates the self-reinforcing loop in artificial intelligence where improved algorithms lead to better data, which in turn enhances the algorithms further. This cycle explains how advancements in one area of AI can accelerate progress in others, creating a Virtuous Cycle of continuous improvement.
The Virtuous Cycle for scientific and technical computing
Similar to the Virtuous Cycle for product creation, a Virtuous Cycle for scientific and technical computing has developed across many domains. As shown in the image, the virtuous cycle joins HPC, Big Data, and AI in a positive feedback loop. The cycle can be described as follows:
- Scientific Research and HPC: Grand-challenge science requires HPC capability and has the capacity to generate a very high volume of data.
- Data Feeds AI Models: Data management is critical. High volumes of data must be managed, cleaned, curated, archived, sourced, and stored.
- “Data” Models Improve Research: Armed with insights from the data, AI models (LLMs/LQMs) analyze patterns, learn from examples, and make predictions. HPC systems are required for training, inferencing, and predicting new data for Step 1.
- Lather, Rinse, Repeat
Using this Virtuous Cycle, users benefit in several ways:
- Positive Feedback Loops: Just like viral growth, positive feedback loops drive AI success. Improvements lead to more usage, which in turn fuels further enhancements.
- Network Effects: The more users, the better the AI models become. A strong user base reinforces the cycle.
- Strategic Asset: AI-driven insights become a strategic asset. Scientific research that harnesses this cycle delivers a competitive edge.
The practical manifestation of the AI Virtuous Cycle is not merely a conceptual framework but is actively reshaping the digital research environment. As research organizations embrace and understand AI, they start to realize the benefits of a continuous cycle of discovery, innovation, and improvement, perpetually propelling themselves forward.
The new HPC accelerator
HPC is constantly looking for ways to accelerate performance. While not a specific piece of hardware or software, the Virtuous AI Cycle viewed as a whole is a massive acceleration leap for science and technology. And we are at the beginning of adoption.
This new era of HPC will be built on LLMs and LQMs (and other AI tools) that provide acceleration using “data models” derived from numerical data and real data. Traditional, verified, tested HPC “numeric models” will be able to provide raw training data and possibly help validate the results of data models. As the cycle accelerates, creating more data and using Big Data tools will become essential for training the next generation of models. Finally, Quantum Computing, as covered by QCwire, will continue to mature and further accelerate this cycle.
The approach is not without questions and challenges. The accelerating cycle will create further pressure on resources and sustainability solutions. Most importantly, will the Virtuous Cycle for scientific and technical computing eat its tail?
Keeping you in the virtuous loop
Tabor Communications offers publications that provide industry-leading coverage in HPC, Quantum Computing, Big Data, and AI. It is no coincidence that these are components of the Virtuous Cycle for scientific and technical computing. Our coverage has been converging on the Virtuous Cycle for many years. We plan to deliver HPC, Quantum, Big Data, and AI into the context of the Virtuous Cycle and help our readers benefit from these rapid changes that are accelerating science and technology.