Covering Scientific & Technical AI | Tuesday, December 3, 2024

VAST Data Primed to Serve Data, Compute for GenAI 

There is big data, and then there is VAST Data, which has emerged as one of the world’s hottest tech firms over the past year. The company, which started out as a developer of flash storage arrays, has morphed into a full-stack provider of software-defined infrastructure for running AI anywhere. As GenAI workloads go into production in 2024, VAST seems poised to compete with the likes of Databricks and Snowflake for its share of those workloads.

VAST Data was founded in 2016 by Renen Hallak, Shachar Fienblit, and Jeff Denworth, three tech veterans hailing from Dell EMC, Kaminario, and CTERA Networks, respectively. The company’s big goal was to rethink the architecture of distributed storage systems and develop a new storage architecture called DASE, which stands for Disaggregated and Shared Everything.

In 2019, it took the first step toward realizing the DASE goal with the launch of a scale-out unstructured storage offering dubbed Universal Storage. Instead of storing data in tiers, ranging from tape to memory, VAST Data uses QLC NVMe flash drives for data and Optane 3D XPoint memory for metadata, and claims to deliver a single storage tier with the cost effectiveness of tape and the speed of RAM.

In 2021, VAST began selling its storage hardware using a subscription sales model, effectively turning hardware into software. Then last August, the company expanded its infrastructure coverage with the launch of the VAST Data Platform, which takes a full-stack approach to delivering data storage and compute for AI.

VAST Data packages its solutions as appliances but sells them via software subscriptions

The VAST Data Platform is composed of four pieces: DataStore, the new name for Universal Storage; DataBase, which offers database, data warehouse, and data lake functionality; DataSpace, a global namespace for storing, retrieving, and processing data; and DataEngine, a serverless compute engine (similar to AWS Lambda), which is slated to ship later this year.

While today’s big data systems focus primarily on processing terabyte-scale structured and semi-structured data in batch mode atop single-site CPU-based systems, future AI workloads will work primarily atop terabyte-to-exabyte scale unstructured data in real time atop globally federated GPUs and DPUs, the company says.

“The aim of the VAST Data Platform is to bridge this divide and to provide customers with the simple experience of today’s data platforms while also addressing the needs of deep learning applications where datatypes, data scale and data locality stretch far beyond the boundaries of today’s business reporting system,” the company says in its white paper, “The Rise of the Deep Learning Data Platform.”

“By building an architecture that can store and organize exabytes of data and scheduling computational functions across a globally distributed set of AI supercomputers, the Platform’s north star points to a future beyond the relatively basic forms of Generative AI that we today see in use by Large Language Models,” the company continues.

VAST Data disaggregates compute from storage using NVMe-oF (Image courtesy VAST Data)

It hasn’t yet delivered the DataEngine, a key leg of its Data Platform, but customers are lining up regardless. Companies like Pixar, Zoom, and Verizon have become paying customers, as well as governmental agencies like NASA, the U.S. Air Force, and the U.S. Department of Energy.

VAST Data passed the $100 million mark in annual recurring revenue (ARR) in early 2023 and kept right on going, hitting $200 million by the end of August, by which time it had surpassed $1 billion in cumulative bookings, according to CEO Hallak. When it raised $118 million in a Series E led by Fidelity in early December, VAST Data already had a valuation of $9.1 billion.

Hallak told the Wall Street Journal in December that the company has been cash-flow positive for the past 12 quarters and hasn’t used any of the cash raised in the past three rounds. “The growth is intended to stay on this exponential trajectory,” he told the WSJ.

The current trajectory also calls for an IPO at some indeterminate point in time, according to Hallak. “We are running the company today as if it’s public,” he said.

Partnerships will be key for helping the New York-based software company achieve its vast dreams. The company has established partnerships with platform providers, including one with Hewlett Packard Enterprise, which includes VAST in its HPE GreenLake offering. It also has partnerships with Genesis Cloud and Nvidia.

VAST Data CEO and founder Renen Hallak

Yesterday, it announced a partnership with Run:ai to deliver full-stack AI solutions. Run:ai’s software sits between the AI workload and the underlying compute resources. It helps automate the provisioning of GPUs for AI workloads, including fractional GPUs, and provides full monitoring of the environment.
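To make the idea of fractional GPU provisioning concrete, here is a minimal Python sketch. It is an illustration only and not Run:ai's scheduler or API: the `Gpu` class, the `allocate` and `utilization` functions, the job names, and the first-fit placement policy are all assumptions chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical model of one physical GPU; capacity is a fraction from 0.0 to 1.0."""
    name: str
    free: float = 1.0
    jobs: dict = field(default_factory=dict)


def allocate(gpus, job, fraction):
    """Place `job` on the first GPU with enough free capacity (first-fit).

    Returns the GPU name, or None if no single GPU can host the request.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    for gpu in gpus:
        # Small tolerance so float rounding does not reject an exact fit.
        if gpu.free + 1e-9 >= fraction:
            gpu.free -= fraction
            gpu.jobs[job] = fraction
            return gpu.name
    return None


def utilization(gpus):
    """Report the allocated fraction of each GPU, a stand-in for monitoring."""
    return {gpu.name: round(1.0 - gpu.free, 3) for gpu in gpus}


# Two GPUs shared by four jobs with different (assumed) requirements.
pool = [Gpu("gpu0"), Gpu("gpu1")]
placements = {
    "notebook": allocate(pool, "notebook", 0.25),
    "inference": allocate(pool, "inference", 0.5),
    "training": allocate(pool, "training", 1.0),
    "batch": allocate(pool, "batch", 0.5),
}
print(placements)
print(utilization(pool))
```

In this toy pool, the notebook and inference jobs share one GPU, the training job takes the other GPU whole, and the final batch job is left unplaced because no single GPU has half its capacity free. A production scheduler would typically queue or preempt in that case rather than simply return nothing.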

“Our partnership with Run:ai transcends traditional, disparate AI solutions, integrating all of the components necessary for an efficient AI pipeline,” Hallak said in the press release. “Today’s announcement offers data-intensive organizations across the globe the blueprint to deliver more efficient, effective, and innovative AI operations at scale.”

As data volumes and compute requirements grow with AI, mismatches between current requirements and existing system architectures are inevitable. VAST Data claims to have a radical new approach that addresses this gap, and time will tell if it’s the right one.

This story originally ran on Datanami, Enterprise AI's sister site. 

About the author: Alex Woodie

Alex Woodie has written about IT as a technology journalist for more than a decade. He brings extensive experience from the IBM midrange marketplace, including topics such as servers, ERP applications, programming, databases, security, high availability, storage, business intelligence, cloud, and mobile enablement. He resides in the San Diego area.

AIwire