GOAI Publishes Python Data Frame for GPU Analytics
A group of data analytics vendors joined forces today at the GPU Technology Conference to create the GPU Open Analytics Initiative (GOAI), with the goal of fostering a community around data science and deep learning workloads running on GPUs. The group also unveiled a Python-based API as a first step toward that goal.
Continuum Analytics, H2O.ai and MapD Technologies are the founding members of GOAI, which was unveiled at NVIDIA's annual GPU Technology Conference in San Jose, California. The vendors say that, while each of them has a powerful framework, the lack of a common standard data format hinders intercommunication among the various applications.
Without the ability to access and work with the same data in a GPU environment, the vendors say, analytic workflows running on GPUs become slower, higher in latency, and more complex.
The group proposed a new data standard to address this concern. Called the GPU Data Frame, the standard facilitates the interchange of data among various processes running on the GPU. It currently exposes a Python API.
The new GPU Data Frame API enables end-to-end computation on the GPU, which therefore "avoids transfers back to the CPU or copying of in-memory data, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads," the group says in a press release.
The announcement continues:
"Users of the MapD Core database can output the results of a SQL query into the GPU Data Frame, which then can be manipulated by the Continuum Analytics’ Anaconda NumPy-like Python API or used as input into the H2O suite of machine learning algorithms without additional data manipulation."
Early tests show that, by keeping the data resident in the GPU and avoiding round-trips back to the CPU, processing times decreased by an order of magnitude, the group says.
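To make the round-trip argument concrete, here is a minimal toy model in plain Python. It does not use the GPU Data Frame API or any GPU at all. The names `TransferLog`, `run_pipeline`, and the three stage functions are hypothetical, invented for this illustration, and the "copies" are simply counters. It only sketches why sharing one device-resident frame across stages reduces CPU/GPU transfers, under the assumption that each independent tool would otherwise upload its input and download its output.

```python
from dataclasses import dataclass


@dataclass
class TransferLog:
    """Counts simulated copies across the CPU/GPU boundary (no real GPU involved)."""
    host_to_device: int = 0
    device_to_host: int = 0

    @property
    def total(self) -> int:
        return self.host_to_device + self.device_to_host


# Hypothetical pipeline stages standing in for query, preprocessing, and
# feature preparation. Each takes and returns a plain list (one "column").
def filter_rows(col):
    return [x for x in col if x > 1.0]


def normalize(col):
    peak = max(col)
    return [x / peak for x in col]


def center(col):
    mean = sum(col) / len(col)
    return [x - mean for x in col]


STAGES = [filter_rows, normalize, center]


def run_pipeline(stages, log, shared_device_frame):
    """Run the stages in order and record the simulated transfers.

    shared_device_frame=False: every stage uploads its input and downloads
    its output, as if each tool kept its own private GPU format.
    shared_device_frame=True: the data is uploaded once, stays "on the
    device" between stages, and is downloaded once at the end.
    """
    data = [1.0, 2.0, 3.0, 4.0]
    if shared_device_frame:
        log.host_to_device += 1
    for stage in stages:
        if not shared_device_frame:
            log.host_to_device += 1
        data = stage(data)
        if not shared_device_frame:
            log.device_to_host += 1
    if shared_device_frame:
        log.device_to_host += 1
    return data


if __name__ == "__main__":
    separate, shared = TransferLog(), TransferLog()
    run_pipeline(STAGES, separate, shared_device_frame=False)
    run_pipeline(STAGES, shared, shared_device_frame=True)
    print("copies without a shared frame:", separate.total)
    print("copies with a shared frame:", shared.total)
```

In this toy model the number of copies grows with the number of stages when each stage works alone, and stays constant when the stages share one resident frame. Actual speedups depend on data size, hardware, and the workload, none of which the sketch attempts to measure.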
Todd Mostak, CEO and co-founder of MapD Technologies and one of Datanami's 2017 People to Watch, says that, while the data science community is rapidly adopting GPUs for machine learning and deep learning workloads, the need to involve CPUs for tasks like subsetting and preprocessing of training data is creating a bottleneck.
"The GPU Data Frame makes it easy to run everything from ingestion to preprocessing to training and visualization directly on the GPU," he says in the announcement. "This efficient data interchange will improve performance, encouraging development of ever more sophisticated GPU-based applications."
Travis Oliphant, co-founder and chief data scientist of Continuum Analytics and also one of Datanami's 2017 People to Watch, says the approach will benefit Anaconda users who are using GPUs.
"Using NVIDIA’s technology, Anaconda is mobilizing the Open Data Science movement by helping teams avoid the data transfer process between CPUs and GPUs and move nimbly toward their larger business goals," he says in the press release.
Sri Ambati, CEO and co-founder of H2O.ai, says he's excited about GOAI's potential to drive a truly diverse open source ecosystem. "GOAI is a call for the community of data developers and researchers to join the movement to speed up analytics and GPU adoption in the enterprise," he says.
Joining the three co-founders of GOAI are three additional data outfits: BlazingDB, a scale-out data warehousing outfit with a proprietary file format for petabyte-scale data sets; Graphistry, which develops a GPU-based data store and a visual analytics language; and Gunrock, an open source, high-performance graph primitive for GPUs led by UC Davis's John Owens.
GOAI has published some of its specs at github.com/gpuopenanalytics.
In other news, MapD also announced that its database is now open source, bringing it in line with its two GOAI co-founders, whose software is already open source.