Nvidia’s Little Desktop AI Box with Big Unified GPU/CPU Memory
At the 2025 CES event, Nvidia announced a new $3,000 desktop computer developed in collaboration with MediaTek, powered by a new Superchip that pairs a cut-down Arm-based Grace CPU with a Blackwell GPU. The new system is called "Project DIGITS" (not to be confused with Nvidia's Deep Learning GPU Training System: DIGITS). The platform offers a series of new capabilities for both the AI and HPC markets.
Project DIGITS features the new Nvidia GB10 Grace Blackwell Superchip with 20 Arm cores and is designed to offer a "petaflop" (at FP4 precision) of GPU-AI computing performance for prototyping, fine-tuning, and running large AI models. (A brief floating-point explainer: FP4 stores each number in just four bits, trading accuracy for memory capacity and speed; see the sketch below.)
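Nvidia has not published the exact FP4 format GB10 uses, but a minimal Python sketch, assuming the common E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, bias 1) from the OCP Microscaling spec, shows just how coarse four-bit weights are:

```python
# Sketch: enumerate every value an E2M1-style FP4 number can take.
# This assumes the OCP Microscaling FP4 layout; GB10's exact format
# has not been published by Nvidia.
def fp4_e2m1(sign: int, exp: int, man: int) -> float:
    if exp == 0:
        # Subnormal: no implicit leading 1, scale 2**(1 - bias) = 1
        mag = man / 2
    else:
        # Normal: implicit leading 1, exponent biased by 1
        mag = (1 + man / 2) * 2 ** (exp - 1)
    return -mag if sign else mag

values = sorted({fp4_e2m1(s, e, m)
                 for s in (0, 1) for e in range(4) for m in range(2)})
print(values)
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
#  0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Sixteen bit patterns yield only 15 distinct values (plus and minus zero collapse), which is why FP4 is used for inference on already-trained, quantized weights rather than for training.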
Since the release of the G8x line of video cards in 2006, Nvidia has done a good job of making CUDA tools and libraries available across its entire line of GPUs. The ability to use a low-cost consumer video card for CUDA development has helped create a vibrant ecosystem of applications. Given the cost and scarcity of performant GPUs, Project DIGITS should likewise enable more LLM-based software development. Like a low-cost GPU, the ability to run, configure, and fine-tune open transformer models (e.g., Llama) on a desktop should be attractive to developers. For example, by offering 128GB of memory, the DIGITS system overcomes the 24GB limit of many lower-cost consumer video cards.
Scant Specs
The new GB10 Superchip features an Nvidia Blackwell GPU with latest-generation CUDA cores and fifth-generation Tensor Cores, connected via the NVLink-C2C chip-to-chip interconnect to a high-performance Nvidia Grace-like CPU with 20 power-efficient Arm cores (ten Arm Cortex-X925 and ten Cortex-A725 cores). Though no detailed specs were available, the GPU side of the GB10 is assumed to offer less performance than the Grace-Blackwell GB200. To be clear, the GB10 is not a binned or laser-trimmed GB200: the GB200 Superchip has 72 Arm Neoverse V2 cores combined with two B200 Tensor Core GPUs.
The defining feature of the DIGITS system is its 128GB of unified, coherent LPDDR5x memory shared between CPU and GPU. This capacity breaks a "GPU memory barrier" when running AI or HPC models; by comparison, an 80GB Nvidia A100 currently sells for roughly $18,000 to $20,000. With unified, coherent memory, PCIe transfers between CPU and GPU are also eliminated. The rendering in the image below indicates that the memory is fixed and cannot be expanded by the user. The diagram also shows ConnectX networking (Ethernet?), Wi-Fi, Bluetooth, and USB connections.
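From a programming standpoint, coherent CPU-GPU memory means the familiar CUDA managed-memory model no longer implies page migration over PCIe: a single allocation is directly visible to both sides. A minimal Numba sketch of that programming model, assuming a CUDA-capable Python environment (on a discrete card the same code runs, but pages migrate over PCIe behind the scenes):

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(x, factor):
    # Each GPU thread scales one element in place
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= factor

# One managed allocation, visible to both CPU and GPU
data = cuda.managed_array(1_000_000, dtype=np.float32)
data[:] = np.arange(data.size, dtype=np.float32)  # CPU writes directly

threads = 256
blocks = (data.size + threads - 1) // threads
scale[blocks, threads](data, 2.0)                 # GPU updates the same buffer
cuda.synchronize()

print(data[:4])  # CPU reads the results; no explicit copy-back step
```

On a design like the GB10, both processors address the same physical LPDDR5x, so this style of code carries no hidden transfer cost.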
The system also provides up to 4TB of NVMe storage. In terms of power, Nvidia mentions only a standard electrical outlet. No specific power requirements have been published, but the size and design offer a few clues. First, like the Mac mini, the small size (see Figure 2) suggests the amount of generated heat must not be that high. Second, based on images from the CES show floor, there are no fan vents or cutouts. The front and back of the case appear to be a sponge-like material that could provide airflow and may serve as a whole-system filter. Since thermal design constrains power, and power constrains performance, the DIGITS system is probably not a screamer tweaked for maximum performance (and power usage), but rather a cool, quiet, and proficient AI desktop with an optimized memory architecture.
As mentioned, the system is incredibly small. The image below offers some perspective against a keyboard and monitor. (No cables are shown; in our experience, some of these small systems can get pulled off the desktop by cable weight.)
AI on the desktop
Nvidia reports that developers can run up to 200-billion-parameter large language models to supercharge AI innovation. In addition, using Nvidia ConnectX networking, two Project DIGITS AI supercomputers can be linked to run up to 405-billion-parameter models. With Project DIGITS, users can develop and run inference on models using their own desktop system, then seamlessly deploy the models on accelerated cloud or data center infrastructure.
“AI will be mainstream in every application for every industry. With Project DIGITS, the Grace Blackwell Superchip comes to millions of developers,” said Jensen Huang, founder and CEO of Nvidia. “Placing an AI supercomputer on the desks of every data scientist, AI researcher, and student empowers them to engage and shape the age of AI.”
These systems are not intended for training but are designed to run quantized LLMs locally (quantization reduces the numeric precision of the model weights). The one-petaflop performance number quoted by Nvidia is for FP4 precision weights (four bits, or 16 possible values).
Many models run adequately at this level, but the precision can be raised to FP8, FP16, or higher for possibly better results, depending on the size of the model and the available memory. For instance, FP8 weights for a Llama-3-70B model require one byte per parameter, or roughly 70GB of memory. Halving the precision to FP4 cuts that to 35GB, but doubling it to FP16 requires 140GB, which is more than the DIGITS system offers.
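The arithmetic is simple enough to sketch in a few lines of Python (weights only; activations and KV cache need additional memory, so these figures are lower bounds):

```python
# Weights-only memory math: params (billions) x bits-per-weight / 8 = GB
BITS_PER_WEIGHT = {"FP4": 4, "FP8": 8, "FP16": 16, "FP32": 32}

def weight_gb(params_billion: float, precision: str) -> float:
    """GB needed to hold a model's weights at a given precision."""
    return params_billion * BITS_PER_WEIGHT[precision] / 8

for name, b in [("Llama-3-70B", 70), ("200B", 200), ("405B", 405)]:
    sizes = ", ".join(f"{p}={weight_gb(b, p):.1f}GB" for p in BITS_PER_WEIGHT)
    print(f"{name}: {sizes}")

# Llama-3-70B: FP4=35.0GB, FP8=70.0GB, FP16=140.0GB, FP32=280.0GB
# 200B: FP4=100.0GB, FP8=200.0GB, FP16=400.0GB, FP32=800.0GB
# 405B: FP4=202.5GB, FP8=405.0GB, FP16=810.0GB, FP32=1620.0GB
```

At FP4, a 200-billion-parameter model needs about 100GB for weights and fits in one 128GB system, while a 405-billion-parameter model needs roughly 202GB, which is why Nvidia's 405B figure requires two linked machines.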
HPC cluster, anyone?
What may not be widely known is that DIGITS is not the first desk-side Nvidia-based system. In 2024, GPTshop.ai introduced a GH200-based desk-side system. HPCwire provided coverage that included HPC benchmarks. Unlike Project DIGITS, the GPTshop systems provide the full heft of either the GH200 Grace-Hopper Superchip or the GB200 Grace-Blackwell Superchip in a desk-side case. The increased performance also comes at a higher cost.
Using Project DIGITS systems for desktop HPC could be an interesting approach. In addition to running larger AI models, the integrated CPU-GPU global memory can be very beneficial to HPC applications. Consider a recent HPCwire story about a CFD application running solely on two Intel Xeon 6 Granite Rapids processors (no GPU). According to author Dr. Moritz Lehmann, the enabling factor was the amount of memory he was able to use for his simulation.
In a similar fashion, many HPC applications have had to find ways to get around the small memory domains of common PCIe-attached video cards. Using multiple cards or MPI helps spread out the application, but the most enabling factor in HPC is always more memory.
Of course, benchmarks are needed to fully determine the suitability of Project DIGITS for desktop HPC, but there is another possibility: "build a Beowulf cluster of these." Often considered a joke, the phrase may be more serious where Project DIGITS is concerned. Clusters are normally built with servers and (multiple) PCIe-attached GPU cards. However, a small, moderately powered, fully integrated global-memory CPU-GPU system might make for a more balanced and attractive cluster building block. And here is the bonus: these systems already run Linux and have built-in ConnectX networking.
This article first appeared on sister site HPCwire.