Untether AI Introduces Early Access to imAIgine SDK with Advanced Inference Acceleration 

TORONTO, July 18, 2024 -- Untether AI, a leader in energy-centric AI inference acceleration, announced the early access (EA) availability of its imAIgine Software Development Kit (SDK), which supports the speedAI inference acceleration solutions.

The imAIgine SDK provides a push-button flow that streamlines the conversion of trained neural network models into optimized, inference-ready models that run on speedAI acceleration solutions. This latest EA release supports the speedAI family of devices and PCIe accelerator cards, which set a new industry benchmark for energy efficiency and deliver 2,000 TFLOPS of AI inference performance per device.

“Providing the early access version of the imAIgine SDK enables users to prepare their neural networks for the upcoming shipment of speedAI devices and cards,” said Philip Lewer, Sr. Director of Product at Untether AI. “With an extensive array of model garden and kernel support, automated compilation, and sophisticated analysis tools, this EA release gives users everything they need to easily deploy their models on the revolutionary speedAI family of inference acceleration solutions.”

Push-Button Flow for Simple Model Deployment

The imAIgine SDK provides an automated path to running neural networks on Untether AI’s inference acceleration solutions, with push-button quantization, optimization, physical allocation, and multi-chip partitioning. The SDK supports both TensorFlow and PyTorch, and a few simple Python commands quantize, lower, physically allocate, and run the models on speedAI hardware in a matter of minutes. With a comprehensive model garden library and kernel support, users can quickly run classification, object detection, semantic segmentation, or natural language processing (NLP) models on speedAI hardware.

Sophisticated, automated quantization techniques convert the neural network to the preferred datatype. Where accuracy is critical, post-quantization training (PQT) and knowledge distillation algorithms are available to maintain accuracy after quantization. During compilation, the imAIgine SDK performs layer-fusion optimizations, graph lowering, kernel mapping, and physical allocation to provide an optimal implementation result.
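For readers unfamiliar with quantization, the general idea can be sketched in a few lines of Python. This is an illustration only: it shows generic symmetric 8-bit integer quantization with a single scale per tensor, which is an assumption chosen for simplicity. It does not use the imAIgine SDK, and the function names (quantize_symmetric, dequantize) and sample weights are hypothetical. The datatype and algorithms the SDK actually applies may differ.

```python
# Illustrative sketch only: generic symmetric quantization,
# NOT the imAIgine SDK API. All names here are hypothetical.
from typing import List, Tuple


def quantize_symmetric(values: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers using one scale for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


# Made-up example weights, for demonstration only.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)

# The largest difference between original and restored values.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The rounding error for each value is at most half a quantization step (scale / 2). Techniques such as the post-quantization training and knowledge distillation mentioned above are aimed at limiting the effect of that error on model accuracy.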

Power-User Flow for Low-Level Optimizations

With the power-user flow, users can directly develop optimized “bare metal” kernels for the more than 1,400 RISC-V processors and more than 350,000 at-memory compute processing elements in speedAI devices. Analogous to CUDA, but written in familiar C/C++, these kernels are compiled directly using a modified version of LLVM, enhanced to take advantage of the more than 30 custom instructions Untether AI has added to the instruction set for its ultra-efficient at-memory compute architecture. Users can then manually place the kernels in any topology on the memory banks of the speedAI spatial architecture.

Extensive Suite of Analysis Tools Including Virtual Hardware

The imAIgine SDK includes several tools for analyzing how networks run on speedAI devices, giving users a virtual hardware view before they receive actual devices. The Model Explorer shows the entire floorplan of how the neural network is mapped to the silicon, enabling interactive inspection of connection topology, socket depth, and performance estimates. The Analysis Dashboard complements this view with information on processor activity, packet exchanges, and utilization. Together, these tools provide a virtual hardware environment that helps guide the user toward optimal efficiency and performance.

Untether AI invites prospective customers and partners to explore the transformative potential of speedAI and the imAIgine SDK. To gain early access and start developing with the imAIgine SDK for the speedAI family, please visit https://www.untether.ai/imaigine-sdk-early-access-program and request download privileges.

About Untether AI

Untether AI provides energy-centric AI inference acceleration from the edge to the cloud, supporting any type of neural network model. With its at-memory compute architecture, Untether AI has solved the data movement bottleneck that costs energy and performance in traditional CPUs and GPUs, resulting in high-performance, low-latency neural network inference acceleration without sacrificing accuracy. Untether AI embodies its technology in runAI and speedAI devices, tsunAImi acceleration cards, and its imAIgine Software Development Kit.


Source: Untether AI

AIwire