Arm’s New Era of Embedded AI: Bridging the Skills Gap and Enhancing Data and Model Management
Feb. 1, 2024 -- In a recent blog post, Arm's Parag Beeraka highlights the challenges and developments in integrating AI and machine learning into edge devices and embedded systems. Beeraka emphasizes the need for upskilling to close the skills gap in edge AI teams, as well as the importance of data management, model optimization, and efficient inference. To address these challenges, Arm provides innovative tools and collaborates with industry partners to streamline the development process and enable engineers to harness the full potential of AI in edge computing.
The rapid growth of AI and machine learning is supercharging innovation in edge devices and embedded systems. However, successfully deploying ML models on resource-constrained hardware requires edge AI expertise spanning data science, machine learning, and specialized embedded engineering. In a sense, the decades-old discipline of embedded design, which has delivered a vast range of solutions built on simple microcontrollers running home-grown or commercial real-time operating systems, now finds itself drinking from an AI fire hose.
Most edge AI development teams that want to leverage new AI and ML workloads face a skills gap that hampers their ability to optimize and accelerate on-device AI. Some companies deal with the gap by building internal AI/ML teams; others have hired dedicated ML leadership and acquired startups to jumpstart their AI expertise.
However, while progress is being made, the bar continues to rise as ML methods and models grow more complex. We should expect, for instance, that derivatives of ChatGPT or Gemini will soon be running on small embedded controllers and edge AI devices in real-world applications, delivering greater benefits. Additionally, time-to-market pressures are enormous. It can take years for embedded engineering teams to fully “skill up” on MLOps, and even then they need to keep learning, because MLOps at the edge continuously evolves.
Edge AI Gaps
So which gaps persist? Three key areas stand out:
- Data management
- Model optimization
- Efficient inference
As a key provider of technologies that enable AI and ML solutions to expand and flourish, Arm is keen to help engineers and developers work more easily and efficiently as they realize their AI ambitions.
Let’s take a deeper look at those challenges.
Data management: Managing data effectively is critical for developing and deploying machine learning models, but it has its challenges. These include collecting correct, unbiased data from sensors, labeling that data accurately and consistently for training, and ensuring data privacy and security.
Maintaining the tooling, infrastructure and skills for robust end-to-end ML data management – in a traditionally embedded-design world – introduces additional challenges for embedded teams.
For instance, targeting and prototyping on physical hardware has traditionally been a slow, cumbersome process. Arm has taken steps to simplify this part of the design flow and accelerate deployment with Arm Virtual Hardware (AVH), a cloud-based service that provides functionally accurate models of Arm-based chips, allowing software developers to simulate the behavior of Arm-based IoT devices without physical hardware.
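For a flavor of what that looks like in practice, here is a minimal sketch of driving an AVH model from a Python test script, as one might in a CI job. The model binary name and firmware path are assumptions for illustration, not a definitive AVH invocation; the exact executables depend on the AVH installation in use.

```python
# Launching a simulated Arm-based device with an AVH model from Python.
# The model binary name and firmware path below are placeholders.
import subprocess

subprocess.run(
    [
        "VHT_Corstone_SSE-300_Ethos-U55",  # hypothetical AVH model binary
        "-a", "build/firmware.elf",        # application image to run
    ],
    check=True,
)
```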
Now consider the complexity of the data inputs for ML applications: audio and other sensor streams. Because of the diverse nature of sensors, it’s important to maintain a focus on standards, which helps streamline the handling of these data sets.
Arm’s new Synchronous Data Streaming (SDS) framework for sensor data addresses the need for standardized data collection, labeling, and distribution for model development. It captures physical sensor and audio data streams from the target hardware during development, for example from a MEMS gyroscope or microphone. The framework provides Python-based utilities for playback, visualization, and analysis of the captured streams. SDS playback combined with AVH enables automated testing of algorithms on simulated models, which is useful for CI/CD pipelines. Through close partnership with Arm, TDK Qeexo has added support for the SDS framework in its machine learning platform, Qeexo AutoML.
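To make the playback idea concrete, below is a minimal Python sketch of reading back a captured stream file. The record framing assumed here (a per-record header of a 32-bit timestamp and a 32-bit payload size) and the file name are illustrative assumptions, not the official SDS file format; the framework's own Python utilities are the reference implementation.

```python
# Minimal sketch of reading a captured sensor data stream.
# ASSUMPTION (not the official SDS format): each record is framed as a
# little-endian header of uint32 timestamp + uint32 payload size,
# followed by interleaved int16 samples, e.g. from a 3-axis gyroscope.
import struct

RECORD_HEADER = struct.Struct("<II")  # timestamp_ms, payload_bytes

def read_records(path):
    """Yield (timestamp_ms, samples) tuples from a captured stream file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(RECORD_HEADER.size)
            if len(header) < RECORD_HEADER.size:
                break  # end of capture
            timestamp_ms, size = RECORD_HEADER.unpack(header)
            payload = f.read(size)
            samples = struct.unpack(f"<{size // 2}h", payload)
            yield timestamp_ms, samples

if __name__ == "__main__":
    for ts, samples in read_records("Gyroscope.0.sds"):  # hypothetical name
        print(f"t={ts} ms, {len(samples) // 3} xyz samples")
```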
Model Optimization: The AI journey starts with a use case, then with data for that use case. Once the use case and dataset are determined, complex models can be trained using various methods, which requires access to large datasets and significant computational resources. That said, optimization is a vital step in delivering ML workloads on power-constrained devices, and it takes many forms: the size of ML models is reduced through techniques such as pruning, quantization, and knowledge distillation, which cuts storage and memory requirements.
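As a concrete example of the quantization step, the sketch below applies TensorFlow Lite's post-training full-integer quantization, a common route to int8 models for microcontroller targets. The saved-model path, input shape, and random calibration data are placeholders; a real workflow would calibrate with samples from the actual dataset.

```python
# Post-training full-integer quantization with TensorFlow Lite.
# "saved_model_dir" and the calibration data are placeholders.
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_data_gen():
    # A few hundred real input samples calibrate the int8 ranges;
    # random data stands in here purely for illustration.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Fully quantized int8 models of this kind are also what NPU compilers such as Vela, discussed below, expect as input.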
One optimization example is Arm’s collaboration with NVIDIA on TAO, a low-code AI toolkit built on TensorFlow and PyTorch that’s designed to simplify and accelerate the model training process by abstracting away the complexity of AI models and the deep learning framework.
Additionally, the Arm Model Optimization Toolkit, built with Arm’s vast global ecosystem and intimate knowledge of underlying hardware technologies and system-design requirements, is invaluable in helping development teams achieve their best optimizations.
Efficient Inference: Efficient inference in edge devices is where the rubber meets the road. Because these devices are usually resource-constrained, enormous care must be taken when deploying models at the edge. While CPUs can handle some workloads, emerging specialized workloads often demand a heterogeneous computing solution to deliver performance with processing efficiency. Indeed, for hardware acceleration, many embedded SoCs provide accelerators such as DSPs, TPUs, and NPUs that are optimized for ML workloads.
Therefore, running ML workloads efficiently on embedded devices with diverse accelerators involves extensive tuning using compilers like Arm’s Vela and software libraries like CMSIS-NN. Using the Arm Vela compiler, developers can compile a TensorFlow Lite for Microcontrollers neural network model into an optimized version that runs on an embedded system containing an Arm Ethos-U NPU (neural processing unit), which can accelerate various ML workloads. The Vela configuration file lets users describe properties of the Ethos-U embedded system, such as memory latencies and bandwidths, so the compiler can optimize for them. Arm ecosystem partners such as Edge Impulse, Nota.AI, Qeexo, and Plumerai have integrated the Vela compiler into their tool flows so that their customers can easily use Arm-based platforms with the Ethos-U accelerator.
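As a sketch of a typical compilation step, the snippet below drives Vela (installed via pip install ethos-u-vela) from Python through its command-line entry point. The file names and configuration-section names are placeholders; the available options are documented with the Vela tool itself.

```python
# Compile a TensorFlow Lite for Microcontrollers model for an Ethos-U NPU
# by driving the Vela command-line tool (pip install ethos-u-vela).
# File names and config-section names below are placeholders.
import subprocess

subprocess.run(
    [
        "vela",
        "model_int8.tflite",                       # fully quantized int8 model
        "--accelerator-config", "ethos-u55-128",   # target NPU variant
        "--config", "vela.ini",                    # memory latencies/bandwidths
        "--system-config", "My_Sys_Cfg",           # section of vela.ini to use
        "--memory-mode", "My_Mem_Mode",            # section of vela.ini to use
        "--output-dir", "out",                     # optimized .tflite lands here
    ],
    check=True,
)
```

In the output model, operators Vela can map onto the Ethos-U are folded into a single custom operator executed by the NPU, while unsupported operators fall back to the Cortex-M CPU, typically via CMSIS-NN kernels.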
The Up-Skilling Imperative
As the examples above show, maintaining the tooling, infrastructure and skills for robust end-to-end ML workflows remains a stretch for traditional embedded design and development teams.
Bridging the skills gap requires combining the strengths of hardware vendors, AI/ML experts, and enterprise software providers through aligned strategies and unified toolchains. Arm products, tools, resources and its ecosystem serve as the foundation for upskilling engineers so they can unlock transformative AI use cases.
For more information, please visit the Arm website.
Source: Parag Beeraka, Arm