
Cerebras Scales AI Inference with Hugging Face Partnership and Datacenter Expansion 

(Source: Michael Vi/Shutterstock)

As Nvidia and its partners prepare for the upcoming GTC event, another AI infrastructure company, Cerebras Systems, is making strategic moves to expand its role in high-speed AI inference. This week, Cerebras announced two developments: a new partnership with Hugging Face to offer developers access to its inference platform, and the planned launch of six new AI datacenters across North America and Europe. 

Expanding Access Through Hugging Face 

Cerebras' partnership with Hugging Face integrates its inference capabilities into the Hugging Face Hub, making them available to the platform's five million developers. The integration allows developers to select Cerebras as their inference provider for models like Llama 3.3 70B, giving them API access to models running on Cerebras CS-3 systems. 
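
In practice, selecting Cerebras as a provider is meant to be a small change on the developer side. The snippet below is a minimal sketch assuming Hugging Face's inference-provider interface in the huggingface_hub Python client; the provider string, model ID, and parameters shown are illustrative assumptions rather than details from the announcement.

```python
# Minimal sketch: calling Llama 3.3 70B with Cerebras selected as the
# inference provider through Hugging Face's huggingface_hub client.
# The provider string, model ID, and arguments are illustrative assumptions,
# not details confirmed in the announcement.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cerebras",   # route requests to Cerebras inference hardware
    api_key="hf_xxx",      # a Hugging Face access token (placeholder)
)

response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user",
               "content": "Summarize wafer-scale inference in two sentences."}],
    max_tokens=200,
)

print(response.choices[0].message.content)
```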

The CS-3 is built around Cerebras' Wafer-Scale Engine, a unique processor design that condenses the performance of tens to hundreds of GPUs into a compact 16RU system. This architecture simplifies deployment and accelerates inference tasks, reducing the time required to generate complex responses from days to minutes. Cerebras reports achieving over 2,200 tokens per second for Llama 3.3 70B, a rate the company claims is approximately 70 times faster than traditional GPU solutions. For developers, this means faster access to model outputs with comparable accuracy. 
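
To put those figures in perspective, the back-of-the-envelope calculation below uses only the numbers reported above; the GPU baseline is simply what the 70x claim implies, not an independently measured rate, and the response length is an arbitrary illustration.

```python
# Back-of-the-envelope comparison using the figures reported above.
# The GPU baseline is derived from the "70x faster" claim (2,200 / 70),
# not an independently measured number.
cerebras_tps = 2200            # reported Llama 3.3 70B throughput (tokens/s)
gpu_tps = cerebras_tps / 70    # baseline implied by the 70x claim (~31 tokens/s)

response_tokens = 10_000       # an illustrative long, multi-step response

print(f"Cerebras:     {response_tokens / cerebras_tps:.1f} s")        # ~4.5 s
print(f"GPU baseline: {response_tokens / gpu_tps / 60:.1f} min")      # ~5.3 min
```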

“By making Cerebras Inference available through Hugging Face, we’re empowering developers to work faster and more efficiently with open-source AI models, unleashing the potential for even greater innovation across industries,” Cerebras CEO Andrew Feldman said in a statement. 

Scaling Infrastructure with New Datacenters 

In parallel with the Hugging Face partnership, Cerebras is expanding its infrastructure footprint with six new AI inference datacenters powered by the company’s signature Wafer-Scale Engines. The new facilities add to Cerebras' existing California datacenters and are located across North America and Europe, including sites in Oklahoma City and Montreal (see below for the full list). Once fully operational, these centers are expected to significantly increase Cerebras' capacity to handle high-speed inference workloads, supporting over 40 million Llama 70B tokens per second. 

Cerebras AI Inference Data Centers:  

  • Santa Clara, CA (online)  
  • Stockton, CA (online)  
  • Dallas, TX (online)  
  • Minneapolis, MN (Q2 2025)  
  • Oklahoma City, OK (Q3 2025)  
  • Montreal, Canada (Q3 2025)  
  • Midwest / Eastern US (Q4 2025)  
  • Europe (Q4 2025) 

The Oklahoma City and Montreal datacenters house AI hardware exclusively owned and operated by Cerebras. The remaining sites are jointly operated with Cerebras' strategic partner G42, the company says. 

The expansion is part of Cerebras' broader 2025 scaling plan, aiming to meet growing demand for dedicated AI inference infrastructure. Notably, 85% of the new capacity will be located in the United States, positioning Cerebras as a key player in domestic AI infrastructure development. 

“Cerebras is turbocharging the future of U.S. AI leadership with unmatched performance, scale, and efficiency – these new global datacenters will serve as the backbone for the next wave of AI innovation,” said Cerebras COO Dhiraj Mallick. “With six new facilities coming online, we will add the needed capacity to keep up with the enormous demand for Cerebras industry-leading AI inference capabilities, ensuring global access to sovereign, high-performance AI infrastructure that will fuel critical research and business transformation.” 

The Cerebras CS-3 System. (Source: Cerebras)

Several of these new datacenters are being developed in partnership with local infrastructure providers. The Oklahoma City facility, for instance, is a collaboration with Scale Datacenter and is designed with advanced resilience features, including seismic shielding and redundant power systems. In Montreal, Cerebras is working with Enovum, a division of Bit Digital, to establish what will be the company's first wafer-scale inference site in Canada. 

“Enovum is thrilled to partner with Cerebras, a company at the forefront of AI innovation, and to further expand and propel Canada’s world-class tech ecosystem,” said Billy Krassakopoulos, CEO of Enovum Datacenter. “This agreement enables our companies to deliver sophisticated, high-performance colocation solutions tailored for next-generation AI workloads.” 

Scaling for Speed with CS-3 

The Cerebras approach to AI infrastructure is shaped by its unique hardware offering. Unlike traditional distributed GPU clusters, the CS-3's wafer-scale design allows for simplified scaling and high-speed performance within a smaller physical and operational footprint. This design is central to Cerebras' inference cloud strategy, which aims to deliver faster, more efficient model processing for enterprise and research applications. 

While the company has highlighted its performance advantage in terms of speed, the broader impact will depend on how well these systems integrate into existing AI development workflows and whether they can attract sustained demand from both enterprise and research customers. Cerebras' recent customers include supercomputing and research centers as well as AI startups like France’s Mistral and platforms like the AI search engine Perplexity AI. The addition of Hugging Face as a partner signals an effort to broaden the company's reach within the developer community and position it as a competitive provider of AI inference services. 

As the AI infrastructure landscape continues to evolve, Cerebras' combination of hardware innovation and infrastructure expansion suggests a strategy designed to differentiate the company in a GPU-centric market. It’s too soon to say whether these developments will disrupt an inference landscape shaped by Nvidia, but Cerebras is making moves that could keep it firmly on the map.
