AI Lessons Learned from DeepSeek’s Meteoric Rise
The AI world is still buzzing from last week’s debut of DeepSeek’s reasoning model, which demonstrates category-leading performance at a bargain-basement price. While the details of the Chinese AI developer’s approach are still being confirmed, observers have already taken away valuable lessons that are likely to shape AI’s development going forward.
Since ChatGPT set off the GenAI Gold Rush, model developers have been in a race to build bigger and more expensive models that could handle an ever-wider range of tasks. That necessitated bigger clusters loaded with more GPUs training on more data. Size definitely mattered: the size of your bank account, your GPUs, and your cluster.
But the rise of DeepSeek shows that bigger isn’t better, and that smaller, more nimble players can match the big AI giants–and potentially outmaneuver them.
“DeepSeek exposed a huge blind spot in our rush to adopt AI,” said Joe Sutherland, a professor at Emory University and author of the book “Analytics the Right Way: A Business Leader’s Guide to Putting Data to Productive Use.”
DeepSeek’s sudden success also strongly suggests that the top performing models in the future will be open source. That ultimately is good for customers and AI builders, and will help to democratize AI, says Sam Mahalingam, the CTO of Altair.
“By enabling developers to build domain-specific models with constrained/cost-effective resources and efficient training methods, it opens new avenues for innovation,” Mahalingam says. “The breakthrough, in my view, lies in the open-source licensing model. This, combined with intelligent training methodologies, will significantly further accelerate the development of large language models. I believe this approach demonstrates that building domain-specific smaller models is the next crucial step in integrating AI more deeply across various applications.”
The fact that DeepSeek snuck in with a smaller model that was trained on a subset of data using a $5.5 million cluster–one that featured only Nvidia’s third-best GPUs–took everyone by surprise, says Databricks CEO Ali Ghodsi.
“No one could have predicted this,” Ghodsi said in an interview posted to YouTube on Tuesday. “There’s a paradigm shift happening. The game is shifting. The rules are changing completely.”
The old scaling law of AI–which stated that the more money you threw at an AI model, the better it would be–has officially been overturned.
“We’ve scaled the amount of dollars and GPUs…10 million times over,” Ghodsi said. “But it’s clear now that it’s very hard for us in the next 10 years to go 10 million times bigger than we have done in the last 10 years.”
Going forward, AI builders will use other techniques, such as training on small subsets of specialized data and model distillation, to drive accuracy higher.
“DeepSeek had specific data in the domain of math…and they’re able to make the model extremely good at math,” Ghodsi said. “So I think this kind of domain intelligence, where you have domains with really good data – that’s going to be the path forward.”
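The first step in this kind of domain-focused training is curating a specialized subset of data. The sketch below illustrates the idea with a crude keyword heuristic over a toy corpus; this is an illustrative assumption, not DeepSeek’s actual pipeline, and a real curation step would use far more sophisticated classifiers.

```python
import re

# Illustrative heuristic: flag documents that use mathematical vocabulary
# or arithmetic expressions. Real pipelines use trained domain classifiers.
MATH_PATTERNS = re.compile(
    r"\b(prove|theorem|equation|integral|solve for|\d+\s*[+\-*/=]\s*\d+)\b",
    re.IGNORECASE,
)

def is_math(text):
    """Return True if the document looks like math-domain content."""
    return bool(MATH_PATTERNS.search(text))

corpus = [
    "Prove that the sum of two even numbers is even.",
    "The committee will meet next Tuesday to review the budget.",
    "Solve for x: 3x + 7 = 22.",
    "Photosynthesis converts light into chemical energy.",
]

# The filtered subset replaces a broad web-scale dataset in fine-tuning,
# concentrating the model's capacity on a single domain.
math_subset = [doc for doc in corpus if is_math(doc)]
```

Here `math_subset` keeps only the two math examples; the point is that a small, high-quality domain slice, not sheer volume, becomes the training input.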
Because DeepSeek’s R1 reasoning model was trained on math, it’s unclear how well the model will generalize. Up to this point, AI developers have benefited from large generalization gains as a byproduct of the massive amount of data used to train large foundation models. How well these new categories of reasoning models generalize is “the trillion-dollar question,” Ghodsi said.
Model distillation, or training a new model on the output of an existing model (which the DeepSeek models are suspected of using), is “extremely efficient,” Ghodsi said, and is a technique highly favored for the types of reasoning models that large companies and labs are now focused on. In fact, many distillations of the DeepSeek models, which are open, have been created in just the past week.
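The mechanics of distillation can be sketched in a few lines: the student model is trained to match the teacher’s softened output distribution rather than hard labels. The toy below operates directly on logits for a single example, a minimal sketch of the idea rather than a real training loop; the temperature and learning-rate values are arbitrary assumptions.

```python
import math

def softmax(logits, T=1.0):
    # Temperature T > 1 softens the distribution, exposing the teacher's
    # relative preferences among classes, not just its top pick.
    scaled = [z / T for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_step(student, teacher, T=2.0, lr=0.5):
    # One update nudging the student's logits toward the teacher's softened
    # distribution (proportional to the KL-divergence gradient w.r.t. logits).
    p_t = softmax(teacher, T)
    p_s = softmax(student, T)
    return [s - lr * (ps - pt) for s, ps, pt in zip(student, p_s, p_t)]

# Toy example: a 3-class "teacher" confident in class 0, and an
# untrained "student" starting from uniform logits.
teacher = [4.0, 1.0, 0.0]
student = [0.0, 0.0, 0.0]
for _ in range(300):
    student = distill_step(student, teacher)
```

After a few hundred steps the student’s output distribution closely tracks the teacher’s, even though the student never sees the teacher’s weights or original training data, which is why distilling an exposed model is so cheap.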
That leads to Ghodsi’s final observation: All models are now effectively open.
“My joke is everybody’s model is open source. They just don’t know it yet,” he said. “Because it’s so easy to distill them, you might think you haven’t open sourced your model but you actually have. Distillation is game-changing. It’s so cheap.”
We might not legally be allowed to use the outputs of one model to train a new one, but that isn’t stopping many companies and some countries from doing it, Ghodsi said. “So essentially it means that all the data is going to be spread around and everybody is going to be distilling each other’s models,” he said. “Those trends are clear.”
DeepSeek’s rise also marks a shift in how we build AI apps, particularly at the edge. AIOps and observability will see a boost, according to Forrester Principal Analysts Carlos Casanova, Michele Pelino, and Michele Goetz. It will also shift the resource demand from the data center out to the edge.
“It could be a game-changer for edge computing, AIOps, and observability if the advances of DeepSeek and others that are sure to surface run their course,” the analysts said. “This approach enables enterprises to harness the full potential of AI at the edge, driving faster and more informed decision-making. It also allows for a more agile and resilient IT infrastructure, capable of adapting to changing conditions and demands.
“As enterprises embrace this new paradigm, they must rethink their data center and cloud strategies,” Casanova, Pelino, and Goetz continued. “The focus will shift to a hybrid and distributed model, dynamically allocating AI workloads between edge devices, data centers, and cloud environments. This flexibility will optimize resources, reduce costs, and enhance IT capabilities, transforming data center and cloud strategies into a more distributed and agile landscape. At the center will remain observability and AIOps platforms, with the mandate for data-driven automation, autoremediation, and broad contextual insights that span the entire IT estate.”
This article first appeared on sister site BigDATAwire.
Alex Woodie has written about IT as a technology journalist for more than a decade. He brings extensive experience from the IBM midrange marketplace, including topics such as servers, ERP applications, programming, databases, security, high availability, storage, business intelligence, cloud, and mobile enablement. He resides in the San Diego area.