How AutoML is Democratizing Data Science — And What That Means For Data Scientists
“Data scientist” has been one of the most in-demand job titles of the past decade. But in another 10 years, that role will look very different, thanks to technologies such as automated machine learning, or AutoML.
New technologies are already helping to reduce the need for organizations to build AI and ML models from scratch — a traditional data scientist’s bread and butter. Instead, at most organizations, software developers and even non-technical employees will do the heavy lifting, wielding powerful software tools that automate many of the tasks a data scientist handles today.
This transition is already underway: Data engineering (development work built on data-related skill sets) was the fastest-growing tech job category in 2019, according to a study from Dice.com, with demand growing almost twice as fast as demand for data scientists. As this trend continues, data scientists will shift to more consultative roles, guiding data strategy for organizations.
To understand this transition, we first have to understand where the spectrum of data science maturity stands today — and how it’s going to evolve.
“Homegrown” ML algorithms are out of reach for most
Fortune 500 companies and other major enterprises often have the highest level of machine learning maturity, as they have the resources and technical talent required to develop their own proprietary ML applications. These organizations employ teams of formally credentialed data scientists to build custom ML algorithms, generally using open-source tools like TensorFlow and scikit-learn, a machine learning library for Python.
Successfully completing these projects requires a rare combination of data science talent, business intuition and deep knowledge of the specific problem one is trying to solve. It’s also incredibly labor intensive, involving highly manual processes that require a high degree of technical skill. A data scientist might start a project by manually importing data into a completely blank Jupyter notebook, conducting exploratory data analysis, evaluating different algorithms and engineering new features, then end by carefully tuning a model by hand.
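To make those manual steps concrete, here is a deliberately tiny sketch in plain Python. Everything in it is a made-up illustration rather than a real project: the transaction rows are invented toy data, and amount_per_item and accuracy are hypothetical helper names. A real notebook would lean on libraries such as scikit-learn, but the shape of the work is the same: load, explore, engineer a feature, then tune by hand.

```python
# Toy illustration of a hand-built workflow. All data and names are invented.
import statistics

# 1. Import data by hand (inlined here instead of being read from a file).
#    Each row is (amount, n_items, is_fraud).
transactions = [
    (12.0, 3, 0), (250.0, 1, 1), (40.0, 4, 0), (310.0, 1, 1),
    (18.0, 2, 0), (95.0, 5, 0), (400.0, 2, 1), (22.0, 1, 0),
]

# 2. Exploratory data analysis: compare amounts across the two classes.
fraud_amounts = [a for a, _, y in transactions if y == 1]
legit_amounts = [a for a, _, y in transactions if y == 0]
print("mean fraud amount:", statistics.mean(fraud_amounts))
print("mean legit amount:", statistics.mean(legit_amounts))


# 3. Feature engineering: spend per item may separate the classes better.
def amount_per_item(amount, n_items):
    return amount / n_items


# 4. Evaluate a simple rule: flag a row when the feature exceeds a threshold.
def accuracy(threshold):
    hits = sum(
        (amount_per_item(a, n) > threshold) == bool(y)
        for a, n, y in transactions
    )
    return hits / len(transactions)


# 5. Tune by hand: try a few thresholds and read off the results.
for threshold in (10.0, 50.0, 100.0, 300.0):
    print(f"threshold {threshold}: accuracy {accuracy(threshold):.2f}")
```

Every numbered step here is a decision a person made and coded by hand, which is where the labor cost of a bespoke project comes from.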
These types of complex, bespoke projects can often deliver somewhat more accurate results than automated tools can. But given the level of investment and risk involved, and the sometimes marginal gains compared to simpler tactics, it’s no surprise that these projects are typically pursued by large enterprises with rich stores of historical data and the deep pockets to hire skilled staff.
For most organizations, this approach simply isn’t economical — nor is it necessary to achieve the desired business results. The investment in data science talent, computing resources and tools may not be worth it.
Instead, organizations have a number of other tools they can turn to, the most important being the set of machine learning automation tools known as AutoML.
AutoML offers a flexible, customizable alternative
AutoML is an ideal solution for organizations that lack the resources to build algorithms from scratch, but that also need more flexibility than off-the-shelf ML applications like Amazon Lex or Azure Language Understanding can offer. By compressing the manual steps of a traditional machine learning workflow into a configurable stack, AutoML empowers developers to incorporate data science elements into projects without the need for academic data science training.
A software engineering skill set is all that’s required to build custom configurations, refine inputs and generally play in the AutoML sandbox. AutoML works especially well with large, relatively common datasets, like financial transaction data or clickstream data from a web property.
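As a rough picture of what a “configurable stack” means, the sketch below implements a toy AutoML-style search in plain Python. It is not how any particular AutoML product works: CONFIG, auto_fit, knn_model, majority_model and make_toy_data are all hypothetical names, and the data is synthetic. The point is only that the developer edits a configuration while a generic loop handles the splitting, fitting and comparing.

```python
# Toy illustration only: a hypothetical AutoML-style search in pure Python.
# Real AutoML tools also automate feature engineering, ensembling and more.
import random
from collections import Counter


def majority_model(train, _params):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def knn_model(train, params):
    """k-nearest neighbours using squared Euclidean distance."""
    k = params["k"]

    def predict(x):
        nearest = sorted(
            train,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
        )[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]

    return predict


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


def auto_fit(rows, config, seed=0):
    """Try every candidate in the config; return the best by validation accuracy."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * config["train_fraction"])
    train, valid = rows[:cut], rows[cut:]
    results = []
    for name, builder, grid in config["candidates"]:
        for params in grid:
            model = builder(train, params)
            results.append((accuracy(model, valid), name, params, model))
    return max(results, key=lambda r: r[0])


# The "configurable stack": the developer edits this, not the search loop.
CONFIG = {
    "train_fraction": 0.75,
    "candidates": [
        ("majority", majority_model, [{}]),
        ("knn", knn_model, [{"k": 1}, {"k": 3}, {"k": 5}]),
    ],
}


def make_toy_data(n=200, seed=1):
    """Synthetic two-cluster data: label 1 sits near (2, 2), label 0 near (-2, -2)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        center = 2.0 if label else -2.0
        rows.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return rows


if __name__ == "__main__":
    score, name, params, _ = auto_fit(make_toy_data(), CONFIG)
    print(f"best candidate: {name} {params}, validation accuracy {score:.2f}")
```

Adding another model family or a wider parameter grid means adding an entry to CONFIG, which is the kind of change a software engineer can make without formal data science training.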
Today, many academically credentialed data scientists look down on solutions built with AutoML, as they usually deliver less accurate results than a “homegrown” model would. For most business tasks, however, slightly less accurate is still more than good enough — and AutoML’s greater accessibility makes the trade-off worthwhile.
The democratization of data science
AutoML is more than just a useful tool for building “good enough” ML solutions: Ultimately, these types of automation tools will become the driving force behind the democratization of data science.
By reducing the barrier to entry for building ML applications, the AutoML toolkit expands the pool of employees who are able to find creative solutions to enterprise data problems.
That universe isn’t limited to developers, either.
Many vendors have also launched AutoML products easy enough for non-technical staffers to use, creating “citizen data scientists” who are empowered to solve data problems they encounter in their day-to-day work. And while they’re not as flexible, off-the-shelf ML applications can also help increase data science literacy by introducing employees of smaller, less well-resourced organizations to basic automation and data capabilities. Both categories of technology will contribute to the spread of data literacy throughout the enterprise in the coming years.
The data scientist is dead — long live the data scientist
This process of democratization will shift the role of the data scientist as well. While data scientists will continue to add value at every level of the data maturity spectrum, some of the tasks they handle today will be automated, which will move them into consultative roles. Rather than investing time in building models from scratch, they will advise organizations on how to solve their problems with data using AutoML and other automation tools. In the future, familiarity with these tools will be an expected part of their skill set, the way a developer is expected to be familiar with multiple programming languages today.
The day-to-day workload of a data scientist in 2030 will be radically different from what it is today — but that’s a good thing. It will mean that accessible tools have become so powerful and employees so data-literate that there’s little need for most organizations to build ML models from scratch.
Instead, data scientists will apply their skills and training to high-level strategic tasks, driving stronger business results — and making them even more indispensable to the organizations they serve.
About the author:
Eric Miller is the VP of private cloud solutions at Rackspace Technology. With 20 years of experience in enterprise IT, Miller is a strong advocate of cloud native architectural patterns, passionate about machine learning, IoT, serverless, and all things automation in the cloud. Previously he was the vice president of AWS customer solutions at Onica, which was acquired by Rackspace in 2019. He earned a bachelor of science in information technology and information systems security from the University of Phoenix.