Covering Scientific & Technical AI | Wednesday, December 4, 2024

Essential Ingredients for Building and Deploying Successful AIOps 

As big data proliferates in every aspect of business today, IT teams face a daunting task in processing the sheer volume and complexity of IT operations output. In response, the enterprise appetite for AIOps is growing. AIOps uses big data and machine learning to predict, identify, diagnose and resolve IT events at a scale and speed that humans cannot match. A recent report from private equity and venture capital firm Insight Partners estimates that the AIOps platform market will grow at a CAGR of 32.2%, from approximately $2.83 billion in 2021 to $19.93 billion by 2028. That said, effective AIOps solutions don’t materialize overnight. A fully baked AIOps solution results from a recipe perfected over time through robust experimentation with three essential ingredients: data, analytics and diverse domain expertise.
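As a quick sanity check on the market figures above, the growth rate implied by the two endpoint values can be recomputed with the standard CAGR formula. This is only an arithmetic sketch; the function name is illustrative and nothing here comes from the report itself.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate, returned as a fraction (0.10 means 10%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoint values quoted in the article, in billions of dollars.
implied = cagr(2.83, 19.93, 2028 - 2021)
print(f"Implied CAGR: {implied:.1%}")  # prints "Implied CAGR: 32.2%"
```

The implied rate agrees with the quoted 32.2% after rounding, so the three numbers are consistent with one another.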

Data

Successful AIOps simply does not exist without data. This ingredient is critical, and while it is available in abundant supply, the challenge is to harvest it in a usable and validated form. AIOps relies on hundreds – or even thousands – of data points from diverse sources (for instance, network performance, business systems and customer support), all generated every second and, in many cases, at sub-second rates. How that vast pool of data is processed can make or break an AIOps solution. For speed, cost-effectiveness and maximum efficiency, a split pipeline of on- and off-premises data management yields the best result.

A traditional single on-premises data processing model can no longer accommodate the complexity and volume of today's data sets. Instead, consider building or re-architecting the data processing funnel into two parts: a lean, fast-processing pipeline that runs through a real-time, on-premises data bus to handle time-critical analysis, and a more robust channel that analyzes the remaining data in the cloud. Reducing on-premises data processing to a bare minimum and assigning the cloud – armed with elastic computing and more sophisticated storage capabilities – to process the rest of the data enables faster and more cost-effective data synthesis.
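The split described above can be pictured as a small routing step placed in front of the two pipelines. The sketch below is a minimal illustration under stated assumptions: the Event fields, the metric names and the set of "time-critical" metrics are all hypothetical, and a real deployment would publish to an on-premises data bus and a cloud ingestion service rather than append to in-memory lists.

```python
from dataclasses import dataclass

# Hypothetical set of metrics treated as time-critical. In practice this
# list would be chosen together with domain experts.
TIME_CRITICAL_METRICS = {"link_down", "packet_loss_pct", "latency_ms"}


@dataclass
class Event:
    site_id: str
    metric: str
    value: float


def route(event):
    """Name the pipeline that should handle this event."""
    return "on_prem" if event.metric in TIME_CRITICAL_METRICS else "cloud"


def split(events):
    """Partition events into (on_prem, cloud) lists, preserving order."""
    on_prem, cloud = [], []
    for event in events:
        (on_prem if route(event) == "on_prem" else cloud).append(event)
    return on_prem, cloud


events = [
    Event("site-1", "latency_ms", 250.0),
    Event("site-1", "daily_usage_gb", 12.4),
    Event("site-2", "link_down", 1.0),
]
fast_path, bulk_path = split(events)
print(len(fast_path), "events on-premises;", len(bulk_path), "to the cloud")
```

Keeping the routing rule as a plain lookup makes the on-premises path cheap to evaluate, which is the point of keeping that pipeline lean.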

A split pipeline model that manages data on- and off-premises simultaneously can amplify an organization's ability to process millions of data points every hour. ML algorithms can help prioritize incoming data from each pipeline and convert the raw, unstructured data into usable metrics essential to customer service agents or IT teams. The efficiencies and speed gained from a dual-pronged system also enable organizations to deploy enhanced monitoring capabilities for real-time visibility and long-term trend information about network performance.
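To make "converting raw data into usable metrics" concrete, here is a minimal sketch that rolls raw latency samples up into per-site summaries. It uses plain aggregation rather than ML, and the sample data, field names and choice of summary statistics are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize(samples):
    """Turn (site_id, latency_ms) samples into per-site summary metrics."""
    by_site = defaultdict(list)
    for site_id, latency_ms in samples:
        by_site[site_id].append(latency_ms)
    return {
        site: {"count": len(vals), "mean_ms": mean(vals), "max_ms": max(vals)}
        for site, vals in by_site.items()
    }


# Hypothetical raw samples as they might arrive from a pipeline.
raw = [("site-a", 40.0), ("site-a", 60.0), ("site-b", 120.0)]
summary = summarize(raw)
print(summary["site-a"])
```

A support agent can read a row like this at a glance, which is rarely true of the raw stream.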

Analytics

The second essential ingredient for AIOps success is analytics. Analytics folds into the AIOps mix in two phases: exploratory analysis, which sifts through raw data for trends or anomalies that require additional examination, and advanced statistical analysis, which converts those findings into actionable insights. As data funnels through the pipelines, engineering teams often skip ahead eagerly to advanced statistical analysis despite the integral role of exploratory research. Bypassing this initial phase can lead to data overfitting – injecting bias into the AIOps process and falsely identifying issues that would render AI/ML algorithms useless and cause unintended operational consequences.
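One simple way to start the exploratory phase is to screen a metric for values that sit far from the rest, then hand those to a person for review. The sketch below uses the modified z-score based on the median absolute deviation, with the conventional 0.6745 scaling constant and 3.5 cutoff. It is an assumed illustration of a screening step, not a technique the article prescribes, and the sample values are invented.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # No measurable spread, so nothing can be ranked as unusual.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical latency readings (ms) with one obvious spike.
latencies = [10, 11, 9, 10, 12, 10, 95]
suspects = flag_outliers(latencies)
print(suspects)  # prints [6]
```

The output is a list of candidates for additional examination, not a verdict, which matches the role of the exploratory phase.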

Exploratory analysis relies on both ML and data scientists to identify and determine the specific metrics essential to customer service agents and engineers. IT teams may favor ML in this process – it’s exciting technology that seems efficient. But ML alone is not always the most effective method for analysis. ML tries to solve a particular problem based on a set of specific parameters. Engineers program ML algorithms based on the metrics they think they need to reach conclusions A, B or C – thereby removing other possible solutions or statistics from consideration.

Conversely, statisticians and data scientists examine raw data without a specific result in mind, instead reviewing the figures for patterns or anomalies. Manual data review, while tedious, allows experts to identify straightforward IT solutions that don't require advanced statistical analysis. For example, in response to complaints about wireless network performance, an analytics team combed through interactive data visualizations on a dashboard and found that the problem sites were all on the same wireless carrier. From there, they deduced that those sites all had the same hardware model of wireless modem. Finally, they found that the problem occurred when using a specific wireless band. The issue was a problem known to the wireless carrier and was resolved by replacing the modem with a different model.
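The drill-down in that example amounts to asking which attributes every problem site has in common. A minimal sketch of that question follows; the records, field names and values are invented for illustration, and the team in the example worked from dashboard visualizations rather than from code like this.

```python
def common_attributes(records, fields):
    """Return {field: value} for each field on which every record agrees."""
    shared = {}
    for field in fields:
        values = {record[field] for record in records}
        if len(values) == 1:
            shared[field] = values.pop()
    return shared


# Hypothetical records for the sites that reported poor performance.
problem_sites = [
    {"site": "s1", "region": "east", "carrier": "carrier-x",
     "modem": "model-1", "band": "band-a"},
    {"site": "s2", "region": "west", "carrier": "carrier-x",
     "modem": "model-1", "band": "band-a"},
    {"site": "s3", "region": "south", "carrier": "carrier-x",
     "modem": "model-1", "band": "band-a"},
]
shared = common_attributes(problem_sites, ["region", "carrier", "modem", "band"])
print(shared)
```

A shared attribute is only a lead. It becomes evidence once healthy sites are checked and found not to share it, which is where an analyst's judgment comes in.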

When teams are confident that the trends or anomalies identified in the exploratory phase are correct, they can then proceed to advanced statistical analysis and training AI/ML algorithms. Even AI/ML requires trial-and-error testing and will not yield immediate results. Behind every AIOps solution is a team of domain experts who tweak and test AI/ML models extensively and constantly to ensure AIOps success.

Diverse domain expertise

The third ingredient for a successful AIOps implementation is domain expertise. In the case of AIOps creation, there can’t be too many proverbial cooks in the kitchen. Successful deployment of AI in any enterprise requires the involvement of a diverse set of domain experts. For example, in the area of network operations, network engineers understand the nuances of ML systems and the necessary AI algorithms to solve a particular problem with accuracy. Meanwhile, non-technical experts bring sector-specific knowledge such as the source and usability of datasets, business strategy and operations. A deep bench of domain experts ensures that the AI/ML algorithms reflect real-world operations, provides crucial validation of the results and serves as an important check on erroneous approaches or unintended consequences. For instance, a communications system undergoing planned maintenance may exhibit behaviors (like extremely low network traffic) that typically indicate a problem state. Layering business logic on top of the model predictions, in the form of a check against the maintenance ticketing system, eliminates these false alarms.
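The maintenance example can be sketched as a filter applied after the model has flagged alerts. Everything named here is hypothetical: MaintenanceWindow, the alert tuple layout and the in-memory list of windows stand in for whatever a real ticketing system would return.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MaintenanceWindow:
    site_id: str
    start: datetime
    end: datetime


def under_maintenance(site_id, when, windows):
    """True if the site has a planned maintenance window covering `when`."""
    return any(
        w.site_id == site_id and w.start <= when <= w.end for w in windows
    )


def filter_alerts(alerts, windows):
    """Drop model alerts raised while the site was in planned maintenance.

    alerts: list of (site_id, timestamp, message) tuples from the model.
    """
    return [a for a in alerts if not under_maintenance(a[0], a[1], windows)]


# Stand-in for data fetched from a maintenance ticketing system.
windows = [
    MaintenanceWindow("site-a", datetime(2024, 1, 1, 2), datetime(2024, 1, 1, 4)),
]
alerts = [
    ("site-a", datetime(2024, 1, 1, 3), "extremely low traffic"),
    ("site-b", datetime(2024, 1, 1, 3), "extremely low traffic"),
]
actionable = filter_alerts(alerts, windows)
print([site for site, _, _ in actionable])  # prints ['site-b']
```

Keeping this rule outside the model means operations staff can read and change it without retraining anything.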

Domain experts play an important role in the hypothetical kitchen – but also in the theoretical dining room, where they can interpret results for an audience of executives hungry for AIOps solutions. ML tends to operate in a black box, leaving teams unable to articulate the recipe by which the model arrived at a specific decision. This can make business executives skeptical of, and hesitant to act on, an AI-driven insight. On the other hand, explainable AI delivers stronger buy-in and trust from business leaders unfamiliar with AIOps.
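One very simple form of explainability is a score whose parts can be itemized. The sketch below assumes a transparent linear risk score with made-up weights and feature names. It is meant only to show what "articulating the recipe" can look like, not how any particular AIOps product explains its decisions.

```python
# Hypothetical weights for a transparent linear risk score.
WEIGHTS = {"packet_loss_pct": 0.5, "latency_ms": 0.01, "retransmits": 0.02}


def explain(features):
    """Return (score, contributions), with the largest contribution first."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return score, ranked


# Hypothetical readings for one site.
score, ranked = explain(
    {"packet_loss_pct": 4.0, "latency_ms": 100.0, "retransmits": 10.0}
)
for name, contribution in ranked:
    print(f"{name}: {contribution:.2f}")
```

An executive can be told that packet loss drove most of the score, which is an easier claim to trust than an unexplained number.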

AIOps requires three core ingredients, but, as with any recipe, the quality of those ingredients and the hands in which they are placed will make all the difference in the outcome. As with the best chef creations in the world, trial and error is part of the process, particularly in the complex art of training ML models. Ensuring proper handling of data, employing the right type of analytics and engaging domain experts will help enterprises serve a successful, scalable AIOps solution to satisfy the increased appetite for operational efficiency.

About the Author

Frank Kelly, vice president at Hughes Network Systems, LLC (HUGHES), is the chief technology officer for the North American Division, responsible for identifying innovation and technology to improve service effectiveness and efficiency for consumer and enterprise services. In this capacity, he oversees the strategic direction and implementation of machine learning and artificial intelligence, in addition to applying agile development and service delivery techniques and integrating DevOps technologies into Hughes services. Mr. Kelly earned a Master's Degree in Information Technology from Hood College, Maryland, with a focus on network management. He also holds a Bachelor of Science Degree in Computer Science from the University of Maryland.

AIwire