
Alation Adds GenAI to Data Catalog 


Alation is rolling out a new offering dubbed ALLIE AI that's aimed at helping data stewards better organize data as it's ingested into the company, and at streamlining how customers ultimately find what they're looking for in Alation's data catalog.

As the original creator of the data catalog and progenitor of the product category, Alation knows a thing or two about data curation and democratizing access to information in large, complex enterprises. It also knows how important it is to have a structured process for ingesting data, assigning governance policies, and building indexes. Without it, we might as well return to the data dark ages.

With yesterday's launch of ALLIE AI, Alation is taking its data governance game up a notch and into the world of generative AI. As part of the new offering, Alation is training a large language model (LLM) on a customer's own private data set to help with both frontend and backend data governance tasks.

On the frontend, or the data ingestion and curation stage, ALLIE AI will leverage the LLM's built-in language understanding and generation capabilities to automatically document new data assets as they are brought into the customer's private data repository and to define data governance policies. At the same time, ALLIE AI will suggest which of the customer's data stewards have the specialized knowledge to oversee and guide the data onboarding and sign off on the data governance policies that ALLIE AI automatically generates.
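
Alation hasn't published implementation details, but the general pattern is straightforward: build a prompt from the asset's schema and ask an LLM for a draft description that a human steward then reviews. The sketch below is purely illustrative; TableAsset, call_llm, and generate_description are hypothetical names, not Alation's API.

```python
# A minimal sketch, not Alation's implementation: call_llm() is a hypothetical
# stand-in for whatever model endpoint a catalog might use.
from dataclasses import dataclass


@dataclass
class TableAsset:
    name: str
    columns: dict[str, str]   # column name -> data type
    source_system: str


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; wire up to your model provider."""
    raise NotImplementedError


def generate_description(asset: TableAsset) -> str:
    """Build a prompt from the table schema and ask the LLM for a draft description."""
    schema = "\n".join(f"- {col}: {dtype}" for col, dtype in asset.columns.items())
    prompt = (
        f"Draft a short business description for table '{asset.name}' "
        f"ingested from {asset.source_system}. Columns:\n{schema}\n"
        "Flag any columns that look like personal data so a steward can review them."
    )
    # The LLM output is only a draft; a human steward reviews and approves it.
    return call_llm(prompt)
```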

During the data access stage, such as when data analysts or data scientists are using the catalog to explore and access enterprise data sets, ALLIE AI leverages its natural language understanding capabilities to help users find the data they need without requiring knowledge of SQL or other specialist skills. Like most data catalogs, Alation uses traditional keyword search and indexing techniques to streamline user access to data, and the addition of an LLM's vector search capability can provide better search results.
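
Combining the two signals is commonly called hybrid search: a keyword score catches exact term matches while an embedding-based score catches matches by meaning. The sketch below illustrates that general idea only; the embed() function is a hypothetical placeholder for a real embedding model, and nothing here reflects how Alation actually ranks results.

```python
# A hedged sketch of hybrid (keyword + vector) search over catalog asset names.
import numpy as np


def embed(text: str) -> np.ndarray:
    """Hypothetical embedding; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)


def keyword_score(query: str, doc: str) -> float:
    """Crude token overlap, standing in for a BM25/inverted-index score."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)


def vector_score(query: str, doc: str) -> float:
    """Cosine similarity between query and document embeddings."""
    qv, dv = embed(query), embed(doc)
    return float(np.dot(qv, dv) / (np.linalg.norm(qv) * np.linalg.norm(dv)))


def hybrid_search(query: str, assets: list[str], alpha: float = 0.5):
    """Blend the two signals; alpha weights keyword vs. vector relevance."""
    scored = [(a, alpha * keyword_score(query, a) + (1 - alpha) * vector_score(query, a))
              for a in assets]
    return sorted(scored, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    catalog = ["quarterly revenue by region", "customer churn model features",
               "employee headcount snapshot"]
    for asset, score in hybrid_search("sales by geography", catalog):
        print(f"{score:.3f}  {asset}")
```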

As the LLM learns about the data in the customer’s environment, it will get better at connecting users with data they’re looking for, said Jonathan Bruce, Alation’s vice president of product management.

“We expect to see a duality there for a significant amount of time. Matching by meaning is extremely useful for many of our customers in the way that it’s just going to collapse down the time to relevance, if you will,” he said. “It’s going to de-muddle search results so that customers can get what they want sooner in a way that doesn’t require them to work as much.”

Eventually, the LLM will enable Alation customers to have a natural language conversation about data within the catalog, Bruce said. We’re not there yet, but that’s the goal.

“If you’ve ever used ChatGPT, the first question you ask, it can give you a general answer, but you can actually kind of tease out something more specific with some additional interactions,” he said. “That’s part of what we would do down the line.”

Alation’s goal is absolutely not to replace human stewards and curators, Bruce said. Humans will always be a part of the equation, he said. However, there’s quite a bit of room for additional automation and for accelerating the data onboarding process, and GenAI can deliver that.

“It’s about getting the catalog to a point whereby they’re able to hit that maturity curve sooner,” Bruce said. “That’s critical for our customers, because they’re looking to wire in more and more data sources more frequently, and the human overhead to do that curation manually gets in the way of the speed of business they want to operate.”


Many of Alation’s large customers, such as Cisco, employ a large number of data stewards to guide the ingestion of new data into the company and its catalog in an orderly and repeatable manner. Each data steward brings strengths in different data specialties, and one way that GenAI can help is by gradually learning what those specialties are, so it can suggest which data steward should oversee the creation of the data policies that govern who can access that data, and for what purposes.
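
How that suggestion is made inside ALLIE AI isn't disclosed, but one simple way to think about it is as a matching problem: rank stewards by how well an incoming asset's tags overlap with a specialty profile learned from what each steward has curated before. The snippet below is a hypothetical sketch of that idea, not Alation's method.

```python
# A hedged sketch of steward suggestion by specialty overlap; names are illustrative.
from collections import Counter


def suggest_steward(asset_tags: set[str], steward_profiles: dict[str, Counter]):
    """Rank stewards by how strongly the asset's tags appear in their specialty profile."""
    ranked = [(name, sum(profile[t] for t in asset_tags))
              for name, profile in steward_profiles.items()]
    return sorted(ranked, key=lambda x: x[1], reverse=True)


# Profiles could be learned from which assets each steward has curated in the past.
profiles = {
    "alice": Counter({"finance": 12, "revenue": 8}),
    "bob": Counter({"hr": 10, "pii": 6}),
}
print(suggest_steward({"finance", "pii"}, profiles))  # [('alice', 12), ('bob', 6)]
```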

“I think effectively what we’re doing, what we’re applying here, is allowing ALLIE AI to actually generate that content, but generate that content in a way that you don’t completely remove that human element,” Bruce told Datanami. “Humans can still review and provide feedback and then ultimately allow us to learn within the confines of my customer tenancy.”

Each ALLIE AI customer gets its own LLM, which ensures that the GenAI is trained on data sets and terms that are unique to the customer while also preventing sensitive data from inadvertently leaking outside the customer's domain.

“It’s all within the customer tenancy,” Bruce said. “You have your own language model. It evolves within your own data, and that’s a really important part of how we afford that segmentation. That is absolutely fundamental.”

About the author: Alex Woodie

Alex Woodie has written about IT as a technology journalist for more than a decade. He brings extensive experience from the IBM midrange marketplace, including topics such as servers, ERP applications, programming, databases, security, high availability, storage, business intelligence, cloud, and mobile enablement. He resides in the San Diego area.
