Covering Scientific & Technical AI | Saturday, January 18, 2025

Why Trusting AI is All a Matter of the Right Data at the Right Time 

Artificial intelligence (AI) has become a familiar presence in daily life. In fact, unless you’ve been asleep for much of 2023, you can see from the hype around generative AI just how quickly AI’s influence is growing. We all know AI has been used for years to recommend your next purchase, but businesses also use it to speed up analysis and make better, faster, more intelligent decisions. But what if that AI is unknowingly delivering bad decisions?

Sure, it's not a big deal when the AI recommends a book that you're not interested in or a vegetable that you hate. But when it comes to financial guidance for investors, or new features for a product, an incorrect analysis can lead to a bad decision that hurts a business's success or, worse, puts individuals or groups at risk of harm.

This raises the question: Can we really trust AI to make important decisions? The question becomes more pressing as more sophisticated deep learning and generative AI models are trained and deployed for everyday use.

The answer lies in understanding the pivotal roles data and data quality have in the machine learning lifecycle.

The Impact of Data Quality

When it comes to trusting AI, the quality of the data used to drive its decisions holds immense significance. Flawed data, whether incomplete, incorrect, or biased, can skew the accuracy of an AI's prediction. The consequences of relying on unreliable data in an AI system could potentially become catastrophic.

Imagine you're an investor seeking financial guidance and you turn to an AI-powered platform to assist you in making investment decisions. This AI relies on historical financial data to analyze market trends, identify potential opportunities, and provide recommendations. However, if the data used by the AI is incomplete or biased, perhaps because it was never fully integrated into the system, the guidance it offers may be flawed, leading you down the wrong investment path.

For another example, say the AI has access to financial data from only a limited number of sources, omitting crucial information from certain sectors or geographic regions. As a result, the AI's analysis may overlook significant market shifts or emerging trends, preventing you from capitalizing on lucrative investment opportunities.

Furthermore, consider the detrimental impact of biased data on the AI's decision-making process. Biases can seep into data collection methods or be inherent within the data itself, leading to skewed insights and recommendations. For instance, if the financial data used by the AI system predominantly represents a specific demographic or fails to account for diverse economic factors, the resulting investment recommendations may not reflect the realities of the broader market.

Relying solely on predictions based on incomplete or biased data can lead to misguided investment decisions, adversely affecting financial success and impeding business growth. The importance of data quality becomes glaringly evident when we recognize that the decisions made by an AI system can only ever be as reliable as the data available when its models are trained and deployed. It is a classic example of “garbage in, garbage out.”

Just as a solid foundation is essential to the structural integrity of a building, reliable, high-quality data forms the bedrock on which trustworthy AI operates. And just as compromised building materials or a weak foundation can jeopardize the stability and safety of a structure, flawed data can undermine the accuracy and reliability of AI's decision-making.

To mitigate these risks, organizations must prioritize data collection processes that ensure completeness, accuracy, and lack of bias. They should strive to gather diverse and comprehensive datasets that cover various sectors, demographics, and geographic regions. Data validation techniques, such as cross-referencing multiple sources and applying data cleaning algorithms, can help identify and rectify errors, ensuring a more accurate representation of reality.
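To make these checks concrete, here is a minimal Python sketch of three of them: flagging incomplete records, cross-referencing two sources, and measuring how evenly a dataset covers different regions. The function names, field names, tolerance, and sample records are all hypothetical and chosen only for illustration; a real pipeline would need checks tailored to its own data.

```python
from collections import Counter


def find_missing_fields(records, required_fields):
    """Return the indices of records where any required field is absent or None."""
    return [
        i for i, record in enumerate(records)
        if any(record.get(field) is None for field in required_fields)
    ]


def cross_reference(source_a, source_b, tolerance=0.01):
    """Return keys present in both sources whose values differ by more than
    the given relative tolerance."""
    mismatches = {}
    for key in source_a.keys() & source_b.keys():
        a, b = source_a[key], source_b[key]
        if abs(a - b) > tolerance * max(abs(a), abs(b)):
            mismatches[key] = (a, b)
    return mismatches


def coverage_share(records, field):
    """Return the share of records per value of `field`, to expose skewed coverage."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical sample data (not real market data).
records = [
    {"ticker": "AAA", "region": "NA", "close": 10.0},
    {"ticker": "BBB", "region": "NA", "close": None},   # incomplete record
    {"ticker": "CCC", "region": "EU", "close": 7.5},
    {"ticker": "DDD", "region": "NA", "close": 3.2},
]
vendor_a = {"AAA": 10.0, "CCC": 7.5}
vendor_b = {"AAA": 10.0, "CCC": 8.0}   # disagrees with vendor_a on CCC

incomplete = find_missing_fields(records, ["ticker", "region", "close"])
disagreements = cross_reference(vendor_a, vendor_b)
region_share = coverage_share(records, "region")
```

In this toy dataset the checks would flag one incomplete record, one price that the two sources disagree on, and a dataset in which three quarters of the records come from a single region. None of these checks proves the data is unbiased; they only surface problems that deserve a closer look.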

The Importance of Robust Models

Just as crucial as data quality are the models that turn the data into predictive insights. The reliability of AI-driven decisions depends heavily on the robustness, accuracy, and transparency of these models, and on using the right data sets to build them.

Let’s imagine AI being used in the healthcare industry to assist doctors in diagnosing diseases. This AI model employs a complex neural network to analyze patient symptoms and provide diagnostic recommendations. While the neural network may exhibit high accuracy in identifying diseases, it lacks transparency in explaining how it arrived at its conclusions.

If you were a doctor using this prediction to aid in diagnosing a patient, you’d receive a speedy diagnosis but have no understanding of the factors that contributed to it. Without transparency, you would be left to trust the AI blindly, even though it may overlook crucial symptoms or fail to account for important medical considerations. These challenges are common in highly regulated industries, where trust and transparency are requirements.

On the other hand, perhaps the healthcare AI tool uses a decision tree model instead. Decision trees are known for their interpretability, as they provide a step-by-step breakdown of the decision-making process. In this scenario, when the AI system recommends a diagnosis, you can easily trace the path of decision-making and see which symptoms and factors led to that conclusion. This transparency empowers a doctor to make a more informed judgment on the AI's recommendations.
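The traceability described above can be sketched in a few lines of Python. This is a hand-built toy tree, not a trained model, and the features (temperature_c, cough_days), thresholds, and outcome labels are invented purely for illustration; they carry no medical meaning.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # the recommendation reached at the end of a path


@dataclass
class Node:
    feature: str                  # which input the node tests
    threshold: float              # split point for that input
    below: Union["Node", Leaf]    # branch taken when value <= threshold
    above: Union["Node", Leaf]    # branch taken when value > threshold


def predict_with_trace(tree, patient):
    """Walk the tree and return (label, list of human-readable decisions)."""
    trace = []
    node = tree
    while isinstance(node, Node):
        value = patient[node.feature]
        if value <= node.threshold:
            trace.append(f"{node.feature} = {value} <= {node.threshold}")
            node = node.below
        else:
            trace.append(f"{node.feature} = {value} > {node.threshold}")
            node = node.above
    return node.label, trace


# Hypothetical tree and patient, for illustration only.
toy_tree = Node(
    feature="temperature_c",
    threshold=38.0,
    below=Leaf("no flag"),
    above=Node(
        feature="cough_days",
        threshold=3,
        below=Leaf("monitor"),
        above=Leaf("refer for further tests"),
    ),
)
patient = {"temperature_c": 38.6, "cough_days": 5}
label, trace = predict_with_trace(toy_tree, patient)
```

Each entry in the trace names the input that was tested and the branch that was taken, so a reviewer can check every step of the reasoning against their own judgment. A neural network offers no equivalent record by default.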

For another example, imagine you are presented with two complex puzzles: one comes with clear instructions and transparent steps, while the other lacks any guidance or explanation. In the first puzzle, you can readily follow the logical progression and solve it effectively. The second puzzle, however, leaves you uncertain, making it difficult to trust your own decisions or know whether you've arrived at the correct solution. Guesswork, assumptions, and trial and error become part of the decision-making process. Robust and transparent models enable users to better comprehend the AI's decision-making process, instilling a greater sense of confidence in its recommendations.

The Role of the Semantic Layer

The semantic layer plays a crucial role in creating a common data layer that enhances the trustworthiness of AI decision making. It addresses the challenge of inconsistent data definitions and a lack of context, which, as we’ve seen in the examples above, can dramatically derail AI utilization and trust. By establishing a single source of truth, a semantic layer can ensure that all AI applications work from a common data source with visibility into the transformations the data has gone through at each step of the way.

When different teams or individuals within an organization use different measurement units or have varying interpretations of data, it can hinder collaboration and result in conflicting conclusions. The semantic layer helps overcome this issue by providing a shared understanding. It captures the relationships, concepts, and context within the data, enabling consistent interpretation and analysis. With a common understanding of data across the organization, more trustworthy conclusions can be made, as they’re based on the same reliable source of information.

The semantic layer helps ensure that data is accurately understood, interpreted, and used consistently, fostering trust in the insights derived from AI systems.
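As a rough illustration of the shared-definition idea, the Python sketch below maps the different names and units that teams might use onto one canonical metric. It is a deliberately simplified stand-in and does not represent how any particular semantic layer product works; the metric name, aliases, and unit table are hypothetical.

```python
# Hypothetical unit table: multipliers to convert each unit into US dollars.
UNIT_TO_USD = {"usd": 1.0, "kusd": 1_000.0, "musd": 1_000_000.0}

# Hypothetical shared definitions: one canonical metric and the names teams use for it.
SEMANTIC_LAYER = {
    "revenue": {"canonical_unit": "usd", "aliases": {"revenue", "rev", "sales"}},
}


def resolve_metric(name):
    """Map a team-specific name onto the canonical metric name."""
    key = name.strip().lower()
    for metric, spec in SEMANTIC_LAYER.items():
        if key in spec["aliases"]:
            return metric
    raise KeyError(f"no shared definition for metric {name!r}")


def to_canonical(name, value, unit):
    """Return (canonical metric name, value converted to the canonical unit)."""
    metric = resolve_metric(name)
    return metric, value * UNIT_TO_USD[unit.lower()]


# Two teams report the same figure under different names and units.
finance_view = to_canonical("Sales", 2.5, "mUSD")
ops_view = to_canonical("rev", 2500, "kUSD")
```

Both reports resolve to the same metric and the same value in dollars, so any model or dashboard downstream sees one consistent number. An undefined name raises an error instead of passing through silently, which is the behavior you would want from a single source of truth.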

Building Trust in AI

The real question is: How can we ensure that AI systems deliver trustworthy decisions?

To improve trust in AI, organizations must prioritize the development and use of models that exhibit robustness, accuracy, and transparency. This entails employing techniques such as explainable AI, where the inner workings of the model are made understandable and interpretable. Additionally, organizations can adopt model evaluation methodologies, using techniques like sensitivity analysis or performance metrics to assess the reliability and effectiveness of the models. And by creating a semantic data layer, organizations can make AI decision making more reliable, transparent, and informed, because every application draws on a common source. Without these initiatives, AI will never be seen as a reliable and trustworthy decision-making partner.
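The two evaluation techniques named above can be sketched briefly in Python. The example computes precision and recall as performance metrics, and runs a simple one-at-a-time sensitivity analysis that nudges each input and records how much the output moves. The toy scoring function, the input names, and the labels are assumptions made up for this sketch.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def one_at_a_time_sensitivity(model, inputs, delta=0.5):
    """Increase each input by a relative `delta`, one at a time, and return
    the resulting change in the model's output per input."""
    base = model(inputs)
    effects = {}
    for name, value in inputs.items():
        perturbed = dict(inputs)
        perturbed[name] = value * (1 + delta)
        effects[name] = model(perturbed) - base
    return effects


def toy_score(x):
    """Hypothetical scoring model: a stand-in for any trained model."""
    return 3.0 * x["a"] - x["b"]


# Hypothetical labels and predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)

effects = one_at_a_time_sensitivity(toy_score, {"a": 10.0, "b": 4.0})
```

Here the score reacts far more strongly to input a than to input b, which tells a reviewer where data errors would do the most damage. One-at-a-time analysis ignores interactions between inputs, so it is a first look and not a full assessment.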

Alongside meticulous scrutiny, asking the right questions is paramount in establishing trust with AI. By posing pertinent inquiries to any AI, we can ascertain its reliability in making critical decisions on our behalf.

Here are some important questions to consider:

  • What data is being used to train and guide the AI? Is it complete, accurate, and unbiased?
  • How are the models constructed for data analysis? Are they accurate, robust, and transparent?
  • How do the AI's decisions compare to those made by human experts as part of the feedback loop?

Only by ensuring the data is complete, accurate, and unbiased, and by using accurate, robust, and transparent models, can we really begin to trust AI to make sound decisions. Trustworthy AI can become a catalyst for progress, but only if we take the necessary actions to help it evolve along the way. Generative AI will only magnify this challenge, which is why a solid data foundation is more important now than ever before.

About the Author:

Gaurav Rao is the EVP and GM for AI/ML, leading all AI/ML strategy for AtScale. Prior to AtScale, Gaurav served as VP of Product at Neural Magic. Before Neural Magic, he served in a number of executive roles at IBM spanning product, engineering, and sales. He is also an advisor to data and AI companies.

AIwire