Covering Scientific & Technical AI | Thursday, September 19, 2024

GenAI Lost in Translation? Assessing Advanced Technology’s Language Gap in Life Sciences 

As various industries explore new applications for advanced intelligence, generative artificial intelligence (GenAI) continues to gain traction. With its ability to process complex data, uncover hidden patterns, automate tasks and generate creative content, it is emerging as a transformative tool to advance insights and productivity.

Sanmugam Aravinthan
Credit: IQVIA

However, a key hurdle to widespread adoption remains: GenAI’s limited language fluency significantly handicaps the use of this transformative technology.

Current GenAI systems are largely trained on data from online resources and databases, which tend to be dominated by a few major languages such as English, Spanish, and Chinese. This surplus of data in a few globally dominant languages creates an imbalance. Considering the thousands of additional languages spoken around the globe, a significant portion of global information is likely missing from current GenAI training datasets. This language bias could introduce unintended drawbacks, potentially leading to skewed results or limited access to information for diverse populations.
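To make the imbalance concrete, here is a minimal sketch of how a team might measure each language's share of a training corpus. The corpus, the language codes and the `language_shares` helper are all hypothetical and exist only for illustration; a real pipeline would first need a language-identification step.

```python
from collections import Counter

def language_shares(docs):
    """Return each language's share of the corpus, largest first."""
    counts = Counter(lang for lang, _ in docs)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.most_common()}

# Hypothetical toy corpus of (language code, text) pairs.
corpus = [
    ("en", "Patient reported mild headache."),
    ("en", "No adverse events observed."),
    ("en", "Dose was well tolerated."),
    ("es", "El paciente refiere mareo."),
    ("sw", "Mgonjwa ana maumivu ya kichwa."),
]

shares = language_shares(corpus)
for lang, share in shares.items():
    print(f"{lang}: {share:.0%}")
```

Even in this toy sample, a single language accounts for most of the documents, which is the pattern the paragraph above describes at far larger scale.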

Generative Artificial Intelligence in Action: Revolutionizing Life Sciences

This shortcoming matters in many fields, but it is particularly relevant in the life sciences industry. The life sciences sector is a prime target for GenAI technology because of its data overload. In fact, a recent survey indicates that GenAI investments in the life sciences industry have more than tripled in recent months. As a field at the forefront of discovery, the industry embraces advancements that accelerate research timelines, enhance data analysis and yield deeper insights.

Already, GenAI’s implementation has addressed several challenges in the life sciences industry. Through advanced analytical capabilities, organizations are empowered to improve critical safety surveillance measures with signal detection, data integration and automated reporting.

This technology allows for proactive monitoring of adverse drug or medical device reactions across diverse platforms, including social media. By leveraging ontologies and character recognition to train GenAI for pattern recognition, organizations can potentially predict and identify adverse events with greater accuracy, ultimately leading to improved patient safety.
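As a rough illustration of ontology-driven detection, and of how the language gap surfaces in practice, the sketch below matches social media posts against a tiny term dictionary. It is plain keyword lookup rather than GenAI, and the `ONTOLOGY` mapping, the `find_adverse_events` helper and the sample posts are hypothetical, not drawn from any real coding dictionary or product.

```python
# Hypothetical mini-ontology: concept -> surface terms (English only).
ONTOLOGY = {
    "headache": {"headache", "migraine"},
    "nausea": {"nausea", "queasy"},
}

def find_adverse_events(post):
    """Return the ontology concepts whose terms appear in the post."""
    words = {w.strip(".,!?").lower() for w in post.split()}
    return sorted(concept for concept, terms in ONTOLOGY.items() if words & terms)

english_post = "Started the new pills, got a headache and felt queasy."
spanish_post = "Empecé las pastillas nuevas y tengo dolor de cabeza."

print(find_adverse_events(english_post))  # ['headache', 'nausea']
print(find_adverse_events(spanish_post))  # []
```

The Spanish post describes a headache too, but a resource built only from English terms returns nothing for it. Models trained on skewed data can fail in a similar, if less visible, way.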

Beyond safety surveillance, GenAI’s capabilities are also harnessed in the analysis of clinical data to identify suitable candidates for promising new therapies in clinical trials. This streamlined process could lead to faster patient recruitment and, ultimately, shorter trial durations. Furthermore, GenAI's reach extends to patient interaction through chatbot functionalities. These chatbots gather patient symptoms and offer recommendations based on the information provided. This approach not only fosters patient engagement but also eases the workload of healthcare professionals.

Despite the potential for GenAI to improve healthcare outcomes, its current language limitations remain a key obstacle. Current AI and GenAI models struggle to process information beyond a handful of dominant languages. This creates a blind spot for patients who speak other languages, potentially hindering GenAI's ability to revolutionize critical processes such as early detection of adverse events, patient recruitment for clinical trials, and advanced chatbot capabilities.

The Language Challenge: Generative Artificial Intelligence’s Language Blind Spot

The digital language divide poses a significant challenge for deploying advanced technologies across various industries. However, the life sciences industry stands to gain immense benefits from broader GenAI capabilities, potentially leading to dramatic improvements in patient outcomes.

Addressing this language gap now is crucial to ensuring future technologies’ ability to leverage vast, multilingual datasets reflecting the global healthcare landscape. Expanding training models to encompass multilingual data, incorporating diverse patient information and prioritizing language-agnostic development are all essential steps to increasing accessibility to healthcare and life sciences advancements on a global scale.

GenAI's usefulness within life sciences hinges on its ability to incorporate multilingual data

With this context in mind, how do organizations ensure the development of advanced technology that safeguards patients worldwide?

Increasing the amount of digital data utilized to train GenAI effectively is a critical first step. This requires improving global access to digital devices and internet services to expand the number of languages with sufficient digital footprints. The current limitations of many languages’ digital presence stem from a lack of access to digital services, hindering data collection for training purposes. Initiatives promoting high-speed broadband and internet-enabled devices can bridge the gap between languages by tackling this digital divide.

Strengthening GenAI's language capabilities extends beyond just the number of languages sourced in its development. It is crucial to incorporate language variations and dialects in the training of advanced technology. Biases against non-standard forms of language can be just as detrimental to patient safety as gaps in language coverage, and limiting GenAI's exposure to language variations invites exactly these biases. For GenAI to effectively detect abnormalities and concerns related to patient outcomes, it must be able to understand real-world conversations, including vernacular, slang and code-switching.

Ensuring Equitable Global Outcomes with Generative Artificial Intelligence

As GenAI takes root in the life sciences industry, acknowledging its limitations alongside its potential is critical for future success. For decades, healthcare and life sciences have faced challenges reaching diverse populations and improving research participant demographics.

Studies continue to reveal a concerning lack of representation even as organizations make concerted efforts to increase access to diverse populations. These stark gaps in representation perpetuate global health inequities and limit the lifesaving potential of new therapeutics. The industry already struggles with underrepresentation and accessibility concerns in patient recruitment. Failing to recognize GenAI's current language limitations will only deepen these problems.

GenAI holds immense promise for revolutionizing healthcare and life sciences, but its current language capabilities pose a significant barrier to achieving equitable patient outcomes. By expanding access to multilingual and diverse patient data, increasing the global availability of digital services and embracing language variations, organizations can take steps to bridge the digital language divide.

Addressing these shortcomings now will ensure GenAI's transformative power can create a future where healthcare and life sciences advancements benefit all populations, regardless of language.


About the Author: Sanmugam Aravinthan is the Senior Director, Development at IQVIA Vigilance Detect. His main area of focus is driving the technology development and delivery of a productized solution that enables an optimized approach to detecting adverse events, product quality complaints and other safety risks in large-scale structured and unstructured data. He has 20+ years of industry experience in driving Software Engineering and Systems Development, with the past 10 years in the pharmaceutical and life sciences industries. He has a strong track record in directing software product development, managing technology delivery for clients and leading pharmacovigilance operations in client implementations. He holds a US patent titled “System and method for multi-dimensional profiling of healthcare professionals.”

AIwire