
MIT Researchers Develop New Technique to Reduce Bias and Improve Accuracy in ML Models 

Machine learning (ML) has the potential to transform decision-making in healthcare by leveraging vast amounts of data for predictive insights. However, a key challenge emerges when these models are trained on datasets that inadequately represent all demographic groups.

For example, a model that predicts a treatment plan for a disease may be trained on a dataset made up mostly of male patients, leading it to make inaccurate predictions for female patients. Such biases can result in harmful recommendations, especially for underrepresented groups.

One solution is to adjust or balance the training dataset to ensure all subgroups are equally represented. However, this data balancing approach introduces added complexity and may also reduce the model’s overall performance. Additionally, this method may require access to training group annotations and can end up removing large portions of the dataset. 
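For context on what that balancing baseline typically looks like, here is a minimal sketch of group-balanced subsampling in Python; the function name balance_by_subsampling and its NumPy-based interface are illustrative assumptions, not code from the study. It also shows why balancing can discard so much data: every group is cut down to the size of the smallest one.

```python
import numpy as np

def balance_by_subsampling(X, y, group_ids, seed=0):
    """Subsample every group down to the size of the smallest group.

    Illustrative only: this is the conventional balancing baseline the
    article contrasts with D3M, not the researchers' method.
    """
    rng = np.random.default_rng(seed)
    groups = np.unique(group_ids)
    min_size = min(int((group_ids == g).sum()) for g in groups)
    keep = np.concatenate([
        rng.choice(np.flatnonzero(group_ids == g), size=min_size, replace=False)
        for g in groups
    ])
    return X[keep], y[keep], group_ids[keep]
```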

Researchers from MIT have taken a different approach. They have developed a new technique that identifies and removes specific points in the training dataset that most impact the model's poor performance on underrepresented groups. 

Instead of assuming every data point contributes equally to the model’s performance, this technique recognizes that certain points are disproportionately influencing the model's biased predictions.

The researchers' technique, Data Debiasing with Datamodels (D3M), starts with a metric called worst-group error, which measures how poorly a model performs on specific subpopulations. It then applies a framework they call datamodeling, which approximates a model's predictions as simple functions of the training data. This allows the researchers to quantify how much individual training points influence worst-group performance.
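As a rough illustration of the worst-group error metric described above, the sketch below computes the highest per-group error rate; the helper name and the assumption that subgroup labels are available are ours, not the paper's.

```python
import numpy as np

def worst_group_error(y_true, y_pred, group_ids):
    """Return the highest misclassification rate across subgroups.

    Assumes group_ids labels each example with its subgroup; D3M also
    handles settings without such labels, which this sketch does not cover.
    """
    per_group = [
        np.mean(y_pred[group_ids == g] != y_true[group_ids == g])
        for g in np.unique(group_ids)
    ]
    return max(per_group)
```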

Using this method, the researchers can identify the most problematic data points. Rather than discarding large portions of the dataset, the technique selectively removes only the most harmful examples. If data for an underrepresented group was never collected, the method cannot add new points, but it can remove the specific points that are driving biased predictions.
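The sketch below shows the kind of selective removal the article describes, assuming per-example influence scores on worst-group error are already available (for example, from a datamodel-style estimator); the names influence_scores, num_to_drop, and the simple ranking rule are illustrative assumptions rather than the published algorithm.

```python
import numpy as np

def drop_most_harmful(X_train, y_train, influence_scores, num_to_drop=500):
    """Remove only the training points estimated to hurt worst-group performance most.

    influence_scores[i] is assumed to estimate how much example i increases
    worst-group error; larger means more harmful. Everything else is kept,
    in contrast to balancing, which can discard far more data.
    """
    harmful = np.argsort(influence_scores)[-num_to_drop:]
    keep = np.setdiff1d(np.arange(len(y_train)), harmful)
    return X_train[keep], y_train[keep]
```

A model retrained on the filtered set would then be re-evaluated with a worst-group metric like the one sketched earlier.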

Even when subgroup labels are unavailable, D3M's approach can still uncover hidden biases by analyzing the data itself, making it a useful tool for improving fairness with limited or unlabeled data.

“Many other algorithms that try to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are showing that the assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them, and get better performance,” says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT and co-lead author of a paper published on arXiv.

Hamidieh co-authored the paper with Saachi Jain, Kristian Georgiev, Andrew Ilyas, and senior authors Marzyeh Ghassemi and Aleksander Madry, all from MIT. The research will be presented at the Conference on Neural Information Processing Systems.

The researchers' new technique builds on their previous work, where they developed a method called TRAK, which identifies the most influential training examples for a particular model output.

The MIT team claims that the D3M method improved worst-group accuracy while removing about 20,000 fewer training samples than traditional data balancing methods. 

“This is a tool anyone can use when they are training a machine-learning model," says Hamidieh. "They can look at those data points and see whether they are aligned with the capability they are trying to teach the model.” 

The researchers plan to validate this method and further develop it through future human studies. One of their goals is to make the method easy to use and accessible for healthcare professionals so it can be deployed in real-world environments. 

According to Ilyas, a co-author of the paper, “When you have tools that let you critically look at the data and figure out which data points are going to lead to bias or other undesirable behavior, it gives you a first step toward building models that are going to be more fair and more reliable.”

The findings of the research could help address a persistent issue with AI and ML models: they are only as effective as the data they are trained on. If data points that degrade the overall performance of the AI model can be identified and removed through a scalable algorithm, especially for large datasets, it could be a game-changer for improving model accuracy and reliability across various applications.

AIwire