Currently, many healthcare artificial intelligence (AI) models work well in certain settings but may experience performance drops once deployed elsewhere. Finding out when and how medical AI and machine learning (ML) models fail, or why they yield clinical decision-making tools that prove ineffective in practice, is the mission that researchers at the Carle Illinois College of Medicine (CI MED) of the University of Illinois (USA) have decided to undertake.
“Every domain in healthcare uses machine learning in one way or another, which is why these models are becoming the mainstay of computational diagnostics and prognostics in healthcare,” recalled Yogatheesan Varatharajah, Research Assistant Professor in the Department of Bioengineering at the University of Illinois at Urbana-Champaign. “The problem is that when we do studies based on machine learning to develop a diagnostic tool, the model works fine in a limited test environment and, at that point, is considered ready to go. But when we implement it in routine practice to make clinical decisions in real time, many of these approaches don’t work as expected,” he added.
A disconnect from the real world
One of the most common reasons for this gap between models and the real world is the natural variability between the data collected to build a model and the data collected after the model is deployed. That variability could come from the hardware or protocol used to collect the data, or simply from differences between the patients represented in the training data and those seen after deployment. “These small differences can add up to significant changes in model predictions, and can potentially result in a model that doesn’t help patients,” explained Prof. Varatharajah.
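To make the idea of deployment-time variability concrete, here is a minimal sketch (not code from the CI MED study) of how such a shift might be detected: it compares the distribution of a single model input feature at the development site against the deployment site using a two-sample Kolmogorov-Smirnov test. The feature, the simulated shift, and the significance threshold are all hypothetical.

```python
# Hypothetical illustration of dataset shift: compare the distribution of one
# model input feature at training time vs. after deployment. Feature names,
# the simulated shift, and the 0.01 threshold are assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Simulated EEG-derived feature (e.g., alpha-band power) from two sites:
# a small change in acquisition hardware nudges the distribution.
train_feature = rng.normal(loc=10.0, scale=2.0, size=5000)   # development site
deploy_feature = rng.normal(loc=10.8, scale=2.3, size=5000)  # deployment site

# Two-sample Kolmogorov-Smirnov test: do the samples plausibly come
# from the same distribution?
stat, p_value = ks_2samp(train_feature, deploy_feature)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.2e}")

if p_value < 0.01:
    print("Distribution shift detected: model predictions may degrade.")
```

Even a modest shift like the one simulated here is easily flagged with enough samples, which is exactly why small per-site differences can quietly undermine a deployed model.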
Varatharajah and his students focused their efforts on machine learning models based on electrophysiological data from patients with neurological diseases. From there, the team looked at clinically relevant applications, such as comparing normal and abnormal EEGs to determine whether a model could reliably differentiate between them.
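As a rough illustration of that normal-vs-abnormal EEG task, the following sketch trains a simple classifier on synthetic band-power features. It is purely illustrative: the features, labels, and model choice are assumptions, not the team's actual pipeline.

```python
# Illustrative normal-vs-abnormal EEG classification on synthetic
# band-power features. Not the study's method; all numbers are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000

# Pretend per-recording power in delta/theta/alpha/beta bands.
X_normal = rng.normal(loc=[4, 3, 6, 2], scale=1.0, size=(n, 4))
X_abnormal = rng.normal(loc=[6, 4, 4, 2], scale=1.2, size=(n, 4))
X = np.vstack([X_normal, X_abnormal])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = normal, 1 = abnormal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("held-out AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

A model like this can look excellent on a held-out split from the same site and still stumble when the band-power distributions shift at deployment, which is the failure mode the researchers set out to characterize.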
“We look at what kinds of variability can occur in the real world, especially those that could cause problems for machine learning models,” the expert indicated. “Next, we model those variabilities and develop some ‘diagnostic’ measures to diagnose the models themselves, to know when and how they are going to fail. As a result, we can be aware of these errors and take steps to mitigate them ahead of time, so that the models can help clinicians in clinical decision-making.”
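The team's actual diagnostic measures are not detailed in this article, but one generic stand-in for the idea is to flag incoming samples whose features fall far outside the training distribution before trusting the model's prediction. The sketch below does this with a Mahalanobis-distance check; the threshold and data are hypothetical.

```python
# Generic stand-in for a model "diagnostic": flag inputs that sit far
# outside the training feature distribution. The 3.5 threshold and the
# synthetic data are assumptions, not the study's measures.
import numpy as np

def fit_diagnostic(X_train):
    """Record the training-set mean and inverse covariance of the features."""
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    return mean, np.linalg.inv(cov)

def mahalanobis(x, mean, cov_inv):
    """Distance of one sample from the training distribution's center."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 4))
mean, cov_inv = fit_diagnostic(X_train)

# One in-distribution sample, one shifted sample.
for x in (rng.normal(size=4), rng.normal(loc=3.0, size=4)):
    d = mahalanobis(x, mean, cov_inv)
    flag = "FLAG: outside training range" if d > 3.5 else "ok"
    print(f"distance = {d:.2f} -> {flag}")
```

The point of such a check is the one Varatharajah describes: knowing ahead of time when a model is likely to fail, so clinicians can treat its output with appropriate caution.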
The importance of this work lies in “identifying the disconnect between the data that AI models are trained on and the real-world scenarios they encounter when they are implemented in hospitals,” indicated another co-author of the work, Sam Rawal. “Being able to identify such real-world scenarios, where models may fail or perform unexpectedly, can help guide their implementation and ensure they are used safely and effectively,” he concluded.