AI cancer tools risk “shortcut learning” rather than detecting true biology
Deep learning pathology models caught cheating
University of Warwick
Image: Whole slide image illustrating the detection of key histological structures such as glands and cells. Credit: Dr Fayyaz Minhas / University of Warwick
New research warns that popular deep learning systems trained for cancer pathology may be relying on hidden shortcuts rather than genuine biological signals.
Artificial intelligence tools are increasingly being developed to predict cancer biology directly from microscope images, promising faster diagnoses and cheaper testing. But new research from the University of Warwick, published in Nature Biomedical Engineering, suggests that many of these systems may be using visual shortcuts rather than true biology — raising concerns that some AI pathology tools are currently too unreliable for real-world patient care.
“It’s a bit like judging a restaurant’s quality by the queue of people waiting to get in: it’s a useful shortcut, but it’s not a direct measure of what’s happening in the kitchen,” says Dr Fayyaz Minhas, Associate Professor and principal investigator of the Predictive Systems in Biomedicine (PRISM) Lab in the Department of Computer Science, University of Warwick, and lead author of the study. “Many AI pathology models are doing the same thing, relying on correlations between biomarkers or on obvious tissue features, rather than isolating biomarker-specific signals. And when conditions change, these shortcuts often fall apart.”
To reach this conclusion, the researchers analysed more than 8,000 patient samples across four major cancer types — breast, colorectal, lung and endometrial — and compared the performance of leading machine learning approaches. While the models often achieved high headline accuracy, the team found this frequently came from statistical “shortcuts.”
For example, instead of detecting mutations in the cancer-associated BRAF gene, a model might learn that BRAF mutations often occur alongside another clinical feature such as microsatellite instability (MSI). The system then learns to use this correlated cue to predict BRAF status rather than learning the causal BRAF signal itself, meaning its predictions are accurate only when the two biomarkers co-occur and become unreliable when they do not.
Kim Branson, SVP Global Head of Artificial Intelligence and Machine Learning, GSK and co-author says: “We’ve found that predicting a BRAF mutation by looking at correlated features like MSI is often like predicting rain by looking at umbrellas — it works, but it doesn’t mean you understand meteorology. Crucially, if a model cannot demonstrate information gain above a simple pathologist-assigned grade, we haven’t advanced the field; we’ve just automated a shortcut. The roadmap for the next generation of pathology AI isn’t necessarily bigger models; it’s stricter evaluation protocols that force algorithms to stop cheating and learn the hard biology.”
When the performance of AI models was assessed within stratified patient subgroups, such as only high-grade breast cancers or only MSI-positive tumours, accuracy fell substantially, revealing that the models were dependent on shortcut signals that disappear once confounding factors are controlled.
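The effect of this kind of stratified evaluation can be illustrated with a toy simulation. This is a hypothetical sketch, not the paper's data or models: it assumes a made-up 80% co-occurrence between BRAF mutation and MSI status, and a "shortcut" model whose prediction is driven entirely by the MSI-linked appearance of the tissue.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounding: BRAF status co-occurs with MSI 80% of the time.
msi = rng.integers(0, 2, n)                          # confounder (0/1)
braf = np.where(rng.random(n) < 0.8, msi, 1 - msi)   # correlated target

# A "shortcut" model: its output is effectively just the MSI signal.
pred = msi

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; 0.5 means no discriminative power."""
    return float(np.mean([(y_pred[y_true == c] == c).mean() for c in (0, 1)]))

# Pooled over all patients the shortcut looks useful (~0.80)...
overall = balanced_accuracy(braf, pred)

# ...but within the MSI-positive subgroup, where the confounder is
# constant, the model predicts the same value for everyone and its
# balanced accuracy collapses to chance (0.5).
within_msi_pos = balanced_accuracy(braf[msi == 1], pred[msi == 1])
```

The point of the sketch is that a single headline accuracy cannot distinguish this shortcut model from one that has learned a genuine BRAF signal; only the subgroup evaluation exposes the difference.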
For certain prediction tasks, the performance advantage of deep learning over human-derived clinical information was modest. AI systems achieved accuracy scores of just over 80% when predicting biomarkers, compared with around 75% using tumour grade alone — a measure already assessed by pathologists.
Professor Nasir Rajpoot, Director of the Tissue Image Analytics (TIA) Centre at the University of Warwick and CEO of Warwick spin-out Histofy, said: “This study highlights a critical point about the rollout of AI in medicine: to deliver real and lasting impact, the value of AI-based clinically important predictions must be judged through rigorous, bias-aware evaluation, rather than relying solely on headline accuracies that fail to account for confounding effects.”
Machine learning methods can still prove valuable for research, drug development candidate screening, and for clinical triaging, screening, or supplementary decision support. However, the researchers argue that future AI tools must move beyond correlation-based learning and adopt approaches that explicitly model biological relationships and causal structure. They also call for stronger evaluation standards, including subgroup testing and comparison against simple clinical baselines, before deployment in routine care is considered.
Dr Minhas concludes: “This research is not a condemnation of AI in pathology. It is a wake-up call. Current models may perform well in controlled settings but rely on statistical shortcuts rather than genuine biological understanding. Until more robust evaluation standards are in place, these tools should not be seen as replacements for molecular testing, and it is essential that clinicians and researchers understand their limitations and use them with appropriate caution.”
Coauthor, Prof. Sabine Tejpar, Head of Digestive Oncology at KU Leuven says: “Clinical relevance of novel tools requires grounded tailoring to what is precise, correct and feasible for the individual patient. Too often, oncology is swept up by ‘innovation’ with limited or no impact on patient care, driven more by what can be provided or sold than by rigorous assessment of what is truly relevant for individual patients and their specific features.
“While progress often requires imperfect first steps, we should learn from the past and avoid oversimplification or overreach through inappropriate concepts. Complexity and variability are central challenges — but they are also exactly what these novel technologies must learn to embrace.”
ENDS
Notes to Editors
For more information please contact:
Matt Higgs, PhD | Media & Communications Officer (Warwick Press Office)
Email: Matt.Higgs@warwick.ac.uk | Phone: +44(0)7880 175403
About the study
The paper, 'Confounding factors and biases abound when predicting molecular biomarkers from histological images' is published in Nature Biomedical Engineering. DOI: 10.1038/s41551-026-01616-8
The large-scale analysis was led by first author Dr Muhammad Dawood during his PhD at the University of Warwick; he is now a postdoctoral fellow at the University of Oxford.
Why this matters
- Biomarkers guide treatment decisions. If AI tools confuse correlated signals, patients could receive inappropriate therapies.
- High accuracy scores can be misleading. This study shows why deeper validation is essential before clinical deployment.
- While AI promises faster and cheaper diagnostics, premature adoption could undermine confidence and lead to costly errors.
- The findings point to a shift toward causal, biology-aware AI models that better reflect how disease actually works.
About the University of Warwick
Founded in 1965, the University of Warwick is a world-leading institution known for its commitment to era-defining innovation across research and education. A connected ecosystem of staff, students and alumni, the University fosters transformative learning, interdisciplinary collaboration, and bold industry partnerships across state-of-the-art facilities in the UK and global satellite hubs. Here, spirited thinkers push boundaries, experiment, and challenge conventions to create a better world.
Journal
Nature Biomedical Engineering
Method of Research
Imaging analysis
Subject of Research
Human tissue samples
Article Title
Confounding factors and biases abound when predicting molecular biomarkers from histological images
Article Publication Date
2-Mar-2026
COI Statement
MD conducted this study during his PhD at the University of Warwick, UK. MD received PhD studentship support from GSK Inc. KB is an employee of GSK Inc. NR is the founding Director, CEO and CSO of Histofy Ltd. FM holds shares in Histofy Ltd with no operational involvement. The authors declare no other competing interests.