Can a specialized AI model steer doctors toward the right scan?

Fine-tuned GPT-enhanced system outperforms general-purpose models on radiology guideline alignment, with important caveats for clinical use

Intelligent Medicine

Image: Similarity score distribution. Distribution of response similarity scores for (a) AMIR-GPT, (b) GPT-4, (c) GPT-3.5 and (d) Gemini, displayed from left to right.
Credit: Intelligent Medicine

Medical imaging is important in healthcare; however, its overutilization can contribute to resource wastage and can cause harm to patients. While various guidelines are available for its appropriate utilization, their adoption remains a challenge. Now, a new study in Intelligent Medicine finds that domain-specific adaptation may help improve AI-assisted imaging recommendations, pointing to a new direction for value-based clinical decision support.

Every year, up to 30% of medical imaging studies ordered in the United States are considered unnecessary. This wastes resources, strains healthcare systems, and exposes patients to avoidable radiation risks. Despite the existence of evidence-based appropriateness guidelines, translating them consistently into day-to-day clinical decisions remains difficult. A new study published in the journal Intelligent Medicine this February suggests that large language models adapted to specific clinical domains may offer a meaningful path forward.

The research team, based at Beijing Friendship Hospital and collaborating institutions, developed a model called the Appropriate Medical Imaging Recommendations Generative Pre-trained Transformer (AMIR-GPT). Rather than relying on a general-purpose AI system, they asked whether targeted fine-tuning on structured radiology guidance could produce more accurate, guideline-aligned imaging recommendations for common clinical scenarios.

“Overutilization of medical imaging is not just a cost problem. It reflects a gap between the best available evidence and what happens in practice. Our goal was to explore whether a domain-specific AI model could help bridge that gap in a way that supports clinicians, not replaces them,” says Han Lyu, M.D., corresponding author of the study and associate professor at the Department of Radiology, Beijing Friendship Hospital, Capital Medical University.

Building and testing the model

To train AMIR-GPT, the researchers curated 1,036 question-and-answer pairs derived from 26 guidelines in the American College of Radiology Appropriateness Criteria (ACR AC), covering a broad range of common clinical indications, including low back pain, trauma, fractures, abdominal pain, cancer screening and staging, gastrointestinal bleeding, hearing-related complaints, and pediatric fever. Of the 1,036 entries, 932 were used for model training across four iterations, and the remaining 104 were reserved for testing.
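The held-out split described above (932 training pairs and 104 test pairs out of 1,036) can be illustrated with a minimal Python sketch. The `split_pairs` helper, the random selection, the seed and the placeholder question/answer strings are all hypothetical; the study does not say how its test items were chosen.

```python
import random

def split_pairs(pairs, n_test, seed=0):
    """Shuffle a copy of the Q&A pairs and hold out n_test of them for testing.

    Hypothetical helper: random selection and the seed value are assumptions,
    not the study's documented procedure.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

# Placeholder pairs standing in for the 1,036 curated ACR AC entries.
pairs = [(f"question {i}", f"answer {i}") for i in range(1036)]
train, test = split_pairs(pairs, n_test=104)
print(len(train), len(test))  # 932 104
```

Keeping the test pairs out of training is what lets a benchmark measure how well a model generalizes to unseen questions rather than how well it recalls training answers.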

AMIR-GPT was benchmarked against GPT-4, GPT-3.5, and Gemini on the same test questions. Each response was scored on a 1-to-5 scale for similarity to the standard answer, both by an automated GPT-3.5 assessment and by two expert radiologists.

What the results show

In the most stringent performance category, perfect agreement with the standard guideline answer (a score of 5 out of 5), AMIR-GPT achieved the highest proportion of any model evaluated: 33.3% of test responses, compared with 16.7% for GPT-4, 6.2% for GPT-3.5, and 6.2% for Gemini. The overall difference among models was statistically significant (ANOVA: F = 6.49, P = 0.0004). Pairwise testing confirmed a significant advantage for AMIR-GPT over GPT-3.5 (P = 0.018).

However, the picture was more nuanced in the other performance bands. When high match (score 4 out of 5), medium match (score 3 out of 5), and low match (score below 3) are considered, the general-purpose models remained competitive with AMIR-GPT. This matters for interpreting the study's claims: in medical AI evaluation, how models rank depends on whether the benchmark rewards exact guideline adherence or partial alignment. In clinical practice, that distinction is not merely academic. A fluent answer is not the same as a clinically appropriate one.
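The four performance bands (perfect = 5, high = 4, medium = 3, low = below 3) can be tallied with a small Python sketch. The band names follow the text; the functions, the assumption of integer scores, and the example scores are illustrative only and are not data from the study.

```python
from collections import Counter

BANDS = ("perfect", "high", "medium", "low")

def score_band(score):
    """Map an integer 1-5 similarity score to the bands described in the study."""
    if score == 5:
        return "perfect"
    if score == 4:
        return "high"
    if score == 3:
        return "medium"
    return "low"  # any score below 3

def band_proportions(scores):
    """Return the percentage of responses that fall in each band."""
    counts = Counter(score_band(s) for s in scores)
    total = len(scores)
    return {band: 100 * counts[band] / total for band in BANDS}

# Made-up scores for one model, purely to show the calculation.
example = [5, 5, 4, 3, 2, 4, 5, 1]
print(band_proportions(example))
# {'perfect': 37.5, 'high': 25.0, 'medium': 12.5, 'low': 25.0}
```

Ranking models by the "perfect" share alone and ranking them across all four bands can give different orderings, which is the distinction the results paragraph draws.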

Qualitative review reinforced this point. In one higher-scoring example, AMIR-GPT correctly identified magnetic resonance imaging (MRI) without intravenous contrast as the appropriate first-line imaging study for a surgical candidate with subacute low back pain after six weeks of conservative management. This is consistent with ACR guidance and clinically meaningful. However, lower-scoring outputs revealed familiar risks in medical AI: omissions and deviations from standard recommendations, and in one case, an incorrect characterization of computed tomography (CT) enterography that failed to account for the potential masking of upper gastrointestinal bleeding by oral contrast agents.

Promising direction, preliminary evidence

The study positions domain-specific fine-tuning as a potentially useful strategy for improving AI performance on specialized clinical tasks. But the authors are careful not to overstate the implications.

The dataset covered only a subset of published ACR criteria, limiting the model's exposure to rarer or more complex clinical scenarios. Outputs that are inaccurate, fabricated, or off-target remain a barrier to unsupervised clinical deployment.

“This is a step toward AI as a collaborative tool in medicine, but responsible integration requires broader datasets, stronger evaluation methods, and validation across diverse real-world settings before these systems can be trusted more widely,” says Dr. Lyu.

Future work will focus on expanding training data to cover a broader range of ACR guidelines and more complex cases, incorporating real-time error correction mechanisms, and exploring applicability in electronic health record analysis and broader clinical decision support.

Importance

The findings contribute to a growing body of evidence suggesting that high performance in healthcare AI may require more than scaling up general-purpose models. Domain-specific adaptation, meaning disciplined alignment with the standards, evidence structures, and reasoning patterns of a particular medical field, may matter as much as model size.

About the authors

Dr. Han Lyu (吕晗) is an Associate Chief Physician and Associate Professor of Radiology at Beijing Friendship Hospital, Capital Medical University. He specializes in advanced neuroimaging, brain structural-functional networks, tinnitus mechanisms, cerebral perfusion, and AI-enhanced medical diagnostics, with notable contributions to brain aging and neurodegeneration research. He is a former visiting scholar at Stanford University. Email: chrislvhan@126.com

Prof. Wang Zhenchang (王振常) is a distinguished medical imaging expert and Academician of the Chinese Academy of Engineering. He is affiliated with the Department of Radiology, Beijing Friendship Hospital, Capital Medical University, and leads pioneering work in ultra-high-resolution CT (world’s first 50 μm bone-specific scanner) for auditory and visual systems, as well as AI integration in medical imaging and diagnostics. Email: cjr.wzhch@vip.163.com

About the journal
Intelligent Medicine is a peer-reviewed, open-access journal focusing on the integration of AI, data science, and digital technology in clinical medicine and public health. It is published by the Chinese Medical Association in partnership with Elsevier. To learn more about Intelligent Medicine, please visit https://www.sciencedirect.com/journal/intelligent-medicine

Funding information
This study was partially supported by the National Natural Science Foundation of China (62171297, 61931013). The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Journal

Intelligent Medicine

DOI

10.1016/j.imed.2025.03.005

Method of Research

Computational simulation/modeling

Subject of Research

Not applicable

Article Title

Specific fine-tuned GPT-enhanced medical imaging diagnosis recommendations

HKUMed develops innovative AI tool: A single blood test can predict heart diseases up to 15 years before onset

The University of Hong Kong

Image: HKUMed develops a cardiovascular risk prediction tool that can accurately predict the future risk of six major cardiovascular diseases with a single blood test. The system can provide early warning signals up to 15 years before clinical onset. The research is led by Professor Zhang Qingpeng (left).
Credit: HKU

A research team from the Department of Pharmacology and Pharmacy at the LKS Faculty of Medicine of the University of Hong Kong (HKUMed) has developed an innovative AI-based cardiovascular risk prediction tool, called CardiOmicScore. With a single blood test, the system can accurately forecast the future risk of six major cardiovascular diseases (CVDs): coronary artery disease, stroke, heart failure, atrial fibrillation, peripheral artery disease and venous thromboembolism. It can also provide early warning signals up to 15 years before clinical onset. The findings were published in Nature Communications.

AI-based multiomics integration reflects the body's real-time health status
CVDs remain the leading cause of death worldwide, accounting for approximately 19.8 million deaths in 2022 alone. In routine health assessments, physicians typically evaluate cardiovascular risk based on age, blood pressure, smoking and other conventional clinical indicators. However, these measures often fail to capture subtle, early biological changes before the disease becomes clinically apparent, so many patients miss the optimal window for preventive intervention. Although polygenic risk scores have become popular in recent years, genetic predisposition is largely fixed at birth and does not change over time. Consequently, polygenic risk scores cannot reflect the immediate health effects of lifestyle or environmental changes. This creates an urgent need for tools that can capture a person’s current biological state and provide accurate, early warnings for CVDs.

To address this problem, the HKUMed research team applied deep learning techniques to integrate multiomics data, including genomics, metabolomics and proteomics, to develop the CardiOmicScore tool. The study was based on large-scale population data from the UK Biobank, analysing 2,920 circulating proteins and 168 metabolites measured from blood samples. These molecular signals act as ‘real-time recorders’ of the body, sensitively reflecting subtle changes in the immune system, metabolism, and vascular health.
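One simple way to picture "integrating" several omics layers is early fusion: standardize each block of measurements and concatenate the blocks into one feature vector per participant, which a downstream model can then consume. This is a hypothetical preprocessing sketch, not CardiOmicScore's published deep learning architecture; `zscore_columns`, `fuse_blocks` and the toy data are illustrative. With the panels described above, each fused vector would hold 2,920 + 168 = 3,088 protein and metabolite features.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each feature (column) to mean 0 and SD 1 across participants.

    Constant columns (SD 0) are set to 0.0 to avoid dividing by zero.
    """
    scaled = []
    for col in zip(*rows):
        mu, sd = mean(col), pstdev(col)
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return [list(r) for r in zip(*scaled)]  # back to one row per participant

def fuse_blocks(*blocks):
    """Early fusion: standardize each omics block, then concatenate per participant."""
    standardized = [zscore_columns(b) for b in blocks]
    return [sum(parts, []) for parts in zip(*standardized)]

proteins = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # toy data: 3 participants x 2 proteins
metabolites = [[5.0], [5.0], [8.0]]                 # toy data: 3 participants x 1 metabolite
fused = fuse_blocks(proteins, metabolites)
print(len(fused[0]))  # 3 features per participant
```

Standardizing within each block keeps features measured on very different scales, such as protein and metabolite concentrations, from dominating one another purely because of their units.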

Professor Zhang Qingpeng, Associate Professor in the Department of Pharmacology and Pharmacy at HKUMed, explained, ‘Genes determine where we start—they define our baseline health risk. However, proteins and metabolites reflect our current physical health. Our AI tool is designed to decode these complex molecular signals, enabling doctors and patients to identify risks much earlier, which can potentially change the trajectory of disease through timely lifestyle modifications and early prevention.’

Accurate prediction of six major cardiovascular diseases with 15‑year advance warning in high-risk groups
The results showed that CardiOmicScore transforms complex multiomics measurements into personalised risk scores with substantially better predictive performance than conventional polygenic risk scores. When combined with clinical information such as age and gender, the model significantly improved risk prediction accuracy for the six common CVDs and could even flag elevated risk up to 15 years before symptoms appeared.

This study marks a shift in precision medicine from a static, gene-centric paradigm towards a more dynamic, multiomics-based approach. In the future, a small-volume blood sample may be sufficient to generate a comprehensive cardiovascular risk profile for multiple diseases. Professor Zhang added, ‘We aim to leverage technology to identify and prevent diseases before they develop. By shifting health management from reactive treatment to proactive prediction and intervention, we aim to create a lasting impact for both public health and individual patient care.’

About the research team
The study was led by Professor Zhang Qingpeng, Associate Professor in the Department of Pharmacology and Pharmacy, HKUMed, and the HKU Musketeers Foundation Institute of Data Science (IDS). The first author is Luo Yan from the HKU IDS.

Media enquiries
Please contact LKS Faculty of Medicine of The University of Hong Kong by email (medmedia@hku.hk).

Journal

Nature Communications

DOI

10.1038/s41467-026-68956-6

Article Title

AI-based multiomics profiling reveals complementary omics contributions to personalized prediction of cardiovascular disease.

Using AI to improve standard-of-care cardiac imaging

UCSF-led research with deep neural networks enhances echocardiogram views of major cardiac conditions.

University of California San Francisco Medical Center

Heart disease is the leading cause of adult death worldwide, making cardiovascular disease diagnosis and management a global health priority. An echocardiogram, or cardiac ultrasound, is one of the most commonly used imaging tools employed by physicians to diagnose a variety of heart diseases and conditions. 

Most standard echocardiograms provide two-dimensional (2D) images of the three-dimensional (3D) cardiac anatomy. An echocardiogram often captures hundreds of 2D slices, or views, of the beating heart, which physicians use to assess the heart's function and structure.

To improve the diagnostic accuracy for cardiac conditions, researchers from UC San Francisco set out to determine whether deep neural networks (DNNs), a type of AI algorithm, could be redesigned to better capture complex 3D anatomy and physiology from multiple imaging views simultaneously. They developed a new “multiview” DNN architecture that draws information from multiple imaging views at once, rather than from a single view as current approaches do. They then trained demonstration DNNs with this architecture to detect three cardiovascular conditions: left and right ventricular abnormalities, diastolic dysfunction, and valvular regurgitation.

In a study published March 17 in Nature Cardiovascular Research, the researchers compared DNNs that analyzed either a single view or multiple views of echocardiograms from UCSF and the Montreal Heart Institute. DNNs trained on multiple views achieved higher diagnostic accuracy than DNNs trained on any single view, demonstrating that AI models combining information from multiple imaging views simultaneously better captured the disease state of these heart conditions.

“Until now, AI has primarily been used to analyze one 2D view at a time—from either images or videos—which limits an AI algorithm’s ability to learn disease-relevant information between views,” said senior study author Geoffrey Tison, MD, MPH, a cardiologist and co-director of the UCSF Center for Biosignal Research. “DNN architectures that can integrate information across multiple high-resolution views represent a significant step toward maximizing AI performance in medical imaging. In the case of echocardiography, most diagnoses necessitate considering information from more than one view because the information from any single view tells only part of the story.”

For example, when assessing left ventricle (LV) size or function, the echocardiogram view showing all four chambers of the heart at once (A4c) best captures certain left ventricular walls (the inferoseptal and anterolateral walls), whereas a perpendicular view (A2c) captures other important walls (the anterior and inferior walls). LV walls can appear to function normally in one view yet show significant dysfunction in another. For the tasks the researchers examined, such as identifying left and right ventricular abnormalities and diastolic dysfunction, the results suggest that the multiview DNNs likely learn interrelated information between features from each view to achieve higher overall performance.

“Our multi-view neural network architecture is explicitly designed to enable the model to learn complex relationships between information in multiple imaging views,” said study first author Joshua Barrios, PhD, an assistant professor in the UCSF Division of Cardiology. “We find that this approach improves performance for diagnostic tasks in echocardiography, but this new AI architecture can also be applied to other medical imaging modalities where multiple views contain complementary information.”

The researchers also found that averaging the predictions of three single-view DNNs improved performance beyond that of a single-view DNN while being less computationally expensive, making it a viable alternative to training a multiview DNN. The multiview DNN, however, still provided the strongest performance. The researchers suggest that future research examine how multiview DNN architectures may assist other medical tasks or imaging modalities.
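The averaging alternative is a standard prediction ensemble: each single-view model outputs a probability for the condition, and the ensemble reports their mean. The Python sketch below assumes each model's output is already a probability between 0 and 1; the view keys reuse A4c and A2c from the text plus a placeholder "view_3", and the probabilities are invented for illustration.

```python
from statistics import fmean

def average_single_view(probabilities):
    """Ensemble by averaging per-view probabilities for one patient study.

    probabilities: mapping of view name -> predicted probability of disease
    from a separately trained single-view model (assumed inputs).
    """
    if not probabilities:
        raise ValueError("need at least one view")
    return fmean(probabilities.values())

# Invented outputs: wall dysfunction is evident in A2c but not in A4c.
per_view = {"A4c": 0.20, "A2c": 0.80, "view_3": 0.50}
print(round(average_single_view(per_view), 2))  # 0.5
```

Averaging lets the views outvote one another, but each model still sees only one view and so cannot learn relationships between views; that gap is what the multiview architecture is designed to close, and the study reports it performed best.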

Additional Authors: Minhaj U. Ansari, MS, Jeffrey E. Olgin, MD, Sean Abreau, MS, Jacques Delfrate, MS, Elodie L. Langlais, Robert Avram, MD, MS.

Funding: Support for this work was received from the National Institutes of Health: K23HL135274 (G.H.T.), R56HL161475 (G.H.T.), and DP2HL174046 (G.H.T.).

Disclosures: Please see the study.

About UCSF Health: UCSF Health is recognized worldwide for its innovative patient care, reflecting the latest medical knowledge, advanced technologies and pioneering research. It includes the flagship UCSF Medical Center, which is a top-ranked specialty hospital, as well as UCSF Benioff Children’s Hospitals, with campuses in San Francisco and Oakland; two community hospitals, UCSF Health St. Mary's and UCSF Health Saint Francis; Langley Porter Psychiatric Hospital; UCSF Benioff Children’s Physicians; and the UCSF Faculty Practice. These hospitals serve as the academic medical center of the University of California, San Francisco, which is world-renowned for its graduate-level health sciences education and biomedical research. UCSF Health has affiliations with hospitals and health organizations throughout the Bay Area. Visit http://www.ucsfhealth.org/. Follow UCSF Health on Facebook, Threads or LinkedIn.

Journal

Nature Cardiovascular Research

DOI

10.1038/s44161-026-00786-7

Method of Research

Computational simulation/modeling

Subject of Research

People

Article Title

Multiview deep learning improves detection of major cardiac conditions from echocardiography

Article Publication Date

17-Mar-2026

MSU study demonstrates faster discovery of therapeutic drugs through AI

Michigan State University College of Human Medicine

Inside the diseased cell, the genes are in chaos. Some are receiving signals to overproduce a protein. Others are reducing activity to abnormal levels. Up is down and down is up.

The right molecule could restore order, reversing dysregulation in specific genes. But finding the ideal compound could require examining millions of chemicals for their influence on hundreds or thousands of genes.

An MSU-led team of researchers has demonstrated a better way. Using machine learning trained on enormous amounts of published data, they were able to predict how chemicals will influence gene expression, based solely on the structure of the chemical.

Their study, recently published in the journal Cell, discovered compounds that are promising for treatment of two difficult diseases: the most aggressive form of liver cancer and a chronic lung disease with no curative options.

With implications for faster drug discovery, the findings result from years of work across multiple disciplines and institutes, said one of the senior authors, Bin Chen, associate professor at the College of Human Medicine in the departments of Pediatrics and Human Development and Pharmacology and Toxicology.

“So many people worked on this concept. We have over 20 researchers involved, and it's been a long journey,” said Chen, PhD, whose research focuses on developing computational methods and tools for drug discovery in collaboration with computer scientists, bench scientists and clinicians.

That interdisciplinary approach was key to this project. It began by training a “Gene expression profile Predictor on chemical Structures,” or GPS, on millions of experimental measurements. Chen collaborated on this phase with another senior author, Jiayu Zhou, PhD, formerly at MSU and now at the University of Michigan.

Chen compared the process to training a neural network to classify an image as a person, a cat or a dog.

“In our approach, instead of looking at cats or dogs, we want to know whether the compound is either going to regulate up or down the expression of a specific gene,” Chen said. “It’s still a classification problem, but more biologically driven.”

“But biological data are rarely clean,” said Zhou. “Imagine trying to learn from a huge pile of examples where some are clear, some are fuzzy, and some may even be misleading. Our approach helps the model separate stronger signals from weaker ones, so it can learn from the data without being thrown off by all the noise.”
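Chen's framing (for each gene, does a compound push expression up or down?) and Zhou's point about separating stronger from weaker signals can be pictured with a hypothetical labeling step. The Python sketch below turns measured expression changes, assumed here to be z-scores, into up/down labels and gives weak, near-threshold measurements a smaller training weight. The thresholds, the weighting rule, the gene names and the function name are assumptions for illustration, not the GPS method described in the Cell paper.

```python
def label_with_confidence(z_scores, call_threshold=1.0, strong_threshold=2.0):
    """Turn per-gene expression changes into (label, weight) training targets.

    label: 1 for up-regulated, 0 for down-regulated.
    weight: 1.0 for strong signals, 0.5 for weak ones.
    Genes whose change is smaller than call_threshold in magnitude are too
    ambiguous to call and are marked None so training can skip them.
    """
    targets = {}
    for gene, z in z_scores.items():
        if abs(z) < call_threshold:
            targets[gene] = None  # too close to zero to call up or down
            continue
        label = 1 if z > 0 else 0
        weight = 1.0 if abs(z) >= strong_threshold else 0.5
        targets[gene] = (label, weight)
    return targets

example = {"GENE_A": 3.1, "GENE_B": -1.4, "GENE_C": 0.3}  # hypothetical genes
print(label_with_confidence(example))
# {'GENE_A': (1, 1.0), 'GENE_B': (0, 0.5), 'GENE_C': None}
```

A classifier trained on such targets would scale each gene's loss by its weight, so fuzzy examples still contribute but pull less on the model than clear ones.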

After evaluating the data for theoretical application to multiple diseases, the team chose two for real-world testing. Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide. Idiopathic pulmonary fibrosis (IPF) is a chronic lung disease with a median survival of three years after diagnosis.

Both diseases need new therapeutics, and both hold intense interest for the researchers.

Chen previously found that a deworming pill might be used to treat HCC. He and contributing authors Samuel So, MD, the Lui Hac Minh Professor, and Mei-Sze Chua, PhD, senior research scientist, both at the Asian Liver Center at Stanford University, have long collaborated toward the goal of developing a compound that benefits HCC patients.

“Our previous efforts were limited to repurposing FDA-approved drugs,” Chua said. “This new approach greatly expands the pool of novel compounds with potential therapeutic activity in HCC.”

“With the incidence of HCC continuing to increase in the USA, novel and more efficacious compounds that can target the molecular heterogeneity of HCC directly addresses an unmet clinical need,” said So.

Another senior author from MSU, Xiaopeng Li, PhD, associate professor in the Department of Pediatrics and Human Development in the College of Human Medicine, has focused his research on lung diseases such as IPF.

“We know this disease is hard to tackle,” Li said. “There have been so many failures to identify new drugs in the last 20 years. And I think the AI component helped us to probe the problem differently and more systemically.”

Discovering compounds in theory is one thing. They still must be validated in the real world, said Edmund Ellsworth, PhD, director of the MSU Medicinal Chemistry Facility and a professor in the Department of Pharmacology and Toxicology.

As a contributor to the study, Ellsworth and his team were responsible for creating related compounds discovered by the platform and optimizing them into safe and effective drugs. This critical step represents just the beginning of a complex process, he said.

“To move forward, it must be recognized that drug discovery is a team sport, and not for the faint of heart,” Ellsworth said. “It’s complicated, all sorts of things happen, and you need the diversity of experts to overcome and be successful.”

The compounds were tested on cell lines in the lab to confirm their influence on genes and to identify leading candidates for testing in living organisms.

When anti-HCC compounds were tested on mice, the team found two new compounds that reduced tumor size. For IPF, the team identified one repurposed drug and two new compounds that showed promise.

Testing compounds for IPF also started with mice, but expanded to samples of human lung tissue, thanks to a clinical-research collaboration with Corewell Health’s lung transplant program, located in Grand Rapids.

The program is the busiest in Michigan. And because pulmonary fibrosis is the leading indication for lung transplantation, the program had ample explants to share with researchers for testing as live cultures, said pulmonologist Reda Girgis, MD, medical director of the transplant program and a study contributor.

Girgis, who also is a professor in the College of Human Medicine, said the study illustrates the advancement possible through collaboration between Corewell and MSU.

“I think this is the best way to advance medical knowledge, for clinicians to work side by side with biologists, and now, computational people,” Girgis said. “That is really key to advance research.”

The team has shared its code and developed a web portal for researchers to use GPS for virtual compound screening.

“It's like a paradigm shift approach for people to drive discovery,” Chen said. “And I want more people to test this approach. But most importantly, I want people really to be able to use it to discover new therapeutics.”

Li shared that ambition.

“I think it already has been proved that this platform can be applied to two totally different diseases,” he said. “So this platform can be used for other diseases, to just unleash the potential.”

The research was supported by National Institutes of Health, the National Science Foundation, a Michigan State University Strategic Partnership Grant, Corewell Health-Michigan State University Alliance Corporation, CJ Huang and Ha Lin Yip Foundation to the Asian Liver Center at Stanford University, and the Lui Hac Minh Foundation for Liver Cancer Research.

###

Michigan State University has been advancing the common good with uncommon will for more than 170 years. Among the world's top 100 universities and a leading U.S. public research institution, MSU pushes the limits of discovery and innovation to advance the state of Michigan and the nation, and make a better, safer, healthier world for all. The university provides life-changing educational opportunities through an inclusive academic community with more than 400 programs of study and is the largest producer of talent for Michigan, educating more undergraduates than any other university in the state.

For generations, Spartans have been changing the world through research. Federal funding helps power many of the discoveries that improve lives and keep America at the forefront of innovation and competitiveness. From lifesaving cancer treatments to solutions that advance technology, agriculture, energy and more, MSU researchers work every day to shape a better future for the people of Michigan and beyond. Learn more about MSU’s research impact powered by partnership with the federal government.

For MSU news on the web, go to MSUToday or x.com/MSUnews.

Journal

Cell

DOI

10.1016/j.cell.2026.02.016

Method of Research

Computational simulation/modeling

Subject of Research

Not applicable

Article Title

Deep-learning-based de novo discovery and design of therapeutics that reverse disease-associated transcriptional phenotypes

Article Publication Date

17-Mar-2026

LA REVUE GAUCHE - Left Comment

Thursday, March 19, 2026
