Thursday, March 26, 2026

  

MIT researchers show how to create “humble” AI



The MIT-led team is designing artificial intelligence systems for medical diagnosis that are more collaborative and forthcoming about uncertainty.



Massachusetts Institute of Technology





CAMBRIDGE, MA -- Artificial intelligence holds promise for helping doctors diagnose patients and personalize treatment options. However, an international group of scientists led by MIT cautions that AI systems, as currently designed, carry the risk of steering doctors in the wrong direction because they may overconfidently make incorrect decisions.

One way to prevent these mistakes is to program AI systems to be more “humble,” according to the researchers. Such systems would reveal when they are not confident in their diagnoses or recommendations and would encourage users to gather additional information when the diagnosis is uncertain.

“We’re now using AI as an oracle, but we can use AI as a coach. We could use AI as a true co-pilot. That would not only increase our ability to retrieve information but increase our agency to be able to connect the dots,” says Leo Anthony Celi, a senior research scientist at MIT’s Institute for Medical Engineering and Science, a physician at Beth Israel Deaconess Medical Center, and an associate professor at Harvard Medical School.

Celi and his colleagues have created a framework that they say can guide AI developers in designing systems that display curiosity and humility. This new approach could allow doctors and AI systems to work as partners, the researchers say, and help prevent AI from exerting too much influence over doctors’ decisions.

Celi is the senior author of the study, which appears today in BMJ Health & Care Informatics. The paper’s lead author is Sebastián Andrés Cajas Ordoñez, a researcher at MIT Critical Data, a global consortium led by the Laboratory for Computational Physiology within the MIT Institute for Medical Engineering and Science.

Instilling human values

Overconfident AI systems can lead to errors in medical settings, according to the MIT team. Previous studies have found that ICU physicians defer to AI systems that they perceive as reliable even when their own intuition goes against the AI suggestion. Physicians and patients alike are more likely to accept incorrect AI recommendations when they are perceived as authoritative.

In place of systems that offer overconfident but potentially incorrect advice, health care facilities should have access to AI systems that work more collaboratively with clinicians, the researchers say.

“We are trying to include humans in these human-AI systems, so that we are facilitating humans to collectively reflect and reimagine, instead of having isolated AI agents that do everything. We want humans to become more creative through the usage of AI,” Cajas Ordoñez says.

To create such a system, the consortium designed a framework that includes several computational modules that can be incorporated into existing AI systems. The first of these modules requires an AI model to evaluate its own certainty when making diagnostic predictions. Developed by consortium members Janan Arslan and Kurt Benke of the University of Melbourne, the Epistemic Virtue Score acts as a self-awareness check, ensuring the system’s confidence is appropriately tempered by the inherent uncertainty and complexity of each clinical scenario.

With that self-awareness in place, the model can tailor its response to the situation. If the system detects that its confidence exceeds what the available evidence supports, it can pause and flag the mismatch, requesting specific tests or history that would resolve the uncertainty, or recommending specialist consultation. The goal is an AI that not only provides answers but also signals when those answers should be treated with caution.
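The paper describes these modules in terms of behavior rather than published code, but the underlying pattern of scoring one's own uncertainty and deferring when it is too high is easy to illustrate. Below is a minimal Python sketch of that pattern, assuming a diagnostic classifier that outputs calibrated class probabilities; the entropy-based score, the threshold value, and the function name are illustrative stand-ins, not the actual Epistemic Virtue Score.

```python
import numpy as np

def triage_prediction(probs, labels, uncertainty_threshold=0.5):
    """Flag diagnostic predictions whose uncertainty warrants more data.

    probs: predicted class probabilities for one patient (sum to 1).
    labels: diagnosis names aligned with probs.
    uncertainty_threshold: illustrative cutoff, not a value from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    max_entropy = np.log(len(probs))     # entropy of a uniform distribution
    uncertainty = entropy / max_entropy  # 0 = fully certain, 1 = maximally unsure

    best = labels[int(np.argmax(probs))]
    if uncertainty > uncertainty_threshold:
        return {"diagnosis": best, "action": "defer",
                "note": "Low confidence: gather more history or tests, "
                        "or request specialist consultation."}
    return {"diagnosis": best, "action": "report",
            "note": f"Uncertainty {uncertainty:.2f} is within tolerance."}

# An ambiguous case is flagged rather than asserted.
print(triage_prediction([0.45, 0.40, 0.15],
                        ["pneumonia", "heart failure", "other"]))
```

In this sketch the ambiguous probability profile produces a "defer" action with a request for more information, which is the collaborative behavior the framework aims for.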

“It’s like having a co-pilot that would tell you that you need to seek a fresh pair of eyes to be able to understand this complex patient better,” Celi says.

Celi and his colleagues have previously developed large-scale databases that can be used to train AI systems, including the Medical Information Mart for Intensive Care (MIMIC) database from Beth Israel Deaconess Medical Center. His team is now working on implementing the new framework into AI systems based on MIMIC and introducing it to clinicians in the Beth Israel Lahey Health system.

This approach could also be implemented in AI systems that are used to analyze X-ray images or to determine the best treatment options for emergency room patients, among other applications, the researchers say.

Toward more inclusive AI

This study is part of a larger effort by Celi and his colleagues to create AI systems that are designed by and for the people who will ultimately be most affected by these tools. Many AI models are trained on publicly available data from the United States, such as the MIMIC database, which can lead to the introduction of biases toward a certain way of thinking about medical issues, and the exclusion of others.

Bringing in more viewpoints is critical to overcoming these potential biases, says Celi, emphasizing that each member of the global consortium brings a distinct perspective to a broader, collective understanding.

Another problem with existing AI systems used for diagnostics is that they are usually trained on electronic health records, which weren’t originally intended for that purpose. This means the data lack much of the context that would be useful in making diagnoses and treatment recommendations. Additionally, many patients, such as those who live in rural areas, never get included in those datasets because they lack access to care.

At data workshops hosted by MIT Critical Data, groups of data scientists, health care professionals, social scientists, patients, and others work together on designing new AI systems. Before beginning, everyone is prompted to think about whether the data they’re using captures all the drivers of whatever they aim to predict, ensuring they don’t inadvertently encode existing structural inequities into their models.

“We make them question the dataset. Are they confident about their training data and validation data? Do they think that there are patients that were excluded, unintentionally or intentionally, and how will that affect the model itself?” he says. “Of course, we cannot stop or even delay the development of AI, not just in health care, but in every sector. But, we must be more deliberate and thoughtful in how we do this.”

###

The research was funded by the Boston-Korea Innovative Research Project through the Korea Health Industry Development Institute.

Virtual reality shown to improve medical students' understanding of head and neck anatomy



Pilot study finds VR-based learning boosts anatomical knowledge and confidence regardless of prior technology experience



American Academy of Otolaryngology - Head and Neck Surgery





A new study published in OTO Open, the open-access journal of the American Academy of Otolaryngology–Head and Neck Surgery Foundation (AAO-HNSF), finds that a standardized virtual reality (VR) educational experience improved medical students' knowledge and confidence in head and neck anatomy—and did so regardless of students' prior experience with VR or video gaming.

“The anatomy of the head and neck is one of the most spatially complex regions in medicine. Virtual reality gives learners the ability to step inside that anatomy and explore it in three dimensions in a way that textbooks and static images simply cannot. What’s also exciting is that these immersive learning tools can be accessible and beneficial for all medical trainees,” said corresponding author Michael Yim, MD, Otolaryngology Program Director and Associate Professor of Otolaryngology and Neurosurgery at LSU Health Shreveport.

This pilot study evaluated whether a commercially available VR platform could serve as an effective supplement to traditional cadaveric anatomy training. Twenty-one medical students, all of whom had previously completed a formal cadaveric head and neck anatomy course, participated in a guided, immersive VR session.

The VR platform received high ratings from participants for control, sensory immersion, and realism, with minimal distraction or frustration reported. Standardized assessments of workload and presence—the NASA Task Load Index and Presence Questionnaire—confirmed that students were able to engage effectively with the virtual environment with low stress and high perceived success, even those with no prior VR experience.
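For context, the weighted NASA Task Load Index combines six subjective ratings (each 0 to 100) into a single workload score, with weights derived from 15 pairwise comparisons between the subscales. The sketch below shows that standard computation; the ratings and weights are invented for illustration and are not data from this study.

```python
# Minimal sketch of the weighted NASA Task Load Index (NASA-TLX) score.
# Ratings are 0-100 on six subscales; weights come from 15 pairwise
# comparisons, so they sum to 15. All values below are made up.

ratings = {
    "mental_demand": 55, "physical_demand": 10, "temporal_demand": 40,
    "performance": 25, "effort": 50, "frustration": 15,
}
weights = {
    "mental_demand": 5, "physical_demand": 1, "temporal_demand": 3,
    "performance": 2, "effort": 3, "frustration": 1,
}

assert sum(weights.values()) == 15  # 15 pairwise comparisons in total

overall = sum(ratings[k] * weights[k] for k in ratings) / 15
print(f"Weighted NASA-TLX workload: {overall:.1f} / 100")  # lower = less load
```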

The authors note that this is the first study to evaluate a VR adjunct specifically for head and neck anatomy education and call for larger, multi-institutional studies and prospective trials comparing VR-based learning directly to conventional teaching methods.

Study Citation: Alvarez, I., Johnson, E., Latour, M. and Yim, M.T. (2026), Next Dimension Medical Education: A Pilot Study Exploring Virtual Reality in Head and Neck Anatomy. OTO Open, 10: e70217. https://doi.org/10.1002/oto2.70217

OTO Open

OTO Open is the official open-access journal of the American Academy of Otolaryngology–Head and Neck Surgery Foundation. Its mission is to publish clinically relevant, contemporary, and ethical research in otolaryngology–head and neck surgery that advances patient care and supports the global medical community through free and unrestricted access to peer-reviewed science.

About the AAO-HNS/F
The AAO-HNS/F is one of the world’s largest organizations representing specialists who treat the ears, nose, throat, and related structures of the head and neck. Otolaryngologist-head and neck surgeons diagnose and treat medical disorders that are among the most common affecting patients of all ages in the United States and around the world. Those medical conditions include chronic ear disease, hearing and balance disorders, hearing loss, sinusitis, snoring and sleep apnea, allergies, swallowing disorders, nosebleeds, hoarseness, dizziness, and tumors of the head and neck as well as aesthetic and reconstructive surgery and intricate micro-surgical procedures of the head and neck. The Academy has approximately 13,000 members. The AAO-HNS Foundation works to advance the art, science, and ethical practice of otolaryngology-head and neck surgery through education, research, and quality measurement.

 

New study argues AI is reopening the “end of history” and forcing a fundamental rethink of education



Research proposes educational reconfiguration as the key to rebuilding trust and legitimacy in the age of artificial intelligence




ECNU Review of Education





Since the wide adoption of generative AI systems after 2022, societies worldwide have entered an intelligence transition period. AI has reopened the “End of History” by creating new ideological alternatives, promoting competition between different governance models and reshaping the foundations of national legitimacy.

A study, made available online on February 17, 2026, in ECNU Review of Education, reexamines Francis Fukuyama’s “End of History” thesis in light of recent AI breakthroughs. The research, led by Professor Yilei Shao from East China Normal University, employs a problem-analysis-solution structure to explain how AI alters political legitimacy, national capacity, and the human condition. Using interdisciplinary analysis, the study identifies three singular transformations driven by AI, explores three structural gaps, and proposes a dual-track educational reconfiguration to rebuild trust and institutional resilience.

“At precisely this moment, more than ever before, we need new forms of explanation, ability, legitimacy, and governance to fill the void of thoughts, trust, and policy,” said Prof. Shao. “This is the fundamental reason for the humanities and social sciences, and education, to reconstitute themselves as the ‘new-quality infrastructure’ of a human–machine symbiotic society in front of us all.”

The study repositions education as the decisive mechanism for resolving the crisis of technological legitimacy, arguing that the most urgent task for education in the age of AI is to redefine its position and function within the social system.

The author calls for a deeper structural transformation, proposing that future education should focus on cultivating two essential capacities: civic literacy for the AI society and the ability for human-AI collaboration. “Education must now shoulder the fundamental task of guiding societies through legitimacy crises, rebuilding public trust, and cultivating a new civic literacy for the AI era,” Prof. Shao concluded.

 

***

 

Reference
DOI: 10.1177/20965311261422769

 

Funding information
This work was supported by the Shanghai Municipal Education Commission's “AI-Driven Scientific Research Paradigm Reform for Disciplinary Advancement Program” (Grant No. 2024AI01005).

 

Deepfake X-rays fool radiologists and AI




Radiological Society of North America
Image: Anatomy-matched real and GPT-4o-generated radiographs: (A) real and (B) GPT-4o-generated posteroanterior chest radiographs, (C) real and (D) GPT-4o-generated lateral cervical spine radiographs, (E) real and (F) GPT-4o-generated posteroanterior hand radiographs, and (G) real and (H) GPT-4o-generated lateral lumbar spine radiographs. The pairs demonstrate that GPT-4o can produce radiographically plausible images across different anatomic regions.

Credit: Radiological Society of North America (RSNA)





OAK BROOK, Ill. – Neither radiologists nor multimodal large language models (LLMs) are able to easily distinguish artificial intelligence (AI)-generated “deepfake” X-ray images from authentic ones, according to a study published today in Radiology, a journal of the Radiological Society of North America (RSNA). The findings highlight the potential risks associated with AI-generated X-ray images, along with the need for tools and training to protect the integrity of medical images and prepare health care professionals to detect deepfakes.

The term “deepfake” refers to a video, photo, image or audio recording that appears real but has been created or manipulated using AI.

“Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present,” said lead study author Mickael Tordjman, M.D., post-doctoral fellow, Icahn School of Medicine at Mount Sinai, New York. “This creates a high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one. There is also a significant cybersecurity risk if hackers were to gain access to a hospital’s network and inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of the digital medical record.”

Seventeen radiologists from 12 different centers in six countries (United States, France, Germany, Turkey, United Kingdom and United Arab Emirates) participated in the retrospective study. Their professional experience ranged from 0 to 40 years. Half of the 264 X-ray images in the study were authentic, and the other half were generated by AI. Radiologists were evaluated on two distinct image sets, with no overlap between the datasets. The first dataset included real and ChatGPT-generated images of multiple anatomical regions. The second dataset included chest X-ray images—half authentic and the other half created by RoentGen, an open-source generative AI diffusion model developed by Stanford Medicine researchers.

While still unaware of the study’s true purpose, radiologist readers were asked, after rating the technical quality of each ChatGPT image, whether they noticed anything unusual; only 41% spontaneously identified AI-generated images. After being informed that the dataset contained synthetic images, the radiologists’ mean accuracy in differentiating real from synthetic X-rays was 75%.

Individual radiologist performance in accurately detecting the ChatGPT-generated images ranged from 58% to 92%. Similarly, the accuracy of four multimodal LLMs—GPT-4o (OpenAI), GPT-5 (OpenAI), Gemini 2.5 Pro (Google), and Llama 4 Maverick (Meta)—ranged from 57% to 85%. Even GPT-4o, the model used to create the deepfakes, could not accurately detect all of them, though it identified considerably more than the Google and Meta LLMs.
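The accuracy figures reported here are simple proportions of correct real-versus-synthetic calls. As a minimal illustration, the sketch below computes per-reader accuracy and the mean across readers from invented labels, not the study’s data.

```python
# Per-reader and mean detection accuracy for a real-vs-synthetic reading
# study. All labels below are invented for illustration.
truth = ["real", "fake", "fake", "real", "fake", "real"]  # ground truth

readers = {
    "reader_1": ["real", "fake", "real", "real", "fake", "real"],
    "reader_2": ["fake", "fake", "fake", "real", "fake", "real"],
}

def accuracy(calls, truth):
    """Proportion of images called correctly."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)

per_reader = {name: accuracy(calls, truth) for name, calls in readers.items()}
mean_accuracy = sum(per_reader.values()) / len(per_reader)

print(per_reader)                             # individual performance
print(f"Mean accuracy: {mean_accuracy:.0%}")  # study's analogous figure: 75%
```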

Radiologist accuracy in detecting the RoentGen synthetic chest X-rays ranged from 62% to 78%, and the LLM models’ performance ranged from 52% to 89%.

There was no correlation between a radiologist’s years of experience and their accuracy in detecting synthetic X-ray images. However, musculoskeletal radiologists demonstrated significantly higher accuracy than other radiology subspecialists.

The study identified common features of synthetic X-rays.

"Deepfake medical images often look too perfect,” Dr. Tordjman said. “Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone." 

To help distinguish real from fake images and prevent tampering, the researchers recommend implementing advanced digital safeguards, such as invisible watermarks that embed ownership or identity data directly into the images, and technologist-linked cryptographic signatures attached automatically when the images are captured.
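As one concrete reading of the signature recommendation, the sketch below signs an image's raw bytes at capture time and verifies them before interpretation, using the third-party Python cryptography package. Key distribution and integration with DICOM workflows are omitted, and the image bytes are a placeholder.

```python
# Sign image bytes at capture, verify before interpretation.
# Requires the third-party package: pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Key pair issued to the technologist (or the imaging device) in advance.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

image_bytes = b"...raw pixel data from the detector..."  # placeholder

# At capture: attach a signature over the exact image bytes.
signature = private_key.sign(image_bytes)

# Before reading: confirm the bytes are unchanged since capture.
try:
    public_key.verify(signature, image_bytes)
    print("Image integrity verified.")
except InvalidSignature:
    print("Image altered or not from a trusted source; do not interpret.")
```

Any single flipped bit in the image bytes makes verification fail, which is what makes this kind of signature useful against injected or manipulated studies.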

“We are potentially only seeing the tip of the iceberg,” Dr. Tordjman said. “The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI. Establishing educational datasets and detection tools now is critical.”

The study’s authors have published a curated deepfake dataset with interactive quizzes for educational purposes.

Image: Examples of GPT-4o-generated radiographs of fractures: (A) posteroanterior radiograph of the hand, (B) posteroanterior radiograph of the lower leg, and (C) medial oblique radiograph of the foot. The images show fracture lines (arrow) that are unusually smooth, clean, and consistent and, in the case of B, unicortical. The presence of these idealized fracture lines, characterized by unnatural smoothness and incomplete cortical disruption, could serve as a primary diagnostic cue for identifying artificial intelligence–generated trauma images.

Credit: Radiological Society of North America (RSNA)


“The Rise of Deepfake Medical Imaging: Radiologists’ Diagnostic Accuracy in Detecting ChatGPT-generated Radiographs.” Collaborating with Dr. Tordjman were Murat Yuce, M.D., M.S., Amine Ammar, M.D., Mingqian Huang, M.D., Fadila Mihoubi Bouvier, M.D., Maxime Lacroix, M.D., Anis Meribout, M.D., Ian Bolger, M.S., Efe Ozkaya, Ph.D., Himanshu Joshi, Ph.D., Amine Geahchan, M.D., Rayane El Rahi, M.D., Haidara Almansour, M.D., Ashwin Singh Parihar, M.D., Carolyn Horst, M.D., Samet Ozturk, M.D., Muhammed Edip Isleyen, M.D., Gul Gizem Pamuk, M.D., Ahmet Tan Cimilli, M.D., Timothy Deyer, M.D., Arvin Calinghen, M.D., Enora Guillo, M.D., Rola Husain, M.D., Jean-Denis Laredo, M.D., Zahi A. Fayad, Ph.D., Xueyan Mei, Ph.D., and Bachir Taouli, M.D., M.H.A.

Radiology is edited by Suhny Abbara, M.D., FACR, MSCCT, Mayo Clinic, Jacksonville, Florida, and owned and published by the Radiological Society of North America, Inc. (https://pubs.rsna.org/journal/radiology)

RSNA is an association of radiologists, radiation oncologists, medical physicists and related scientists promoting excellence in patient care and health care delivery through education, research and technologic innovation. The Society is based in Oak Brook, Illinois. (RSNA.org)

For patient-friendly information on X-ray and AI in medical imaging, visit RadiologyInfo.org.