Medical AI models need more context to prepare for the clinic
Marinka Zitnik outlines the challenges — and potential solutions
Harvard Medical School
Medical artificial intelligence is a hugely appealing concept. In theory, models can analyze vast amounts of information, recognize subtle patterns in data, and are never too tired or busy to provide a response. However, although thousands of these models have been and continue to be developed in academia and industry, very few of them have successfully transitioned into real-world clinical settings.
Marinka Zitnik, associate professor of biomedical informatics in the Blavatnik Institute at Harvard Medical School, and colleagues are exploring why — and how to close the gap between how well medical AI models perform on standardized test cases and how often the same models run into problems when deployed in places like hospitals and doctors’ offices.
In a paper published Feb. 3 in Nature Medicine, the researchers identify a major contributor to this gap: contextual errors.
They explain that medical AI models may produce responses that are useful and correct to an extent but not necessarily accurate for the specific context in which the models are being used — a context that includes medical specialty, geographic location, and socioeconomic factors.
“This is not a minor fluke,” said Zitnik, who is also associate faculty at the Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University. “It is a broad limitation of all the types of medical AI models that we are developing in the field.”
In a conversation with Harvard Medicine News, Zitnik explains how contextual errors happen in medical AI models, how researchers might overcome this and other challenges, and what else she sees on the horizon for AI in medicine. This interview has been edited for length and clarity.
Harvard Medicine News: Why do contextual errors happen? How can they be fixed?
Marinka Zitnik: We think that they happen because important information for making clinical decisions is not contained in the datasets that are used to train medical AI models. The models then generate recommendations that seem reasonable and sensible but are not actually relevant or actionable for patients.
For medical AI models to perform better, they need to adapt their recommendations in real time based on specific contextual information. We suggest incorporating such information into the datasets used to train models. Additionally, we call for enhanced computational benchmarks (standardized test cases) to test models after training. Finally, we think information about context should be incorporated into the architecture, or structural design, of the models. These three steps will help ensure that models can take different contexts into account and that errors are detected before models are implemented in actual patient-care settings.
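As a purely illustrative sketch, not drawn from the paper, the Python snippet below shows one way contextual information such as specialty, location, and care setting could be attached to each training or evaluation example so that a text model can condition on it. All class and field names, and the format_prompt helper, are hypothetical.

```python
# A minimal sketch (assumption, not the authors' method) of attaching clinical
# context to the examples a model is trained and evaluated on.
from dataclasses import dataclass, field


@dataclass
class ClinicalContext:
    specialty: str                     # e.g., "neurology", "pulmonology"
    location: str                      # e.g., country or health system
    care_setting: str                  # e.g., "emergency department"
    constraints: list[str] = field(default_factory=list)  # e.g., access barriers


@dataclass
class ClinicalExample:
    question: str
    answer: str
    context: ClinicalContext


def format_prompt(example: ClinicalExample) -> str:
    """Serialize the context alongside the question so a text model can condition on it."""
    ctx = example.context
    return (
        f"Specialty: {ctx.specialty}\n"
        f"Location: {ctx.location}\n"
        f"Care setting: {ctx.care_setting}\n"
        f"Constraints: {', '.join(ctx.constraints) or 'none reported'}\n"
        f"Question: {example.question}"
    )


example = ClinicalExample(
    question="Patient presents with neurological symptoms and breathing problems.",
    answer="...",
    context=ClinicalContext(
        specialty="emergency medicine",
        location="United States",
        care_setting="emergency department",
        constraints=["lives far from specialist care"],
    ),
)
print(format_prompt(example))
```

The same contextual fields could, in principle, feed the post-training benchmarks mentioned above: test cases would pair one clinical question with several different contexts and check that the model's recommendations change accordingly.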
HMNews: You give three examples of how a lack of context can lead to errors in medical AI models. Can you expand on them?
Zitnik: Let’s start with medical specialties. Patients may have complex symptoms that span multiple specialties. If a patient comes to the emergency department with neurological symptoms and breathing problems, they might be referred to a neurologist followed by a pulmonologist. Each specialist brings deep expertise shaped by their training and experience and understandably focuses on their own organ system. An AI model trained mostly on one specialty might do the same, meaning it may provide answers based on data from the wrong specialty or miss that the combination of symptoms points to a multisystem disease.
Instead, we need to develop medical AI models trained in multiple specialties that can switch between contexts in real time to focus on whatever information is most relevant.
HMNews: What about the context of geography?
Zitnik: If a model is presented with the same question in different geographic locations and gives the same answer, that answer is likely to be incorrect because each place will have specific conditions and constraints. If a patient is susceptible to a disease that could lead to organ dysfunction or failure, the clinician would need to figure out the patient’s risk and develop a plan to manage it. However, whether that patient is in South Africa, the United States, or Sweden may make a big difference in terms of how common that disease is and what treatments and procedures are approved and available.
We envision a model that can incorporate geographic information to produce location-specific, and therefore more accurate, responses. We are working on this in our lab, and we think it could have major implications for global health.
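As a hypothetical illustration of how a benchmark could probe for this, the sketch below checks whether a model gives the identical answer to the same question in every location, which would flag a likely contextual error. The query_model and toy_model functions are stand-ins for whatever system is being evaluated, not a real API.

```python
# A hypothetical geographic-context check (assumption, not from the paper):
# a model that ignores location gives one answer everywhere.
def identical_across_locations(query_model, question: str, locations: list[str]) -> bool:
    """Return True if the model's answer does not vary with location."""
    answers = {query_model(question, location=loc) for loc in locations}
    return len(answers) == 1


def toy_model(question: str, location: str) -> str:
    # A deliberately context-blind stand-in model.
    return "Start the standard first-line therapy."


locations = ["South Africa", "United States", "Sweden"]
question = "How should this patient's organ-failure risk be managed?"
if identical_across_locations(toy_model, question, locations):
    print("Warning: recommendation does not vary with location; review for contextual errors.")
```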
HMNews: And the third example, the socioeconomic and cultural factors that affect a patient’s behavior?
Zitnik: Say a patient shows up in the emergency department with severe symptoms after they were previously referred to an oncologist and never made an appointment. A typical response from the ED physician might be to remind the patient to schedule the oncology appointment. However, this overlooks potential barriers such as the patient living far from the oncologist, not having reliable childcare, or not being able to miss work. These types of constraints do not explicitly exist in the patient’s electronic health record, which means they also would not be factored in by an AI model that is helping to manage the patient.
A better model would take these factors into account to offer a more realistic recommendation, perhaps by providing an option for transportation or scheduling an appointment at a time that accommodates childcare or work constraints. Such a model would increase access to care for a broader range of patients rather than reinforcing inequities.
HMNews: What other major challenges are there in medical AI implementation besides contextual errors?
Zitnik: There are many. One relates to how much patients, clinicians, regulatory agencies, and other stakeholders trust medical AI models. We need to identify mechanisms and strategies that both make models trustworthy and build trust in them. We think the answer has to do with building models that provide transparent and easily interpretable recommendations and that say “I don’t know” when they are not confident in their conclusions.
Another challenge relates to human-AI collaboration. Currently, many people think about human-AI interfaces in the context of chatbots, in which you type in a question and get a response. We need interfaces where people can receive responses tailored to their specific backgrounds and levels of expertise — for example, content suitable for a lay audience versus a medical expert. We also need interfaces where clinicians or patients and AI models exchange information in both directions. True collaboration means that there is a question, goal, or task that an AI model has to complete, and to do that, it might need to seek more information from the user.
HMNews: What do you see as the promise of medical AI if the challenges can be overcome?
Zitnik: Some models have already had an impact by making everyday medical work more efficient. For example, models are helping clinicians draft patient notes and helping researchers quickly find scientific papers that may be relevant to a clinical question.
I am especially excited about the opportunities AI models can create for improving treatment. Models that can switch contexts could adjust their outputs based on the information that is most useful during different parts of the treatment process. For example, a model might shift from analyzing symptoms to suggesting possible causes to providing evidence about treatments that worked in similar patients. A model might then pivot to providing practical information about a patient’s prior medications, potential drug side effects, and what treatments are actually available. If this is done well, it could help clinicians tailor treatment decisions for complex patients with multiple conditions and medications that may fall outside standard treatment guidelines.
HMNews: How do we ensure that medical AI models are doing more good than harm?
Zitnik: I certainly think that AI in health care is here to stay. This technology, while imperfect, is already being used, so everyone in the medical AI community needs to work together to ensure that it is being developed and implemented in a responsible way. This includes considering real-world applications as we design and refine models, doing real-world testing to understand where models succeed and where they fall short, and developing guidelines for how models should be deployed. I feel optimistic that if AI researchers are aligned in the development of these models and we ask the right questions, we can detect any issues early on.
Ultimately, I think there will be many opportunities for medical AI models to improve the efficiency of medical research and clinical work and improve care for patients.
Authorship, Funding, Disclosures
Additional authors on the paper include Michelle M. Li, Ben Y. Reis, Adam Rodman, Tianxi Cai, Noa Dagan, Ran D. Balicer, Joseph Loscalzo, and Isaac S. Kohane.
Support for the research was provided by the Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at HMS and Clalit Research Institute; the National Institutes of Health (grant R01HD108794); the National Science Foundation (CAREER 2339524); the Department of Defense (FA8702-15-D-0001); ARPA-H (BDF program); the Chan Zuckerberg Initiative; the Bill & Melinda Gates Foundation (INV-079038); Amazon Faculty Research; the Google Research Scholar Program; AstraZeneca Research; the Roche Alliance with Distinguished Scientists; Sanofi iDEA-TECH; Pfizer Research; the John and Virginia Kaneb Fellowship Award at HMS; a Biswas Family Foundation Computational Biology Grant; a Dean’s Innovation Award for the Use of Artificial Intelligence in Education, Research, and Administration at HMS; the Harvard Data Science Initiative; and the Kempner Institute.
Journal: Nature Medicine
Article Title: Scaling medical AI across clinical contexts
Article Publication Date: 3-Feb-2026