MSU expert: How AI can help people understand research and increase trust in science
Michigan State University
EAST LANSING, Mich. – Have you ever read about a scientific discovery and felt like it was written in a foreign language? If you’re like most Americans, new scientific information can prove challenging to understand — especially if you try to tackle a science article in a research journal.
In an era when scientific literacy is crucial for informed decision-making, the ability to communicate and comprehend complex content is more important than ever. Trust in science has been declining for years, and one contributing factor may be the challenge of understanding scientific jargon.
New research from David Markowitz, associate professor of communication at Michigan State University, points to a potential solution: using artificial intelligence, or AI, to simplify science communication. His work demonstrates that AI-generated summaries may help restore trust in scientists and, in turn, encourage greater public engagement with scientific issues, simply by making scientific content more approachable. The question of trust is particularly important, as people often rely on science to inform decisions in their daily lives, from choosing what foods to eat to making critical health care choices.
Responses are excerpts from an article originally published in The Conversation.
How did simpler, AI-generated summaries affect the general public’s comprehension of scientific studies?
Artificial intelligence can generate summaries of scientific papers that make complex information more understandable for the public compared with human-written summaries, according to Markowitz’s recent study, which was published in PNAS Nexus. AI-generated summaries not only improved public comprehension of science but also enhanced how people perceived scientists.
Markowitz used a popular large language model, GPT-4 by OpenAI, to create simple summaries of scientific papers; this kind of text is often called a significance statement. The AI-generated summaries used simpler language — they were easier to read according to a readability index and used more common words, like “job” instead of “occupation” — than summaries written by the researchers who had done the work.
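To make the idea of a readability index concrete, here is a minimal sketch in Python. The release does not say which index the study used, so the choice of the Flesch Reading Ease formula here is an assumption, as are the function names and the rough vowel-group syllable counter. Higher scores indicate easier text.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count groups of consecutive vowels in the word."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually does not add a syllable ("more", "made").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Illustrative sentences written for this sketch, not taken from the study.
simple_text = "People changed their job."
complex_text = "Individuals transitioned between occupations."
print(flesch_reading_ease(simple_text) > flesch_reading_ease(complex_text))
```

The syllable counter is deliberately crude, so the absolute scores are only approximate. The comparison between a plainer and a more jargon-heavy version of the same sentence is the part that matters here.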
In one experiment, he found that readers of the AI-generated statements had a better understanding of the science, and they provided more detailed, accurate summaries of the content than readers of the human-written statements.
How did simpler, AI-generated summaries affect the general public’s perception of scientists?
In another experiment, participants rated the scientists whose work was described in simple terms as more credible and trustworthy than the scientists whose work was described in more complex terms.
In both experiments, participants did not know who wrote each summary. The simpler texts were always AI-generated, and the complex texts were always human-generated. When Markowitz asked participants who they believed wrote each summary, they thought, ironically, that the more complex ones were written by AI and the simpler ones by humans.
What do we still need to learn about AI and science communication?
As AI continues to evolve, its role in science communication may expand, especially if using generative AI becomes more commonplace or sanctioned by journals. Indeed, the academic publishing field is still establishing norms regarding the use of AI. By simplifying scientific writing, AI could contribute to more engagement with complex issues.
While the benefits of AI-generated science communication are perhaps clear, ethical considerations must also be weighed. There is some risk that relying on AI to simplify scientific content may remove nuance, potentially leading to misunderstandings or oversimplifications. There’s always the chance of errors, too, if no one pays close attention. Additionally, transparency is critical. Readers should be informed when AI is used to generate summaries to avoid potential biases.
Simple science descriptions are preferable to and more beneficial than complex ones, and AI tools can help. But scientists could also achieve the same goals by working harder to minimize jargon and communicate clearly — no AI necessary.
###
Michigan State University has been advancing the common good with uncommon will for more than 165 years. One of the world’s leading public research universities, MSU pushes the boundaries of discovery to make a better, safer, healthier world for all while providing life-changing opportunities to a diverse and inclusive academic community through more than 400 programs of study in 17 degree-granting colleges.
For MSU news on the web, go to MSUToday or x.com/MSUnews.
Journal: PNAS Nexus
Method of Research: Content analysis
Subject of Research: Not applicable
Article Title: From complexity to clarity: How AI enhances perceptions of scientists and the public's understanding of science
Q&A: Promises and perils of AI in medicine, according to UW experts in public health and AI
University of Washington
In most doctors’ offices these days, you’ll find a pattern: Everybody’s Googling, all the time. Physicians search for clues to a diagnosis, or for reminders on the best treatment plans. Patients scour WebMD, tapping in their symptoms and doomscrolling a long list of possible problems.
But those constant searches leave something to be desired. Doctors don’t have the time to sift through pages of results, and patients don’t have the knowledge to digest medical research. Everybody has trouble finding the most reliable information.
Optimists believe artificial intelligence could help solve those problems, but the bots might not be ready for prime time. In a recent paper, Dr. Gary Franklin, a University of Washington research professor of environmental & occupational health sciences and of neurology in the UW School of Medicine, described a troubling experience with Google’s Gemini chatbot. When Franklin asked Gemini for information on the outcomes of a specific procedure – a decompressive brachial plexus surgery – the bot gave a detailed answer that cited two medical studies, neither of which existed.
Franklin wrote that it’s “buyer beware when it comes to using AI Chatbots for the purposes of extracting accurate scientific information or evidence-based guidance.” He recommended that AI experts develop specialized chatbots that pull information only from verified sources.
One expert working toward a solution is Lucy Lu Wang, a UW assistant professor in the Information School who focuses on making AI better at understanding and relaying scientific information. Wang has developed tools to extract important information from medical research papers, verify scientific claims, and make scientific images accessible to blind and low-vision readers.
UW News sat down with Franklin and Wang to discuss how AI could enhance health care, what’s standing in the way, and whether there’s a downside to democratizing medical research.
Each of you has studied the possibilities and perils of AI in health care, including the experiences of patients who ask chatbots for medical information. In a best-case scenario, how do you envision AI being used in health and medicine?
Gary Franklin: Doctors use Google a lot, but they also rely on services like UpToDate, which provide really great summaries of medical information and research. Most doctors have zero time and just want to be able to read something very quickly that is well documented. So from a physician’s perspective trying to find truthful answers, trying to make my practice more efficient, trying to coordinate things better — if this technology could meaningfully contribute to any of those things, then it would be unbelievably great.
I’m not sure how much doctors will use AI, but for many years, patients have been coming in with questions about what they found on the internet, like on WebMD. AI is just the next step of patients doing this, getting some guidance about what to do with the advice they’re getting. As an example, if a patient sees a surgeon who’s overly aggressive and says they need a big procedure, the patient could ask an AI tool what the broader literature might recommend. And I have concerns about that.
Lucy Lu Wang: I’ll take this question from the clinician’s perspective, and then from the patient’s perspective.
From the clinician’s perspective, I agree with what Gary said. Clinicians want to look up information very quickly because they’re so taxed and there’s limited time to treat patients. And you can imagine if the tools that we have, these chatbots, were actually very good at searching for information and very good at citing accurately, that they could become a better replacement for a type of tool like UpToDate, right? Because UpToDate is good, it’s human-curated, but it doesn’t always contain the most fine-grained information you might be looking for.
These tools could also potentially help clinicians with patient communication, because there’s not always enough time to follow up or explain things in a way that patients can understand. It’s an add-on part of the job for clinicians, and that’s where I think language models and these tools, in an ideal world, could be really beneficial.
Lastly, on the patient’s side, it would be really amazing to develop these tools that help with patient education and help increase the overall health literacy of the population, beyond what WebMD or Google does. These tools could engage patients with their own health and health care more than before.
Zooming out from the individual to the systemic, do you see any ways AI could make health systems as a whole function more smoothly?
GF: One thing I’m curious about is whether these tools can be used to help with coordination across the health care system and between physicians. It’s horrible. There was a book called “Crossing the Quality Chasm” that argued the main problem in American medicine is poor coordination across specialties, or between primary care and anybody else. It’s still horrible, because there’s no function in the medical field that actually does that. So that’s another question: Is there a role here for this kind of technology in coordinating health care?
LLW: There’s been a lot of work on tools that can summarize a patient’s medical history in their clinical notes, and that could be one way to perform this kind of communication between specialties. There’s another component, too: If patients can directly interact with the system, we can construct a better timeline of the patient’s experiences and how that relates to their clinical medical care.
We’ve done qualitative research with health care seekers that suggests there are lots of types of questions that people are less willing to ask their clinical provider, but much more willing to put into one of these models. So the models themselves are potentially addressing unmet needs that patients aren’t willing to directly share with their doctors.
What’s standing in the way of these best-case scenarios?
LLW: I think there are both technical challenges and socio-technical challenges. In terms of technical challenges, a lot of these models’ training doesn’t currently make them effective for tasks like scientific search and summarization.
First, these current chatbots are mostly trained to be general-purpose tools, so they’re meant to be OK at everything, but not great at anything. And I think there will be more targeted development towards these more specific tasks, things like scientific search with citations that Gary mentioned before. The current training methods tend to produce models that are instruction-following, and have a very large positive response bias in their outputs. That can lead to things like generating answers with citations that support the answer, even if those citations don’t exist in the real world. These models are also trained to be overconfident in their responses. If the way the model communicates is positive and overconfident, then it’s going to lead to lots of problems in a domain like health care.
And then, of course, there’s socio-technical problems, like, maybe these models should be developed with the specific goal of supporting scientific search. People are, in fact, working toward these things and have demonstrated good preliminary results.
GF: So are the folks in your field pretty confident that that can be overcome in a fairly short time?
LLW: I think the citation problem has already been overcome in research demonstration cases. If we, for example, hook up an LLM to PubMed search and allow it only to cite conclusions based on articles that are indexed in PubMed, then actually the models are very faithful to citations that are retrieved from that search engine. But if you use Gemini and ChatGPT, those are not always hooked up to those research databases.
GF: The problem is that a person trying to search using those tools doesn’t know that.
LLW: Right, that’s a problem. People tend to trust these things because, as an example, we now have AI-generated answers at the top of Google search, and people have historically trusted Google search to only index documents that people have written, maybe putting the ones that are more trustworthy at the top. But that AI-generated response can be full of misinformation. What’s happening is that some people are losing trust in traditional search as a consequence. It’s going to be hard to build back that trust, even if we improve the technology.
We’re really at the beginning of this technology. It took a long time for us to develop meaningful resources on the internet, things like Wikipedia or PubMed. Right now, these chatbots are general-purpose tools, but there are already starting to be mixtures of models underneath. And in the future, they’re going to get better at routing people’s queries to the correct expert models, whether that’s to the model hooked up to PubMed or to trusted documents published by various associations related to health care. And I think that’s likely where we’re headed in the next couple of years.
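The idea Wang describes earlier, allowing a model to cite only articles that a PubMed search actually returned, can be illustrated with a small check in Python. This is a sketch under stated assumptions, not a description of any real system: the `[PMID:12345]` citation format, the function name `split_citations`, and the example identifiers are all hypothetical, and no real search engine or chatbot is called.

```python
import re

# Hypothetical citation format for this sketch: "[PMID:12345]".
PMID_PATTERN = re.compile(r"\[PMID:(\d+)\]")


def split_citations(answer: str, retrieved_pmids: set[str]) -> tuple[list[str], list[str]]:
    """Separate cited IDs into those found in the retrieved set and those that are not.

    `retrieved_pmids` stands in for the IDs returned by a literature search;
    here it is supplied by hand rather than fetched from a real service.
    """
    cited = PMID_PATTERN.findall(answer)
    supported = [pmid for pmid in cited if pmid in retrieved_pmids]
    unsupported = [pmid for pmid in cited if pmid not in retrieved_pmids]
    return supported, unsupported


# Made-up answer text and made-up IDs, for illustration only.
answer = "Outcomes improved in one cohort [PMID:111], but evidence is mixed [PMID:999]."
retrieved = {"111", "222"}

supported, unsupported = split_citations(answer, retrieved)
print("supported:", supported)
print("unsupported:", unsupported)
```

A check like this only confirms that a cited identifier came from the search results. It does not confirm that the cited article supports the claim, which would still need separate verification.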
Trust and reliability issues aside, are there any potential downsides to deploying these tools widely? I can see a potential problem with people using chatbots to self-diagnose when it might be preferable to see a provider.
LLW: You think of a resource like WebMD: Was that a net positive or net negative? Before its existence, patients really did have a hard time finding any information at all. And of course, there’s limited face time with clinicians where people actually get to ask those questions. So for every patient who wrongly self-diagnoses on WebMD, there are probably also hundreds of patients who found a quick answer to a question. I think that with these models, it’s going to be similar. They’re going to help address some of the gaps in clinical care where we don’t currently have enough resources.
Journal: PLOS Digital Health
Method of Research: Commentary/editorial
Subject of Research: Not applicable
Article Title: Google’s new AI Chatbot produces fake health-related evidence-then self-corrects