Researchers evaluate accuracy of online health news using easily accessible AI
DURHAM, N.H.—It can be challenging to gauge the quality of online news and to tell whether a story is real or fake. When it comes to health news and press releases about medical treatments and procedures, the issue can be even more complex: a story may be incomplete without necessarily falling into the category of fake news. To help identify stories with inflated claims, inaccuracies and possible associated risks, researchers at the University of New Hampshire developed a new machine learning model, an application of artificial intelligence, that news services, such as social media outlets, could easily use to better screen medical news stories for accuracy.
“The way most people think about fake news is something that's completely fabricated, but, especially in healthcare, it doesn't need to be fake. It could be that maybe they're not mentioning something,” said Ermira Zifla, assistant professor of decision sciences at UNH’s Peter T. Paul College of Business and Economics. “In the study, we’re not making claims about the intent of the news organizations that put these out. But if things are left out, there should be a way to look at that.”
Zifla and study co-author Burcu Eke Rubini, also an assistant professor of decision sciences, found in their research, published in Decision Support Systems, that since most people don’t have the medical expertise to understand the complexities of the news, the machine learning models they developed outperformed laypeople's evaluations in assessing the quality of health stories. They used data from Health News Review that included news stories and press releases on new healthcare treatments published in various outlets from 2013 to 2018. The articles had already been evaluated by a panel of healthcare experts—medical doctors, healthcare journalists and clinical professors—using ten evaluation criteria the experts had developed. The criteria included the costs and benefits of the treatment or test, any possible harm, the quality of the evidence, the novelty and availability of the procedure, and the independence of the sources. The researchers then built on the same expert criteria and trained machine learning models to classify a news story on each criterion as "satisfactory" or "not satisfactory".
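The paper's actual models and features are not reproduced in this release. Purely as an illustrative sketch of the per-criterion setup it describes, one can imagine one small text classifier per expert criterion, each making a binary "satisfactory" / "not satisfactory" call. The toy bag-of-words Naive Bayes below (the class name and the training sentences are invented for illustration, not taken from the study) classifies a story on a single hypothetical "costs" criterion:

```python
from collections import Counter
import math

class CriterionClassifier:
    """Tiny bag-of-words Naive Bayes; the study's design implies one such
    binary classifier per expert evaluation criterion."""

    LABELS = ("satisfactory", "not satisfactory")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def predict(self, text):
        vocab = set()
        for counter in self.word_counts.values():
            vocab.update(counter)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            counter = self.word_counts[label]
            total_words = sum(counter.values())
            # log prior + add-one-smoothed log likelihood of each word
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                score += math.log((counter[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

# Invented toy examples for a single "costs" criterion.
costs_model = CriterionClassifier()
costs_model.train("reports the cost of the treatment and insurance coverage details",
                  "satisfactory")
costs_model.train("no mention anywhere of what patients might pay",
                  "not satisfactory")
print(costs_model.predict("the article lists the cost of the drug"))
```

A real system would of course be trained on the full expert-labeled corpus with far richer features; the sketch only shows the one-classifier-per-criterion structure.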
The model's performance was then compared against layperson evaluations obtained through a separate survey where participants rated the same articles as "satisfactory" or "not satisfactory" based on the same criteria. The survey revealed an "optimism bias," with most of the 254 participants rating articles as satisfactory, markedly different from the model's more critical assessments.
The researchers stress that they are by no means looking to replace expert opinion. Rather, they hope to start a conversation about evaluating news on multiple criteria and to offer an easily accessible, low-cost alternative, via open-source software, for evaluating health news.
The University of New Hampshire inspires innovation and transforms lives in our state, nation and world. More than 16,000 students from 49 states and 82 countries engage with an award-winning faculty in top-ranked programs in business, engineering, law, health and human services, liberal arts and the sciences across more than 200 programs of study. A Carnegie Classification R1 institution, UNH partners with NASA, NOAA, NSF, and NIH, and received over $210 million in competitive external funding in FY23 to further explore and define the frontiers of land, sea and space.
###
JOURNAL
Decision Support Systems
ARTICLE TITLE
Multi-criteria evaluation of health news stories
AI can speed design of health software
Artificial intelligence helped clinicians to accelerate the design of diabetes prevention software, a new study finds.
Publishing online March 6 in the Journal of Medical Internet Research, the study examined the capabilities of a form of artificial intelligence (AI) called generative AI, or GenAI, which predicts likely options for the next word in any sentence based on how billions of people have used words in context on the internet. A side effect of this next-word prediction is that generative AI “chatbots” like ChatGPT can generate replies to questions in realistic language and produce clear summaries of complex texts.
Led by researchers at NYU Langone Health, the current paper explores the application of ChatGPT to the design of a software program that uses text messages to counter diabetes by encouraging patients to eat healthier and get exercise. The team tested whether AI-enabled interchanges between doctors and software engineers could hasten the development of such a personalized automatic messaging system (PAMS).
In the current study, eleven evaluators in fields ranging from medicine to computer science successfully used ChatGPT to produce a version of the diabetes tool over 40 hours, where an original, non-AI-enabled effort had required more than 200 programmer hours.
“We found that ChatGPT improves communications between technical and non-technical team members to hasten the design of computational solutions to medical problems,” says study corresponding author Danissa Rodriguez, PhD, assistant professor in the Department of Population Health at NYU Langone, and member of its Healthcare Innovation Bridging Research, Informatics and Design (HiBRID) Lab. “The chatbot drove rapid progress throughout the software development life cycle, from capturing original ideas, to deciding which features to include, to generating the computer code. If this proves to be effective at scale it could revolutionize healthcare software design.”
AI as Translator
Generative AI tools are sensitive, say the study authors, and asking a question of the tool in two subtly different ways may yield divergent answers. The skill required to frame questions for chatbots in a way that elicits the desired response, called prompt engineering, combines intuition and experimentation. Physicians and nurses, with their understanding of nuanced medical contexts, are well positioned to engineer strategic prompts that improve communications with engineers, without needing to learn to write computer code.
Such design efforts, in which care providers (the would-be users of a new piece of software) advise engineers about what it must include, can be compromised when the two groups speak different technical languages. In the current study, the clinical members of the team were able to type their ideas in plain English, enter them into ChatGPT, and ask the tool to convert their input into the kind of language required to guide coding work by the team’s software engineers. AI could take software design only so far before human software developers were needed for final code generation, but the overall process was greatly accelerated, say the authors.
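The study's actual prompts are not reproduced in this release. As an illustrative sketch of the workflow just described, the hypothetical helper below (the function name and prompt wording are invented for illustration) wraps a clinician's plain-English idea in a prompt asking a chatbot to restate it in engineering terms; the step of actually sending the prompt to a chat API is omitted so the sketch stays self-contained:

```python
def build_translation_prompt(clinical_requirement: str) -> str:
    """Wrap a clinician's plain-English idea in a prompt that asks a
    chatbot to restate it as a structured requirement for engineers."""
    return (
        "You are helping a clinical team specify a text-messaging program "
        "that encourages patients at risk of diabetes to eat healthier and "
        "exercise more.\n"
        "Rewrite the following plain-English idea as a structured software "
        "requirement (a user story, acceptance criteria, and any data fields "
        "needed), using terminology a software engineer would expect:\n\n"
        f"Idea: {clinical_requirement}"
    )

# A clinician's idea, typed in plain English, becomes the prompt body.
prompt = build_translation_prompt(
    "Patients who skip two check-ins in a row should get a gentler, "
    "more encouraging message instead of the standard reminder."
)
print(prompt)
```

In practice the returned string would be submitted to the chatbot, and the structured output handed to the software engineers as a starting specification.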
“Our study found that ChatGPT can democratize the design of healthcare software by enabling doctors and nurses to drive its creation,” says senior study author Devin Mann, MD, director of the HiBRID Lab, and strategic director of Digital Health Innovation within NYU Langone Medical Center Information Technology (MCIT). “GenAI-assisted development promises to deliver computational tools that are usable, reliable, and in line with the highest coding standards.”
Along with Rodriguez and Mann, study authors from the Department of Population Health at NYU Langone were Katharine Lawrence, MD, Beatrix Brandfield-Harvey, Lynn Xu, Sumaiya Tasneem, and Defne Levine. Javier Gonzalez, technical lead in the HiBRID Lab, was also a study author. This work was supported by the National Institute of Diabetes and Digestive and Kidney Diseases grant 1R18DK118545-01A1.
JOURNAL
Journal of Medical Internet Research
METHOD OF RESEARCH
Case study
SUBJECT OF RESEARCH
Not applicable
ARTICLE TITLE
Leveraging Generative AI Tools to Support the Development of Digital Solutions in Health Care Research: Case Study
ARTICLE PUBLICATION DATE
6-Mar-2024