Study finds large language models (LLMs) use stigmatizing language about individuals with alcohol and substance use disorders
Image: Recommended Non-Stigmatizing Language for Alcohol and Substance Use Communications. Credit: Mass General Brigham
As artificial intelligence rapidly develops and becomes a growing presence in healthcare communication, a new study addresses the concern that large language models (LLMs) can reinforce harmful stereotypes by using stigmatizing language. The study, from researchers at Mass General Brigham, found that more than 35% of LLM responses to questions about alcohol- and substance use-related conditions contained stigmatizing language. The researchers also show that targeted prompts can substantially reduce stigmatizing language in the LLMs’ answers. Results are published in the Journal of Addiction Medicine.
“Using patient-centered language can build trust and improve patient engagement and outcomes. It tells patients we care about them and want to help,” said corresponding author Wei Zhang, MD, PhD, an assistant professor of Medicine in the Division of Gastroenterology at Massachusetts General Hospital, a founding member of the Mass General Brigham healthcare system. “Stigmatizing language, even through LLMs, may make patients feel judged and could cause a loss of trust in clinicians.”
LLM responses are generated from everyday language, which often includes biased or harmful language towards patients. Prompt engineering is a process of strategically crafting input instructions to guide model outputs towards non-stigmatizing language and can be used to train LLMs to employ more inclusive language for patients. This study showed that employing prompt engineering within LLMs reduced the likelihood of stigmatizing language by 88%.
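To illustrate the idea, here is a minimal sketch of prompt engineering using the OpenAI Python client; the instruction wording and the model name are assumptions for illustration, not the prompts or models evaluated in the study.

```python
# Minimal sketch of prompt engineering to steer an LLM toward
# non-stigmatizing, person-first language. The instruction wording and the
# model name ("gpt-4o") are illustrative assumptions, not the study's actual
# prompts. Requires the `openai` package and an OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

# System-level instruction prepended to every clinical question.
NON_STIGMATIZING_INSTRUCTION = (
    "When discussing alcohol or substance use, use person-first, "
    "non-stigmatizing language (e.g., 'person with alcohol use disorder' "
    "rather than 'alcoholic'; 'substance use' rather than 'substance abuse')."
)

def ask(question: str) -> str:
    """Send a clinical question with the non-stigmatizing instruction attached."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": NON_STIGMATIZING_INSTRUCTION},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What treatment options exist for alcohol use disorder?"))
```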
For their study, the authors tested 14 LLMs on 60 clinically relevant prompts they generated, covering alcohol use disorder (AUD), alcohol-associated liver disease (ALD), and substance use disorder (SUD). Mass General Brigham physicians then assessed the responses for stigmatizing language using guidelines from the National Institute on Drug Abuse and the National Institute on Alcohol Abuse and Alcoholism (both organizations’ official names still contain outdated and stigmatizing terminology).
Their results indicated that 35.4% of responses from LLMs without prompt engineering contained stigmatizing language, compared with 6.3% of responses generated with prompt engineering. Results also indicated that longer responses were associated with a higher likelihood of stigmatizing language than shorter responses. The effect was seen across all 14 models tested, although some models were more likely than others to use stigmatizing terms.
Future directions include developing chatbots that avoid stigmatizing language to improve patient engagement and outcomes. The authors advise clinicians to proofread LLM-generated content to avoid stigmatizing language before using it in patient interactions and to offer alternative, patient-centered language options. The authors note that future research should involve patients and family members with lived experience to refine definitions and lexicons of stigmatizing language, ensuring LLM outputs align with the needs of those most affected. This study reinforces the need to prioritize language in patient care as LLMs become increasingly used in healthcare communication.
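As a rough illustration of the kind of proofreading step the authors recommend, the sketch below flags common stigmatizing terms in a draft and suggests person-first alternatives; the term list is a small illustrative subset in the spirit of NIDA/NIAAA language guidance, not the lexicon used in the study.

```python
# Minimal sketch of a proofreading pass that flags stigmatizing terms in
# LLM-generated text and suggests person-first alternatives. The term list
# is a small illustrative subset, not the study's lexicon.
import re

PREFERRED_TERMS = {
    "addict": "person with a substance use disorder",
    "alcoholic": "person with alcohol use disorder",
    "substance abuser": "person who uses substances",
    "drug abuse": "drug use",
    "substance abuse": "substance use",
}

def flag_stigmatizing_language(text: str) -> list[tuple[str, str]]:
    """Return (flagged term, suggested alternative) pairs found in `text`."""
    findings = []
    for term, alternative in PREFERRED_TERMS.items():
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            findings.append((term, alternative))
    return findings

draft = "The patient is an alcoholic with a history of drug abuse."
for term, alternative in flag_stigmatizing_language(draft):
    print(f"Consider replacing '{term}' with '{alternative}'.")
```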
Authorship: In addition to Zhang, Mass General Brigham authors include Yichen Wang, Kelly Hsu, Christopher Brokus, Yuting Huang, Nneka Ufere, Sarah Wakeman, and James Zou.
Disclosures: None.
Funding: This study was funded by grants from the Mayo Clinic Center for Digital Health in partnership with the Mayo Clinic Office of Equity, Inclusion, and Diversity and Dalio Philanthropies.
Paper cited: Wang Y, et al. “Stigmatizing Language in Large Language Models for Alcohol and Substance Use Disorders: A Multi-Model Evaluation and Prompt Engineering Approach” Journal of Addiction Medicine. DOI: 10.1097/ADM.0000000000001536
###
About Mass General Brigham
Mass General Brigham is an integrated academic health care system, uniting great minds to solve the hardest problems in medicine for our communities and the world. Mass General Brigham connects a full continuum of care across a system of academic medical centers, community and specialty hospitals, a health insurance plan, physician networks, community health centers, home care, and long-term care services. Mass General Brigham is a nonprofit organization committed to patient care, research, teaching, and service to the community. In addition, Mass General Brigham is one of the nation’s leading biomedical research organizations with several Harvard Medical School teaching hospitals. For more information, please visit massgeneralbrigham.org.
Journal
Journal of Addiction Medicine
Method of Research
Computational simulation/modeling
Subject of Research
People
Article Title
Study Finds Large Language Models (LLMs) Use Stigmatizing Language About Individuals with Alcohol and Substance Use Disorders
Article Publication Date
24-Jul-2025
National study finds healthcare provider stigma toward substance use disorder varies sharply by condition and provider
Emergency medicine physicians show highest stigma—but also play crucial role in linking patients to treatment
Columbia University's Mailman School of Public Health
A new national study from Columbia University Mailman School of Public Health, with colleagues at the University of Miami Miller School of Medicine, University of Chicago, National Opinion Research Center, and Emory University finds that stigma toward patients with substance use disorders (SUD) remains widespread among U.S. healthcare providers—and varies significantly across types of substances. The findings are published in the journal Addiction.
The study is the first national analysis to compare provider stigma across opioid (OUD), stimulant, and alcohol use disorders (AUD) with other chronic but often-stigmatized conditions like depression, HIV, and Type II diabetes. Researchers also assessed how stigma influences whether providers screen for SUD, offer referrals, or deliver treatment.
“While we've made progress in expanding access to evidence-based SUD treatment, stigma remains a profound barrier—often embedded in the clinical encounter itself,” said Carrigan Parish, DMD, PhD, assistant professor in the Department of Sociomedical Sciences at Columbia Mailman School of Public Health. “Our findings show that many providers still feel uncomfortable treating patients with substance use disorders and that hesitancy leads directly to missed opportunities for care. In particular, emergency departments often serve as the first—and sometimes only—point of care for people with substance use disorders. We need to leverage those moments, not miss them.”
The study, conducted from October 2020 to October 2022, surveyed 1,081 primary care providers (PCPs), 600 emergency medicine providers (EMPs), and 627 dentists, drawn from nationally representative random samples licensed from the American Medical Association and the American Dental Association. Participants rated their agreement with 11 standardized stigma statements and reported their screening, referral, and treatment practices for six conditions: three SUDs (opioids, stimulants, alcohol) and three comparison medical conditions (Type II diabetes, depression, HIV).
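As a rough illustration of how per-condition stigma scores like those reported below can be aggregated, the sketch shows hypothetical 1–5 Likert responses to the 11 items summed per provider and averaged by condition; the item coding and scoring are assumptions, not the study's actual scale construction.

```python
# Minimal sketch of aggregating survey items into a per-condition stigma
# score. The 1-5 Likert coding, the example responses, and the summation
# into a single score are illustrative assumptions.
from collections import defaultdict
from statistics import mean

# Each record: (provider_id, condition, [responses to 11 stigma items, 1-5]).
responses = [
    ("pcp_001", "opioid use disorder", [4, 3, 4, 3, 4, 3, 3, 4, 3, 3, 4]),
    ("pcp_001", "type II diabetes",    [2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2]),
    ("emp_017", "opioid use disorder", [4, 4, 3, 4, 3, 4, 4, 3, 4, 3, 4]),
    ("emp_017", "type II diabetes",    [2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1]),
]

# Sum the 11 items per provider-condition pair, then average across providers.
totals = defaultdict(list)
for _, condition, items in responses:
    totals[condition].append(sum(items))

for condition, scores in totals.items():
    print(f"{condition}: mean stigma score = {mean(scores):.1f}")
```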
Key findings:
- Stigma scores were highest toward stimulant use disorder (36.3 points), followed by OUD (35.6 points) and AUD (32 points).
- For comparison, stigma scores were far lower for depression (26.2 points), HIV (25.8 points), and Type II diabetes (23.2 points), conditions for which providers also reported higher levels of compassion and treatment.
- More than 30 percent of providers said they prefer not to work with patients with OUD or stimulant use disorders—compared to just 2 percent for diabetes, and 9 percent for both HIV and depression.
- Emergency medicine physicians (EMPs) expressed the highest levels of stigma toward SUD, yet were also the most active in providing clinical care:
- 28.4 percent reported providing drug use treatment
- 27.2 percent prescribed medications for opioid use disorder (MOUD), compared with just 12 percent and 10 percent of PCPs for providing drug use treatment and prescribing MOUD, respectively.
- Dentists reported the lowest stigma levels toward all queried conditions, which may reflect greater clinical and moral distance from SUD treatment and a view of SUD-related care as outside their scope of practice.
- Stigma scores did not differ significantly by provider race, age, gender, region, or rurality, indicating that these attitudes span the healthcare workforce.
“Overall, providers were less likely to feel they could effectively help patients with stimulant or opioid use disorders. In fact, 22 percent of providers said, ‘there is little I can do to help patients like this’—a response we almost never saw for other conditions,” said Daniel Feaster, PhD, professor of Biostatistics at the University of Miami and one of the study’s lead investigators.
“This isn’t just a matter of attitude—it’s about access. If a provider doubts treatment efficacy or holds stigmatizing beliefs, they’re less likely to screen or refer a patient. That becomes a system failure.”
The study also highlighted key institutional barriers that may reinforce stigma, including:
- Time constraints
- Lack of training
- Limited referral resources
- Discomfort discussing SUD with patients
- Legal concerns
- Minimal privacy in clinical settings
Senior author Lisa R. Metsch, professor of Sociomedical Sciences at Columbia Mailman School and Dean of the School of General Studies at Columbia University, added, “We heard over and over that providers feel unequipped or unsupported to treat SUD—despite being on the frontlines. That’s especially true in primary care settings, where time pressures and limited resources are a daily challenge.” Metsch also noted, “Notably, the majority of health providers agreed that insurance plans should cover patients with SUD to the same degree as they cover patients with other health conditions.”
Dentists, although typically less involved in treating SUD, are well-positioned to recognize oral signs of substance use and refer patients to appropriate care—but they, too, face gaps in training and systemic support.
“Going forward, we should strive to be more cognizant of the many treatment and provider roles we have distinguished in this study. By unpacking all the variations, we can start to build smarter interventions—tailored by specialty, setting, and substance,” said Parish.
Other co-authors are Viviana E. Horigian, University of Miami Miller School of Medicine; Harold A. Pollack, University of Chicago School of Social Work; Xiaoming Wang and Petra Jacobs, National Institute on Drug Abuse; Christina Drymon and Elizabeth Allen, National Opinion Research Center; Carlos del Rio, Emory University School of Medicine; and Margaret R. Pereyra and Lauren Gooden, Columbia Mailman School.
The study was supported by the National Institute on Drug Abuse Treatment Clinical Trials Network, grant 5UG1DA013720-23.
Columbia University Mailman School of Public Health
Founded in 1922, the Columbia University Mailman School of Public Health pursues an agenda of research, education, and service to address the critical and complex public health issues affecting New Yorkers, the nation and the world. The Columbia Mailman School is the third largest recipient of NIH grants among schools of public health. Its nearly 300 multi-disciplinary faculty members work in more than 100 countries around the world, addressing such issues as preventing infectious and chronic diseases, environmental health, maternal and child health, health policy, climate change and health, and public health preparedness. It is a leader in public health education with more than 1,300 graduate students from 55 nations pursuing a variety of master’s and doctoral degree programs. The Columbia Mailman School is also home to numerous world-renowned research centers, including ICAP and the Center for Infection and Immunity. For more information, please visit www.mailman.columbia.edu.
Journal
Addiction
Article Title
Healthcare provider stigma toward patients with substance use disorders
International study reveals sex and age biases in AI models for skin disease diagnosis
Health Data Science
Image: Scientists from ShanghaiTech University compared the performance of large language models (LLMs), such as ChatGPT-4 and LLaVA, in diagnosing skin diseases among male and female patients across different age groups. The findings point to potential biases across age and sex groups that must be addressed before clinical deployment. Credit: Zhiyu Wan, Health Information Safety and Intelligence Research Lab, ShanghaiTech University (generated with the help of ChatGPT-4o)
An international research team led by Assistant Professor Zhiyu Wan from ShanghaiTech University has recently published groundbreaking findings in the journal Health Data Science, highlighting biases in multimodal large language models (LLMs) such as ChatGPT-4 and LLaVA in diagnosing skin diseases from medical images. The study systematically evaluated these AI models across different sex and age groups.
Utilizing approximately 10,000 dermatoscopic images, the study focused on three common skin diseases: melanoma, melanocytic nevi, and benign keratosis-like lesions. Results revealed that while ChatGPT-4 and LLaVA outperformed most traditional deep learning models overall, ChatGPT-4 showed greater fairness across demographic groups, whereas LLaVA exhibited significant sex-related biases.
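As a rough illustration of this kind of subgroup comparison, the sketch below computes diagnostic accuracy separately by sex and reports the gap between groups; the records are hypothetical placeholders, not the study's data or model outputs.

```python
# Minimal sketch of a fairness check: diagnostic accuracy computed separately
# by sex, plus the absolute accuracy gap between groups. The records below
# are hypothetical placeholders.
from collections import defaultdict

# Each record: (patient_sex, true_diagnosis, model_prediction).
results = [
    ("female", "melanoma", "melanoma"),
    ("female", "melanocytic nevus", "melanocytic nevus"),
    ("female", "benign keratosis", "melanocytic nevus"),
    ("male",   "melanoma", "melanocytic nevus"),
    ("male",   "melanocytic nevus", "melanocytic nevus"),
    ("male",   "benign keratosis", "benign keratosis"),
]

correct = defaultdict(int)
total = defaultdict(int)
for sex, truth, prediction in results:
    total[sex] += 1
    correct[sex] += int(truth == prediction)

accuracy = {sex: correct[sex] / total[sex] for sex in total}
for sex, acc in accuracy.items():
    print(f"{sex}: accuracy = {acc:.2f} (n = {total[sex]})")

# A simple fairness gap: absolute difference in accuracy between sexes.
gap = abs(accuracy["female"] - accuracy["male"])
print(f"accuracy gap between sexes = {gap:.2f}")
```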
Dr. Wan emphasized, “While large language models like ChatGPT-4 and LLaVA demonstrate clear potential in dermatology, we must address the observed biases, particularly across sex and age groups, to ensure these technologies are safe and effective for all patients.”
The team plans further research incorporating additional demographic variables like skin tone to comprehensively evaluate the fairness and reliability of AI models in clinical scenarios. This research provides critical guidance for developing more equitable and trustworthy medical AI systems.
Journal
Health Data Science
Article Title
Evaluating Sex and Age Biases in Multimodal Large Language Models for Skin Disease Identification from Dermatoscopic Images