Can AI help boost accessibility? These researchers tested it for themselves
Generative artificial intelligence tools like ChatGPT, an AI-powered language tool, and Midjourney, an AI-powered image generator, can potentially assist people with various disabilities. These tools could summarize content, compose messages or describe images. Yet the degree of this potential is an open question, since, in addition to regularly spouting inaccuracies and failing at basic reasoning, these tools can perpetuate ableist biases.
This year, seven researchers at the University of Washington conducted a three-month autoethnographic study — drawing on their own experiences as people with and without disabilities — to test AI tools’ utility for accessibility. Though researchers found cases in which the tools were helpful, they also found significant problems with AI tools in most use cases, whether they were generating images, writing Slack messages, summarizing writing or trying to improve the accessibility of documents.
The team presented its findings Oct. 22 at the ASSETS 2023 conference in New York.
“When technology changes rapidly, there’s always a risk that disabled people get left behind,” said senior author Jennifer Mankoff, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “I'm a really strong believer in the value of first-person accounts to help us understand things. Because our group had a large number of folks who could experience AI as disabled people and see what worked and what didn't, we thought we had a unique opportunity to tell a story and learn about this.”
The group presented its research in seven vignettes, often amalgamating experiences into single accounts to preserve anonymity. For instance, in the first account, “Mia,” who has intermittent brain fog, deployed ChatPDF.com, which summarizes PDFs, to help with work. While the tool was occasionally accurate, it often gave “completely incorrect answers.” In one case, the tool was both inaccurate and ableist, changing a paper’s argument to sound like researchers should talk to caregivers instead of to chronically ill people. “Mia” was able to catch this, since the researcher knew the paper well, but Mankoff said such subtle errors are some of the “most insidious” problems with using AI, since they can easily go unnoticed.
Yet in the same vignette, “Mia” used chatbots to create and format references for a paper they were working on while experiencing brain fog. The AI models still made mistakes, but the technology proved useful in this case.
Mankoff, who’s spoken publicly about having Lyme disease, contributed to this account. “Using AI for this task still required work, but it lessened the cognitive load. By switching from a ‘generation’ task to a ‘verification’ task, I was able to avoid some of the accessibility issues I was facing,” Mankoff said.
The researchers' other tests yielded similarly mixed results:
- One author, who is autistic, found AI helped to write Slack messages at work without spending too much time worrying over the wording. Peers found the messages "robotic," yet the tool still made the author feel more confident in these interactions.
- Three authors tried using AI tools to increase the accessibility of content such as tables for a research paper or a slideshow for a class. The AI programs were able to state accessibility rules but couldn’t apply them consistently when creating content.
- Image-generating AI tools helped an author with aphantasia (an inability to visualize) interpret imagery from books. Yet when they used the AI tool to create an illustration of “people with a variety of disabilities looking happy but not at a party,” the program could conjure only fraught images of people at a party that included ableist incongruities, such as a disembodied hand resting on a disembodied prosthetic leg.
“I was surprised at just how dramatically the results and outcomes varied, depending on the task,” said lead author Kate Glazko, a UW doctoral student in the Allen School. “In some cases, such as creating a picture of people with disabilities looking happy, even with specific prompting — can you make it this way? — the results didn’t achieve what the authors wanted.”
The researchers note that more work is needed to develop solutions to the problems the study revealed. One particularly complex problem involves developing new ways for people with disabilities to validate the products of AI tools, because in many cases when AI is used for accessibility, either the source document or the AI-generated result is inaccessible. This was the case both with the ableist summary ChatPDF gave "Mia" and when "Jay," who is legally blind, used an AI tool to generate code for a data visualization. He could not verify the result himself, but a colleague said it "didn't make any sense at all." The frequency of AI-caused errors, Mankoff said, "makes research into accessible validation especially important."
Mankoff also plans to research ways to document the kinds of ableism and inaccessibility present in AI-generated content, as well as investigate problems in other areas, such as AI-written code.
“Whenever software engineering practices change, there is a risk that apps and websites become less accessible if good defaults are not in place,” Glazko said. “For example, if AI-generated code were accessible by default, this could help developers to learn about and improve the accessibility of their apps and websites.”
Co-authors on this paper are Momona Yamagami, who completed this research as a UW postdoctoral scholar in the Allen School and is now at Rice University; Aashaka Desai, Kelly Avery Mack and Venkatesh Potluri, all UW doctoral students in the Allen School; and Xuhai Xu, who completed this work as a UW doctoral student in the Information School and is now at the Massachusetts Institute of Technology. This research was funded by Meta, the Center for Research and Education on Accessible Technology and Experiences (CREATE), Google, an NIDILRR ARRT grant and the National Science Foundation.
For more information, contact Glazko at glazko@cs.washington.edu and Mankoff at jmankoff@cs.washington.edu.
ARTICLE TITLE
An Autoethnographic Case Study of Generative Artificial Intelligence's Utility for Accessibility
Oncology researchers raise ethics concerns posed by patient-facing artificial intelligence
BOSTON – Ready or not, patients with cancer are increasingly likely to find themselves interacting with artificial intelligence technologies to schedule appointments, monitor their health, learn about their disease and its treatment, find support, and more. In a new paper in JCO Oncology Practice, bioethics researchers at Dana-Farber Cancer Institute call on medical societies, government leaders, clinicians, and researchers to work together to ensure AI-driven healthcare preserves patient autonomy and respects human dignity.
The authors note that while AI has immense potential for expanding access to cancer care and improving the ability to detect, diagnose, and treat cancer, medical professionals and technology developers need to act now to prevent the technology from depersonalizing patient care and eroding relationships between patients and caregivers. While previous papers on AI in medicine have focused on its implications for oncology clinicians and AI researchers, the new paper is one of the first to address concerns about AI embedded in technology used by patients with cancer.
"To date, there has been little formal consideration of the impact of patient interactions with AI programs that haven't been vetted by clinicians or regulatory organizations," says the paper's lead author, Amar Kelkar, MD, a Stem Cell Transplantation Physician at Dana-Farber Cancer Institute. "We wanted to explore the ethical challenges of patient-facing AI in cancer, with a particular concern for its potential implications for human dignity."
As oncology clinicians and researchers have begun to harness AI – to help diagnose cancer and track tumor growth, predict treatment outcomes, or find patterns of occurrence – direct interface between patients and the technology has so far been relatively limited. That is expected to change.
The authors focus on three areas in which patients are likely to engage with AI now or in the future. Telehealth, currently a platform for patient-to-clinician conversations, may use AI to shorten wait times and collect patient data before and after appointments. Remote monitoring of patients' health may be enhanced by AI systems that analyze information reported by patients themselves or collected by wearable devices. Health coaching can employ AI – including natural language models that mimic human interactions – to provide personalized health advice, education, and psychosocial support.
For all its potential in these areas, AI also poses a variety of ethical challenges, many of which have yet to be adequately addressed, the authors write. Telehealth and remote health monitoring, for example, pose inherent risks to confidentiality when patient data are collected by AI. And as autonomous health coaching programs become more human-like, there is a danger that actual humans will have less oversight of them, eliminating the person-to-person contact that has traditionally defined cancer medicine.
The authors cite several principles to guide the development and adoption of AI in patient-facing situations – including human dignity, patient autonomy, equity and justice, regulatory oversight, and collaboration to ensure that AI-driven health care is ethically sound and equitable.
"No matter how sophisticated, AI cannot achieve the empathy, compassion, and cultural comprehension possible with human caregivers," the authors assert. "Overdependence on AI could lead to impersonal care and diminished human touch, potentially eroding patient dignity and therapeutic relationships."
To ensure patient autonomy, patients need to understand the limits of AI-generated recommendations, Kelkar says. "The opacity of some patient-facing AI algorithms can make it impossible to trace the 'thought process' that led to a treatment recommendation. It needs to be clear whether a recommendation came from the patient's physician or from an algorithmic model raking through a vast amount of data."
Justice and equity require that AI models be trained on data reflecting the racial, ethnic, and socioeconomic mix of the population as a whole, as opposed to many current models, which have been trained on historical data that overrepresent majority groups, Kelkar remarks.
"It is important for oncology stakeholders to work together to ensure AI technology promotes patient autonomy and dignity rather than undermining it,” says senior author Gregory Abel, MD, MPH, Director of the Older Adult Hematologic Malignancy Program at Dana-Farber and a member of Dana-Farber’s Population Sciences Division.
The co-authors of the paper are Andrew Hantel, MD; Corey Cutler, MD; Marilyn Hammer, PhD, DC, RN, FAAN; and Erica Koranteng, MBChB, MBE, all of Dana-Farber.
JOURNAL
JCO Oncology Practice
ARTICLE PUBLICATION DATE
3-Nov-2023
Collective intelligence can help reduce medical misdiagnoses
A fully automated solution significantly increases diagnostic accuracy
An estimated 250,000 people die from preventable medical errors in the U.S. each year. Many of these errors originate during the diagnostic process. A powerful way to increase diagnostic accuracy is to combine the diagnoses of multiple diagnosticians into a collective solution. However, there has been a dearth of methods for aggregating independent diagnoses in general medical diagnostics. Researchers from the Max Planck Institute for Human Development, the Institute for Cognitive Sciences and Technologies (ISTC), and the Norwegian University of Science and Technology have therefore introduced a fully automated solution using knowledge engineering methods.
The researchers tested their solution on 1,333 medical cases provided by The Human Diagnosis Project (Human Dx), each of which was independently diagnosed by 10 diagnosticians. The collective solution substantially increased diagnostic accuracy: Single diagnosticians achieved 46% accuracy, whereas pooling the decisions of 10 diagnosticians increased accuracy to 76%. Improvements occurred across medical specialties, chief complaints, and diagnosticians’ tenure levels. “Our results show the life-saving potential of tapping into the collective intelligence,” says first author Ralf Kurvers. He is a senior research scientist at the Center for Adaptive Rationality of the Max Planck Institute for Human Development and his research focuses on social and collective decision making in humans and animals.
Collective intelligence has been proven to boost decision accuracy across many domains, such as geopolitical forecasting, investment, and diagnostics in radiology and dermatology (e.g., Kurvers et al., PNAS, 2016). However, collective intelligence has been mostly applied to relatively simple decision tasks. Applications in more open-ended tasks, such as emergency management or general medical diagnostics, are largely lacking due to the challenge of integrating unstandardized inputs from different people. To overcome this hurdle, the researchers used semantic knowledge graphs, natural language processing, and the SNOMED CT medical ontology, a comprehensive multilingual clinical terminology, for standardization.
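The core idea of pooling independent diagnoses can be illustrated with a deliberately simplified sketch. The function below aggregates ranked candidate lists from several diagnosticians using a Borda-style count; the function name, scoring rule, and toy case data are illustrative assumptions, not the researchers' actual pipeline, which additionally standardizes free-text diagnoses against the SNOMED CT ontology before aggregation.

```python
from collections import Counter

def pool_diagnoses(diagnoses):
    """Aggregate independent ranked diagnosis lists into one collective ranking.

    Each diagnostician submits an ordered list of candidate diagnoses;
    a candidate earns more points the higher it is ranked (Borda-style).
    This is a simplified stand-in for the published knowledge-graph method.
    """
    scores = Counter()
    for ranked_list in diagnoses:
        for rank, dx in enumerate(ranked_list):
            # First-ranked candidate gets len(list) points, last gets 1.
            scores[dx] += len(ranked_list) - rank
    return [dx for dx, _ in scores.most_common()]

# Hypothetical toy case: five diagnosticians, each giving their top three.
case = [
    ["pneumonia", "bronchitis", "covid-19"],
    ["bronchitis", "pneumonia", "asthma"],
    ["pneumonia", "covid-19", "bronchitis"],
    ["pulmonary embolism", "pneumonia", "bronchitis"],
    ["pneumonia", "asthma", "bronchitis"],
]
collective = pool_diagnoses(case)
```

Even this crude vote shows why pooling helps: a diagnosis that several independent clinicians rank highly rises to the top, while idiosyncratic errors are outvoted. The study's contribution lies in making such aggregation work on unstandardized, open-ended inputs.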
“A key contribution of our work is that, while the human-provided diagnoses maintain their primacy, our aggregation and evaluation procedures are fully automated, avoiding possible biases in the generation of the final diagnosis and allowing the process to be more time- and cost-efficient,” adds co-author Vito Trianni from the Institute for Cognitive Sciences and Technologies (ISTC) in Rome.
The researchers are currently collaborating – along with other partners – within the HACID project to bring their application one step closer to the market. The EU-funded project will explore a new approach that brings together human experts and AI-supported knowledge representation and reasoning in order to create new tools for decision making in various domains. The application of the HACID technology to medical diagnostics showcases one of the many opportunities to benefit from a digitally based health system and accessible data.
Original publications:
Kurvers, R. H. J. M., Nuzzolese, A. G., Russo, A., Barabucci, G., Herzog, S. M., & Trianni, V. (2023). Automating hybrid collective intelligence in open-ended medical diagnostics. Proceedings of the National Academy of Sciences of the United States of America, 120(34), Article e2221473120. https://doi.org/10.1073/pnas.2221473120
Kurvers, R. H. J. M., Herzog, S. M., Hertwig, R., Krause, J., Carney, P. A., Bogart, A., Argenziano, G., Zalaudek, I., & Wolf, M. (2016). Boosting medical diagnostics by pooling independent judgments. Proceedings of the National Academy of Sciences of the United States of America, 113(31), 8777–8782. https://doi.org/10.1073/pnas.1601827113
JOURNAL
Proceedings of the National Academy of Sciences
METHOD OF RESEARCH
Computational simulation/modeling
SUBJECT OF RESEARCH
People
ARTICLE TITLE
Automating hybrid collective intelligence in open-ended medical diagnostics
COI STATEMENT
We thank the Human Dx team for providing the data and supporting this research. This work was funded by the Max Planck Institute for Human Development, Nesta, Horizon Europe (HACID–101070588), and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy–EXC 2002/1 “Science of Intelligence”–project number 390523135. Gioele Barabucci reports having received personal fees from the Human Diagnosis Project outside the submitted work and is a spokesperson of the Human Diagnosis Project.