The Conversation
October 22, 2024
Artificial intelligence (AI) has taken center stage in basic science. The five winners of the 2024 Nobel Prizes in Chemistry and Physics shared a common thread: AI.
Indeed, many scientists – including the Nobel committees – are celebrating AI as a force for transforming science.
As one of the laureates put it, AI’s potential for accelerating scientific discovery makes it “one of the most transformative technologies in human history”. But what will this transformation really mean for science?
AI promises to help scientists do more, faster, with less money. But it brings a host of new concerns, too – and if scientists rush ahead with AI adoption they risk transforming science into something that escapes public understanding and trust, and fails to meet the needs of society.
The illusions of understanding
Experts have already identified at least three illusions that can ensnare researchers using AI.
The first is the “illusion of explanatory depth”. Just because an AI model excels at predicting a phenomenon — like AlphaFold, which won the Nobel Prize in Chemistry for its predictions of protein structures — that doesn’t mean it can accurately explain it. Research in neuroscience has already shown that AI models designed for optimized prediction can lead to misleading conclusions about the underlying neurobiological mechanisms.
Second is the “illusion of exploratory breadth”. Scientists might think they are investigating all testable hypotheses in their exploratory research, when in fact they are only looking at a limited set of hypotheses that can be tested using AI.
Finally, the “illusion of objectivity”. Scientists may believe AI models are free from bias, or that they can account for all possible human biases. In reality, however, all AI models inevitably reflect the biases present in their training data and the intentions of their developers.
Cheaper and faster science
One of the main reasons for AI’s increasing appeal in science is its potential to produce more results, faster, and at a much lower cost.
An extreme example of this push is the “AI Scientist” machine recently developed by Sakana AI Labs. The company’s vision is to develop a “fully AI-driven system for automated scientific discovery”, where each idea can be turned into a full research paper for just US$15 – though critics said the system produced “endless scientific slop”.
Do we really want a future where research papers can be produced with just a few clicks, simply to “accelerate” the production of science? This risks inundating the scientific ecosystem with papers of little meaning or value, further straining an already overburdened peer-review system.
We might find ourselves in a world where science, as we once knew it, is buried under the noise of AI-generated content.
A lack of context
The rise of AI in science comes at a time when public trust in science and scientists is still fairly high, but we can’t take it for granted. Trust is complex and fragile.
As we learned during the COVID pandemic, calls to “trust the science” can fall short because scientific evidence and computational models are often contested, incomplete, or open to various interpretations.
However, the world faces any number of problems, such as climate change, biodiversity loss, and social inequality, that require public policies crafted with expert judgement. This judgement must also be sensitive to specific situations, gathering input from various disciplines and lived experiences that must be interpreted through the lens of local culture and values.
As an International Science Council report published last year argued, science must recognize nuance and context to rebuild public trust. Letting AI shape the future of science may undermine hard-won progress in this area.
If we allow AI to take the lead in scientific inquiry, we risk creating a monoculture of knowledge that prioritises the kinds of questions, methods, perspectives and experts best suited for AI.
This can move us away from the transdisciplinary approach essential for responsible AI, as well as the nuanced public reasoning and dialogue needed to tackle our social and environmental challenges.
A new social contract for science
As the 21st century began, some argued for a renewed social contract in which scientists focus their talents on the most pressing issues of our time in exchange for public funding. The goal is to help society move toward a more sustainable biosphere – one that is ecologically sound, economically viable and socially just.
The rise of AI presents scientists with an opportunity not just to fulfill their responsibilities but to revitalize the contract itself. However, scientific communities will need to address some important questions about the use of AI first.
For example, is using AI in science a kind of “outsourcing” that could compromise the integrity of publicly funded work? How should this be handled?
What about the growing environmental footprint of AI? And how can researchers remain aligned with society’s expectations while integrating AI into the research pipeline?
The idea of transforming science with AI without first establishing this social contract risks putting the cart before the horse.
Letting AI shape our research priorities without input from diverse voices and disciplines can lead to a mismatch with what society actually needs and result in poorly allocated resources.
Science should benefit society as a whole. Scientists need to engage in real conversations about the future of AI within their community of practice and with research stakeholders. These discussions should address the dimensions of this renewed social contract, reflecting shared goals and values.
It’s time to actively explore the various futures that AI for science enables or blocks – and establish the necessary standards and guidelines to harness its potential responsibly.
Ehsan Nabavi, Senior Lecturer in Technology and Society, Responsible Innovation Lab, Australian National University
This article is republished from The Conversation under a Creative Commons license. Read the original article.
Artificial intelligence is creating a new way of thinking, an external thought process outside of our minds
The 'System 0', which in the future will support and enhance our cognitive abilities, is the ongoing revolution described in the journal Nature Human Behaviour by a multidisciplinary group of scientists coordinated by experts from Università Cattolica.
Universita Cattolica del Sacro Cuore
The interaction between humans and artificial intelligence is shaping a new thinking system, a new cognitive scheme, external to the human mind, but capable of enhancing its cognitive abilities. This is called System 0, which operates alongside the two models of human thought: System 1, characterized by intuitive, fast, and automatic thinking, and System 2, a more analytical and reflective type of thinking. However, System 0 introduces an additional level of complexity, radically altering the cognitive landscape in which we operate, and could thus mark a monumental step forward in the evolution of our ability to think and make decisions. It will be our responsibility to ensure that this progress will be used to improve our cognitive autonomy without compromising it.
This is reported by the prestigious scientific journal Nature Human Behaviour, in an article titled "The case for human-AI interaction as System 0 thinking" – [link](https://www.nature.com/articles/s41562-024-01995-5), written by a team of researchers led by Professor Giuseppe Riva, director of the Humane Technology Lab at Università Cattolica's Milan campus and the Applied Technology for Neuropsychology Lab at Istituto Auxologico Italiano IRCCS, Milan, and by Professor Mario Ubiali from Università Cattolica's Brescia campus. The study was conducted with Massimo Chiriatti from the Infrastructure Solutions Group, Lenovo, in Milan, Professor Marianna Ganapini from the Philosophy Department at Union College, Schenectady, New York, and Professor Enrico Panai from the Faculty of Foreign Languages and Language of Science at Università Cattolica's Milan campus.
A new form of external thinking
Just as an external drive lets us store data that is not kept on the computer itself (we can work anywhere by plugging the drive into a PC), artificial intelligence, with its vast processing and data-handling capabilities, can act as an external circuit to the human brain, one capable of enhancing it. Hence the idea of System 0, which is essentially a form of "external" thinking that relies on the capabilities of AI.
By managing enormous amounts of data, AI can process information and provide suggestions or decisions based on complex algorithms. However, unlike intuitive or analytical thinking, System 0 does not assign intrinsic meaning to the information it processes. In other words, AI can perform calculations, make predictions, and generate responses without truly "understanding" the content of the data it works with.
Humans, therefore, have to interpret the results produced by AI on their own and give them meaning. It's like having an assistant that efficiently gathers, filters, and organizes information but still requires our intervention to make informed decisions. This cognitive support provides valuable input, but the final control must always remain in human hands.
The risks of System 0: loss of autonomy and blind trust
“The risk,” professors Riva and Ubiali emphasize, “is relying too much on System 0 without exercising critical thinking. If we passively accept the solutions offered by AI, we might lose our ability to think autonomously and develop innovative ideas. In an increasingly automated world, it is crucial that humans continue to question and challenge the results generated by AI,” the experts stress.
Furthermore, transparency and trust in AI systems represent another major dilemma. How can we be sure that these systems are free from bias or distortion and that they provide accurate and reliable information? “The growing trend of using synthetic or artificially generated data could compromise our perception of reality and negatively influence our decision-making processes,” the professors warn.
AI could even hijack our introspective abilities, they note—i.e., the act of reflecting on one’s thoughts and feelings—a uniquely human process. However, with AI's advancement, it may become possible to rely on intelligent systems to analyze our behaviors and mental states. This raises the question: to what extent can we truly understand ourselves through AI analysis? And can AI replicate the complexity of subjective experience?
Despite these questions, System 0 also offers enormous opportunities, the professors point out. Thanks to its ability to process complex data quickly and efficiently, AI can support humanity in tackling problems that exceed our natural cognitive capacities. Whether solving complex scientific issues, analyzing massive datasets, or managing intricate social systems, AI could become an indispensable ally.
To leverage the potential of System 0, the study's authors suggest it is urgent to develop ethical and responsible guidelines for its use. “Transparency, accountability, and digital literacy are key elements to enable people to critically interact with AI,” they warn. “Educating the public on how to navigate this new cognitive environment will be crucial to avoid the risks of excessive dependence on these systems.”
The future of human thought
They conclude: If left unchecked, System 0 could interfere with human thinking in the future. “It is essential that we remain aware and critical in how we use it; the true potential of System 0 will depend on our ability to guide it in the right direction.”
Journal
Nature Human Behaviour
Method of Research
News article
Article Title
"The case for human-AI interaction as System 0 thinking"
Article Publication Date
22-Oct-2024
People hate stories they think were written by AI. Even if they were written by people
University of Florida
Stories written by the latest version of ChatGPT were nearly as good as those written by human authors, according to new research on the narrative skills of artificial intelligence.
But when people were told a story was written by AI — whether the true author was an algorithm or a person — they rated the story poorly, a sign that people distrust and dislike AI-generated art.
“People don’t like when they think a story is written by AI, whether it was or not,” said Haoran “Chris” Chu, Ph.D., a professor of public relations at the University of Florida and co-author of the new study. “AI is good at writing something that is consistent, logical and coherent. But it is still weaker at writing engaging stories than people are.”
The quality of AI stories could help people like public health workers create compelling narratives to reach people and encourage healthy behaviors, such as vaccination, said Chu, an expert in public health and science communication. Chu and his co-author, Sixiao Liu, Ph.D., of the University of Central Florida, published their findings Sept. 13 in the Journal of Communication.
The researchers exposed people to two different versions of the same stories. One was written by a person and the other by ChatGPT. Survey participants then rated how engaged they were with the stories.
To test how people’s beliefs about AI influenced their ratings, Chu and Liu changed how the stories were labeled. Sometimes the AI story was correctly labeled as written by a computer. Other times people were told it was written by a human. The human-authored stories also had their labels swapped.
The surveys focused on two key elements of narratives: counterarguing — the experience of picking a story apart — and transportation. These two story components work at odds with one another.
“Transportation is a very familiar experience,” Chu said. “It’s the feeling of being so engrossed in the narrative you don’t feel the sticky seats in the movie theater anymore. Because people are so engaged, they often lower their defenses to the persuasive content in the narrative and reduce their counterarguing.”
While people generally rated AI stories as just as persuasive as their human-authored counterparts, the computer-written stories were not as good at transporting people into the world of the narrative.
“AI does not write like a master writer. That’s probably good news for people like Hollywood screenwriters — for now,” Chu said.
Journal
Journal of Communication
Method of Research
Survey
Subject of Research
People
Article Title
Can AI tell good stories? Narrative transportation and persuasion with ChatGPT
Making it easier to verify an AI model’s responses
By allowing users to clearly see data referenced by a large language model, this tool speeds manual validation to help users spot AI errors.
CAMBRIDGE, MA — Despite their impressive capabilities, large language models are far from perfect. These artificial intelligence models sometimes “hallucinate” by generating incorrect or unsupported information in response to a query.
Due to this hallucination problem, an LLM’s responses are often verified by human fact-checkers, especially if a model is deployed in a high-stakes setting like health care or finance. However, validation processes typically require people to read through long documents cited by the model, a task so onerous and error-prone it may prevent some users from deploying generative AI models in the first place.
To help human validators, MIT researchers created a user-friendly system that enables people to verify an LLM’s responses much more quickly. With this tool, called SymGen, an LLM generates responses with citations that point directly to the place in a source document, such as a given cell in a database.
Users hover over highlighted portions of its text response to see data the model used to generate that specific word or phrase. At the same time, the unhighlighted portions show users which phrases need additional attention to check and verify.
“We give people the ability to selectively focus on parts of the text they need to be more worried about. In the end, SymGen can give people higher confidence in a model’s responses because they can easily take a closer look to ensure that the information is verified,” says Shannon Shen, an electrical engineering and computer science graduate student and co-lead author of a paper on SymGen.
Through a user study, Shen and his collaborators found that SymGen sped up verification time by about 20 percent, compared to manual procedures. By making it faster and easier for humans to validate model outputs, SymGen could help people identify errors in LLMs deployed in a variety of real-world situations, from generating clinical notes to summarizing financial market reports.
Shen is joined on the paper by co-lead author and fellow EECS graduate student Lucas Torroba Hennigen; EECS graduate student Aniruddha “Ani” Nrusimha; Bernhard Gapp, president of the Good Data Initiative; and senior authors David Sontag, a professor of EECS, a member of the MIT Jameel Clinic, and the leader of the Clinical Machine Learning Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Yoon Kim, an assistant professor of EECS and a member of CSAIL. The research was recently presented at the Conference on Language Modeling.
Symbolic references
To aid in validation, many LLMs are designed to generate citations, which point to external documents, along with their language-based responses so users can check them. However, these verification systems are usually designed as an afterthought, without considering the effort it takes for people to sift through numerous citations, Shen says.
“Generative AI is intended to reduce the user’s time to complete a task. If you need to spend hours reading through all these documents to verify the model is saying something reasonable, then it’s less helpful to have the generations in practice,” Shen says.
The researchers approached the validation problem from the perspective of the humans who will do the work.
A SymGen user first provides the LLM with data it can reference in its response, such as a table that contains statistics from a basketball game. Then, rather than immediately asking the model to complete a task, like generating a game summary from those data, the researchers perform an intermediate step. They prompt the model to generate its response in a symbolic form.
With this prompt, every time the model wants to cite words in its response, it must write the specific cell from the data table that contains the information it is referencing. For instance, if the model wants to cite the phrase “Portland Trailblazers” in its response, it would replace that text with the cell name in the data table that contains those words.
“Because we have this intermediate step that has the text in a symbolic format, we are able to have really fine-grained references. We can say, for every single span of text in the output, this is exactly where in the data it corresponds to,” Torroba Hennigen says.
SymGen then resolves each reference using a rule-based tool that copies the corresponding text from the data table into the model’s response.
“This way, we know it is a verbatim copy, so we know there will not be any errors in the part of the text that corresponds to the actual data variable,” Shen adds.
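The placeholder-and-resolve step described above can be sketched in a few lines. This is an illustrative toy, not SymGen's actual code: the table, the `{cell}` placeholder syntax, and the `resolve` helper are hypothetical stand-ins for the rule-based resolution the researchers describe.

```python
import re

# Toy data table the model is allowed to reference (hypothetical example,
# not SymGen's actual data format).
table = {
    "team_name": "Portland Trailblazers",
    "points": "110",
    "rebounds": "45",
}

# A "symbolic" model response: instead of copying values into the text,
# the model emits placeholders naming the table cells it is citing.
symbolic_response = (
    "The {team_name} scored {points} points and grabbed {rebounds} rebounds."
)

def resolve(symbolic: str, data: dict) -> str:
    """Rule-based resolution: replace each {cell} placeholder with a
    verbatim copy of the corresponding table value."""
    def lookup(match: re.Match) -> str:
        # A KeyError here would flag a citation to a nonexistent cell.
        return data[match.group(1)]
    return re.sub(r"\{(\w+)\}", lookup, symbolic)

print(resolve(symbolic_response, table))
```

Because every substituted span is copied verbatim from the table, those spans need no manual fact-checking; the remaining, unresolved text is what a human verifier must still read closely.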
Streamlining validation
The model can create symbolic responses because of how it is trained. Large language models are fed reams of data from the internet, and some data are recorded in “placeholder format” where codes replace actual values.
When SymGen prompts the model to generate a symbolic response, it uses a similar structure.
“We design the prompt in a specific way to draw on the LLM’s capabilities,” Shen adds.
During a user study, the majority of participants said SymGen made it easier to verify LLM-generated text. They could validate the model’s responses about 20 percent faster than if they used standard methods.
However, SymGen is limited by the quality of the source data. The LLM could cite an incorrect variable, and a human verifier may be none the wiser.
In addition, the user must have source data in a structured format, like a table, to feed into SymGen. Right now, the system only works with tabular data.
Moving forward, the researchers are enhancing SymGen so it can handle arbitrary text and other forms of data. With that capability, it could help validate portions of AI-generated legal document summaries, for instance. They also plan to test SymGen with physicians to study how it could identify errors in AI-generated clinical summaries.
###
This work is funded, in part, by Liberty Mutual and the MIT Quest for Intelligence Initiative.
Showing AI users diversity in training data boosts perceived fairness and trust
UNIVERSITY PARK, Pa. — While artificial intelligence (AI) systems, such as home assistants, search engines or large language models like ChatGPT, may seem nearly omniscient, their outputs are only as good as the data on which they are trained. However, ease of use often leads users to adopt AI systems without understanding what training data was used or who prepared the data, including potential biases in the data or held by trainers. A new study by Penn State researchers suggests that making this information available could shape appropriate expectations of AI systems and further help users make more informed decisions about whether and how to use these systems.
The work investigated whether displaying racial diversity cues — the visual signals on AI interfaces that communicate the racial composition of the training data and the backgrounds of the typically crowd-sourced workers who labeled it — can enhance users’ expectations of algorithmic fairness and trust. Their findings were recently published in the journal Human-Computer Interaction.
AI training data is often systematically biased in terms of race, gender and other characteristics, according to S. Shyam Sundar, Evan Pugh University Professor and director of the Center for Socially Responsible Artificial Intelligence at Penn State.
“Users may not realize that they could be perpetuating biased human decision-making by using certain AI systems,” he said.
Lead author Cheng "Chris" Chen, assistant professor of communication design at Elon University, who earned her doctorate in mass communications from Penn State, explained that users are often unable to evaluate biases embedded in the AI systems because they don’t have information about the training data or the trainers.
"This bias presents itself after the user has completed their task, meaning the harm has already been inflicted, so users don’t have enough information to decide if they trust the AI before they use it,” Chen said
Sundar said that one solution would be to communicate the nature of the training data, especially its racial composition.
“This is what we did in this experimental study, with the goal of finding out if it would make any difference to their perceptions of the system,” Sundar said.
To understand how diversity cues can impact trust in AI systems, the researchers created two experimental conditions, one diverse and one non-diverse. In the former, participants viewed a short description of the machine learning model and data labeling practice, along with a bar chart showing an equal distribution of facial images in the training data from three racial groups: white, Black and Asian, each making up about one-third of the dataset. In the condition without racial diversity, the bar chart showed that 92% of the images belonged to a single dominant racial group. Similarly, for labelers’ backgrounds, balanced representation was maintained with roughly one-third each of white, Black and Asian labelers. The non-diverse condition showed a bar chart conveying that 92% of labelers were from a single racial group.
Participants first reviewed data cards that showed training data characteristics of an AI-powered facial expression classification tool called HireMe. They then watched automated interviews of three equally qualified male candidates of different races. The candidates’ neutral facial expressions and tone were analyzed in real time by the AI system and presented to participants, highlighting the most prominent expression and each candidate’s employability.
Half the participants were exposed to racially biased performance by the system, in that it was manipulated by the experimenters to favor the white candidate, rating his neutral expression as joyful and suitable for the job, while interpreting the Black and Asian candidates’ expressions as anger and fear, respectively. In the unbiased condition, the AI identified joy as each candidate’s prominent expression and rated all three as equally good fits for the position. Participants were then asked to provide feedback on the AI’s analysis, rating their agreement on a five-point scale and selecting the most appropriate emotion if they disagreed.
“We found that showing racial diversity in training data and labelers’ backgrounds increased users’ trust in the AI,” Chen said. “The opportunity to provide feedback also helped participants develop a higher sense of agency and increased their potential to use the AI system in the future."
However, the researchers noted that providing feedback about an unbiased system reduced usability for white participants. Because they perceived that the system was already functioning correctly and fairly, they saw little need to provide feedback and viewed it as an unnecessary burden.
The researchers found that, when multiple racial diversity cues were present, they worked independently, and that both data diversity and labeler diversity cues were effective in shaping users’ perception of the system’s fairness. The researchers emphasized the idea of the representativeness heuristic, meaning users tended to believe that the training of the AI model is racially inclusive if its racial composition matches their understanding of diversity.
“If AI is just learning expressions labeled mostly by people of one race, the system may misrepresent emotions of other races,” said Sundar, who is also the James P. Jimirro Professor of Media Effects at the Penn State Bellisario College of Communications and co-director of the Media Effects Research Laboratory. “The system needs to take race into account when deciding if a face is cheerful or angry, for example, and that comes in the form of greater racial diversity of both images and labelers in the training process.”
According to the researchers, for an AI system to be credible, the origin of its training data must be made available, so users can review and scrutinize it to determine their level of trust.
“Making this information accessible promotes transparency and accountability of AI systems,” Sundar said. “Even if users don’t access this information, its availability signals ethical practice, and fosters fairness and trust in these systems.”
Journal
Human-Computer Interaction
Article Title
Communicating and combating algorithmic bias: effects of data diversity, labeler diversity, performance bias, and user feedback on AI trust
Wearable cameras allow AI to detect medication errors
With high proficiency, a deep-learning model identified contents of vials and syringes, confirming whether medication transfers were correct.
A team of researchers says it has developed the first wearable camera system that, with the help of artificial intelligence, detects potential errors in medication delivery.
In a test whose results were published today, the video system recognized and identified, with high proficiency, which medications were being drawn in busy clinical settings. The AI achieved 99.6% sensitivity and 98.8% specificity at detecting vial-swap errors.
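For readers unfamiliar with the two metrics: sensitivity is the fraction of true vial-swap errors the system flags, and specificity is the fraction of correct transfers it leaves unflagged. A quick illustration follows; the counts are made up to match the reported percentages, since the paper's actual confusion matrix is not reproduced here.

```python
def sensitivity(tp: int, fn: int) -> float:
    # True positive rate: flagged errors / all actual errors.
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # True negative rate: unflagged correct transfers / all correct transfers.
    return tn / (tn + fp)

# Hypothetical counts chosen only to reproduce the reported rates.
tp, fn = 249, 1     # 249 of 250 real vial swaps detected
tn, fp = 988, 12    # 988 of 1,000 correct transfers not flagged

print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # 99.6%
print(f"specificity = {specificity(tn, fp):.1%}")  # 98.8%
```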
The findings are reported Oct. 22 in npj Digital Medicine.
The system could become a critical safeguard, especially in operating rooms, intensive-care units and emergency-medicine settings, said co-lead author Dr. Kelly Michaelsen, an assistant professor of anesthesiology and pain medicine at the University of Washington School of Medicine.
“The thought of being able to help patients in real time or to prevent a medication error before it happens is very powerful,” she said. “One can hope for a 100% performance but even humans cannot achieve that. In a survey of more than 100 anesthesia providers, the majority desired the system to be more than 95% accurate, which is a goal we achieved.”
Drug administration errors are the most frequently reported critical incidents in anesthesia, and the most common cause of serious medical errors in intensive care. In the bigger picture, an estimated 5% to 10% of all drugs given are associated with errors. Adverse events associated with injectable medications are estimated to affect 1.2 million patients annually at a cost of $5.1 billion.
Syringe and vial-swap errors most often occur during intravenous injections in which a clinician must transfer the medication from vial to syringe to the patient. About 20% of mistakes are substitution errors in which the wrong vial is selected or a syringe is mislabeled. Another 20% of errors occur when the drug is labeled correctly but administered in error.
Safety measures, such as a barcode system that quickly reads and confirms a vial’s contents, are in place to guard against such accidents. But practitioners might sometimes forget this check during high-stress situations because it is an extra step in their workflow.
The researchers’ aim was to build a deep-learning model that, paired with a GoPro camera, is sophisticated enough to recognize the contents of cylindrical vials and syringes, and to appropriately render a warning before the medication enters the patient.
Training the model took months. The investigators collected 4K video of 418 drug draws by 13 anesthesiology providers in operating rooms where setups and lighting varied. The video captured clinicians managing vials and syringes of select medications. These video snippets were later logged and the contents of the syringes and vials denoted to train the model to recognize the contents and containers.
The video system does not directly read the wording on each vial, but scans for other visual cues: vial and syringe size and shape, vial cap color, label print size.
“It was particularly challenging, because the person in the OR is holding a syringe and a vial, and you don’t see either of those objects completely. Some letters (on the syringe and vial) are covered by the hands. And the hands are moving fast. They are doing the job. They aren’t posing for the camera,” said Shyam Gollakota, a coauthor of the paper and professor at the UW's Paul G. Allen School of Computer Science & Engineering.
Further, the computational model had to be trained to focus on medications in the foreground of the frame and to ignore vials and syringes in the background.
“AI is doing all that: detecting the specific syringe that the healthcare provider is picking up, and not detecting a syringe that is lying on the table,” Gollakota said.
This work shows that AI and deep learning have potential to improve safety and efficiency across a number of healthcare practices. Researchers are just beginning to probe the potential, Michaelsen said.
The study also included researchers from Carnegie Mellon University and Makerere University in Uganda. The Toyota Research Institute built and tested the system.
The Washington Research Foundation, Foundation for Anesthesia Education and Research, and a National Institutes of Health grant (K08GM153069) funded the work.
The authors declared their potential conflicts of interest in their paper, which will be made available on request.
Journal
npj Digital Medicine
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Detecting clinical medication errors with AI enabled wearable cameras
Article Publication Date
22-Oct-2024
COI Statement
The authors declare the following competing interests: S.G. and J.C. are co-founders of Wavely Diagnostics, Inc. The remaining authors declare no competing interests.