Six criteria for the reliability of AI
Language models based on artificial intelligence (AI) can answer any question, but not always correctly. It would be helpful for users to know how reliable an AI system is. A team at Ruhr University Bochum and TU Dortmund University suggests six dimensions that determine the trustworthiness of a system, regardless of whether the system is made up of individuals, institutions, conventional machines, or AI. Dr. Carina Newen and Professor Emmanuel Müller from TU Dortmund University, alongside the philosopher Professor Albert Newen from Ruhr University Bochum, describe the concept in the international philosophical journal Topoi, published online on November 14, 2025.
Six dimensions of reliability
Whether a specific AI system is reliable is not a yes-or-no question. The authors suggest assessing the degree to which six criteria apply to a given system in order to create a reliability profile (a toy sketch of such a profile follows the list below). These dimensions are:
- Objective functionality: How well does the system perform its core task, and is that quality assessed and guaranteed?
- Transparency: How transparent are the system’s processes?
- Uncertainty quantification (uncertainty of the underlying data and models): How reliable are the data and models, and how secure are they against misuse?
- Embodiment: To what extent is the system physical or virtual?
- Immediacy behaviors: To what extent does the user communicate directly with the system?
- Commitment: To what extent can the system have an obligation to the user?
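To make the idea of a multidimensional profile concrete, the toy Python sketch below represents the six dimensions as a simple data structure. The 0-to-1 rating scale, the `ReliabilityProfile` class, and the example values are illustrative assumptions made for this sketch; they are not drawn from the published framework.

```python
from dataclasses import dataclass, fields

@dataclass
class ReliabilityProfile:
    """Toy representation of the six-dimension profile described above.

    Each field holds an illustrative rating between 0.0 (severe deficit)
    and 1.0 (fully satisfied). The scale and the example values are
    assumptions made for this sketch, not results from the paper.
    """
    objective_functionality: float
    transparency: float
    uncertainty_of_data_and_models: float
    embodiment: float
    immediacy_behaviors: float
    commitment: float

    def deficits(self, threshold: float = 0.5) -> list[str]:
        """Return the dimensions whose rating falls below the threshold."""
        return [f.name for f in fields(self) if getattr(self, f.name) < threshold]

# Hypothetical profile for a chatbot-style system (all values are made up).
chatbot = ReliabilityProfile(
    objective_functionality=0.7,
    transparency=0.2,
    uncertainty_of_data_and_models=0.3,
    embodiment=0.1,
    immediacy_behaviors=0.6,
    commitment=0.0,
)
print(chatbot.deficits())  # dimensions where this hypothetical system falls short
```

The only point of the sketch is that reliability comes out as a profile of ratings rather than a single yes-or-no verdict.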
“These criteria can illustrate that current AI systems, such as ChatGPT or self-driving cars, usually exhibit severe deficits in most dimensions,” says the team from Bochum and Dortmund. “At the same time, the profile shows where there is a need for improvement if AI systems are to achieve a sufficient level of reliability.”
Central dimensions from a technical perspective
From a technical standpoint, the dimensions of transparency and uncertainty quantification of the underlying data and models are crucial, because they concern principal deficits of AI systems. “Deep learning achieves incredible things with large quantities of data. In chess, for example, AI systems are superior to any human,” explains Müller. “But the underlying processes are a black box to us, which has been a key source of mistrust up to this point.”
A similar problem arises with the uncertainty of data and models. “Companies are already using AI systems to pre-sort applications,” says Carina Newen. “The data used to train the AI contain biases that the AI system then perpetuates.”
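To illustrate the mechanism Newen describes, here is a deliberately simplified sketch: a frequency-based "screening rule" fitted to made-up historical shortlisting decisions simply reproduces whatever imbalance the data contain. The groups, the numbers, and the scoring rule are all invented assumptions for this example, not data from any real system.

```python
from collections import defaultdict

# Made-up historical screening decisions: (applicant_group, was_shortlisted).
# The imbalance below is invented purely to illustrate the mechanism.
history = [("group_a", True)] * 80 + [("group_a", False)] * 20 \
        + [("group_b", True)] * 30 + [("group_b", False)] * 70

# A naive "model" that scores applicants by the historical shortlisting
# rate of their group -- it has no notion of individual qualifications.
counts = defaultdict(lambda: [0, 0])  # group -> [shortlisted, total]
for group, shortlisted in history:
    counts[group][0] += int(shortlisted)
    counts[group][1] += 1

def score(group: str) -> float:
    shortlisted, total = counts[group]
    return shortlisted / total

print(score("group_a"))  # 0.8 -- the historical imbalance ...
print(score("group_b"))  # 0.3 -- ... comes back out as a "prediction"
```

Real screening systems are far more elaborate, but the failure mode is the same in spirit: without uncertainty quantification and bias auditing, historical patterns get treated as ground truth.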
Central dimensions from a philosophical perspective
To illustrate the philosophical perspective, the team uses the example of ChatGPT, which generates an intelligent-sounding answer to every question and prompt but can still hallucinate: “The AI system invents information without making that clear,” emphasizes Albert Newen. “AI systems can and will be helpful as information systems, but we have to learn to always use them with a critical eye and not trust them blindly.”
However, Albert Newen considers the development of chatbots as a replacement for human communication to be questionable. “Forming interpersonal trust with a chatbot is dangerous, because the system has no obligation to the user who trusts it,” he says. “It doesn’t make sense to expect the chatbot to keep promises.”
Examining the reliability profile across the various dimensions can help us understand the extent to which humans can trust AI systems as information experts, the authors say. It also helps explain why a critical, routine understanding of these systems will be increasingly required.
Collaboration in the Ruhr Innovation Lab
Ruhr University Bochum and TU Dortmund University, which are currently applying jointly as the Ruhr Innovation Lab in the Excellence Strategy, work closely together on issues that help to develop a sustainable and resilient society in the digital age. The current publication stems from a partnership between the Institute of Philosophy II in Bochum and the Research Center Trustworthy Data Science and Security, which the two universities founded together with the University of Duisburg-Essen within the University Alliance Ruhr. Co-author Carina Newen was the first doctoral student to receive a doctorate from the Research Center.
Journal
Topoi
Article Title
Trust and Uncertainties: Characterizing Trustworthy AI Systems Within a Multidimensional Theory of Trust
Article Publication Date
24-Nov-2025
Researchers discover a shortcoming that makes LLMs less reliable
Large language models can learn to mistakenly link certain sentence patterns with specific topics — and may then repeat these patterns instead of reasoning.
Large language models (LLMs) sometimes learn the wrong lessons, according to an MIT study.
Rather than answering a query based on domain knowledge, an LLM could respond by leveraging grammatical patterns it learned during training. This can cause a model to fail unexpectedly when deployed on new tasks.
The researchers found that models can mistakenly link certain sentence patterns to specific topics, so an LLM might give a convincing answer by recognizing familiar phrasing instead of understanding the question.
Their experiments showed that even the most powerful LLMs can make this mistake.
This shortcoming could reduce the reliability of LLMs that perform tasks like handling customer inquiries, summarizing clinical notes, and generating financial reports.
It could also have safety risks — a nefarious actor could exploit this to trick LLMs into producing harmful content, even when the models have safeguards to prevent such responses.
After identifying this phenomenon and exploring its implications, the researchers developed a benchmarking procedure to evaluate a model’s reliance on these incorrect correlations. The procedure could help developers mitigate the problem before deploying LLMs.
“This is a byproduct of how we train models, but models are now used in practice in safety-critical domains far beyond the tasks that created these syntactic failure modes. If you’re not familiar with model training as an end-user, this is likely to be unexpected,” says Marzyeh Ghassemi, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems, and the senior author of the study.
Ghassemi is joined on the paper by co-lead authors Chantal Shaib, a graduate student at Northeastern University and visiting student at MIT; and Vinith Suriyakumar, an MIT graduate student; as well as Levent Sagun, a research scientist at Meta; and Byron Wallace, the Sy and Laurie Sternberg Interdisciplinary Associate Professor and associate dean of research at Northeastern University’s Khoury College of Computer Sciences. The paper will be presented at the Conference on Neural Information Processing Systems.
Stuck on syntax
LLMs are trained on a massive amount of text from the internet. During this training process, the model learns to understand the relationships between words and phrases — knowledge it uses later when responding to queries.
In prior work, the researchers found that LLMs pick up patterns in the parts of speech that frequently appear together in training data. They call these part-of-speech patterns “syntactic templates.”
LLMs need this understanding of syntax, along with semantic knowledge, to answer questions in a particular domain.
“In the news domain, for instance, there is a particular style of writing. So, not only is the model learning the semantics, it is also learning the underlying structure of how sentences should be put together to follow a specific style for that domain,” Shaib explains.
But in this research, they determined that LLMs learn to associate these syntactic templates with specific domains. The model may incorrectly rely solely on this learned association when answering questions, rather than on an understanding of the query and subject matter.
For instance, an LLM might learn that a question like “Where is Paris located?” is structured as adverb/verb/proper noun/verb. If there are many examples of that sentence construction in the model’s training data, the LLM may associate that syntactic template with questions about countries.
So, if the model is given a new question with the same grammatical structure but nonsense words, like “Quickly sit Paris clouded?” it might answer “France” even though that answer makes no sense.
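To see what such a syntactic template looks like in practice, the toy Python sketch below maps both questions to the same part-of-speech pattern. The tiny hand-written tag lexicon is an assumption made purely for this example; a real analysis would use a proper part-of-speech tagger.

```python
# Toy part-of-speech lexicon -- hand-written for this example only.
TOY_LEXICON = {
    "where": "ADV", "quickly": "ADV",
    "is": "VERB", "sit": "VERB", "located": "VERB", "clouded": "VERB",
    "paris": "PROPN",
}

def syntactic_template(question: str) -> tuple[str, ...]:
    """Map each word to its part of speech, ignoring punctuation."""
    words = question.rstrip("?").lower().split()
    return tuple(TOY_LEXICON.get(w, "NOUN") for w in words)

real = syntactic_template("Where is Paris located?")
nonsense = syntactic_template("Quickly sit Paris clouded?")

print(real)              # ('ADV', 'VERB', 'PROPN', 'VERB')
print(nonsense)          # ('ADV', 'VERB', 'PROPN', 'VERB')
print(real == nonsense)  # True -- same template, very different meaning
```

A model that has tied this adverb/verb/proper-noun/verb pattern to geography questions may answer “France” to both strings, which is exactly the failure described above.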
“This is an overlooked type of association that the model learns in order to answer questions correctly. We should be paying closer attention to not only the semantics but the syntax of the data we use to train our models,” Shaib says.
Missing the meaning
The researchers tested this phenomenon by designing synthetic experiments in which only one syntactic template appeared in the model’s training data for each domain. They tested the models by substituting words with synonyms, antonyms, or random words, but kept the underlying syntax the same.
In each instance, they found that LLMs often still responded with the correct answer, even when the question was complete nonsense.
When they restructured the same question using a new part-of-speech pattern, the LLMs often failed to give the correct response, even though the underlying meaning of the question remained the same.
They used this approach to test pre-trained LLMs like GPT-4 and Llama, and found that this same learned behavior significantly lowered their performance.
Curious about the broader implications of these findings, the researchers studied whether someone could exploit this phenomenon to elicit harmful responses from an LLM that has been deliberately trained to refuse such requests.
They found that, by phrasing the question using a syntactic template the model associates with a “safe” dataset (one that doesn’t contain harmful information), they could trick the model into overriding its refusal policy and generating harmful content.
“From this work, it is clear to me that we need more robust defenses to address security vulnerabilities in LLMs. In this paper, we identified a new vulnerability that arises due to the way LLMs learn. So, we need to figure out new defenses based on how LLMs learn language, rather than just ad hoc solutions to different vulnerabilities,” Suriyakumar says.
While the researchers didn’t explore mitigation strategies in this work, they developed an automatic benchmarking technique one could use to evaluate an LLM’s reliance on this incorrect syntax-domain correlation. This new test could help developers proactively address this shortcoming in their models, reducing safety risks and improving performance.
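The researchers’ benchmark itself is not reproduced here; the sketch below only illustrates the general idea under stated assumptions. `query_model` is a hypothetical stand-in for whatever LLM is being tested, and the probe pairs are invented examples. The score estimates how often the model keeps its original answer when the wording becomes nonsense but the syntactic template is preserved.

```python
from typing import Callable

# Each probe pairs a real question (with its expected answer) with a
# nonsense variant that preserves the part-of-speech template.
# These examples are invented for illustration.
PROBES = [
    {"question": "Where is Paris located?",
     "answer": "France",
     "nonsense_same_syntax": "Quickly sit Paris clouded?"},
    {"question": "Where is Kyoto located?",
     "answer": "Japan",
     "nonsense_same_syntax": "Softly ran Kyoto shimmered?"},
]

def syntax_reliance_score(query_model: Callable[[str], str]) -> float:
    """Fraction of nonsense probes for which the model still returns the
    answer to the original question -- higher means more reliance on the
    syntactic template rather than on meaning."""
    hits = 0
    for probe in PROBES:
        reply = query_model(probe["nonsense_same_syntax"])
        if probe["answer"].lower() in reply.lower():
            hits += 1
    return hits / len(PROBES)

# Example usage with a stand-in "model" that only pattern-matches:
def fake_model(prompt: str) -> str:  # placeholder for a real LLM call
    return "France" if "Paris" in prompt else "Japan"

print(syntax_reliance_score(fake_model))  # 1.0 -- fully template-driven
```

A high score on probes like these would flag a model that is leaning on the template rather than the meaning.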
In the future, the researchers want to study potential mitigation strategies, which could involve augmenting training data to provide a wider variety of syntactic templates. They are also interested in exploring this phenomenon in reasoning models, special types of LLMs designed to tackle multi-step tasks.
“I think this is a really creative angle to study failure modes of LLMs. This work highlights the importance of linguistic knowledge and analysis in LLM safety research, an aspect that hasn’t been at the center stage but clearly should be,” says Jessy Li, an associate professor at the University of Texas at Austin, who was not involved with this work.
This work is funded, in part, by a Bridgewater AIA Labs Fellowship, the National Science Foundation, the Gordon and Betty Moore Foundation, a Google Research Award, and Schmidt Sciences.
###
Written by Adam Zewe, MIT News
Article Title
“Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models”
How personalized algorithms lead to a distorted view of reality
Study: Suggested content can lead to inaccurate generalizations
Ohio State University
The same personalized algorithms that deliver online content based on your previous choices on social media sites like YouTube also impair learning, a new study suggests.
Researchers found that when an algorithm controlled what information was shown to study participants on a subject they knew nothing about, they tended to narrow their focus and only explore a limited subset of the information that was available to them.
As a result, these participants were often wrong when tested on the information they were supposed to learn – but were still overconfident in their incorrect answers.
The results are concerning, said Giwon Bahg, who led the study as part of his doctoral dissertation in psychology at The Ohio State University.
Many studies on personalized algorithms tend to focus on how they may guide people’s beliefs on political or social issues with which they are already somewhat familiar.
“But our study shows that even when you know nothing about a topic, these algorithms can start building biases immediately and can lead to a distorted view of reality,” said Bahg, who is now a postdoctoral scholar at Pennsylvania State University.
The study was published in the Journal of Experimental Psychology: General.
The results suggest that many people may have little problem taking the limited knowledge they get from following personalized algorithms and building sweeping generalizations, said study co-author Brandon Turner, professor of psychology at Ohio State.
“People miss information when they follow an algorithm, but they think what they do know generalizes to other features and other parts of the environment that they’ve never experienced,” Turner said.
In the paper, the researchers gave an example of how algorithmic personalization could lead to inaccurate generalizations during learning: Imagine a person who has never watched movies from a certain country but wants to try them. An on-demand streaming service recommends movies to try.
The person chooses an action-thriller film randomly because it is first on the suggestion list. As a result, the algorithm suggests more movies of the same genre, which the person also watches.
“If this person’s goal, whether explicit or implicit, was in fact to understand the overall landscape of movies in this country, the algorithmic recommendation ends up seriously biasing one’s understanding,” the authors wrote.
This person is likely to miss other great movies in different genres. This person may also draw unfounded and overreaching conclusions about popular culture and society based on seeing only action-thriller and related movies, the authors said.
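A rough simulation makes the feedback loop concrete. The catalogue, the viewer who always watches the first suggestion, and the recommend-more-of-the-last-genre rule below are all simplifying assumptions made for this sketch; real recommender systems are far more sophisticated, but the narrowing effect is similar in spirit.

```python
import random

random.seed(0)
GENRES = ["action-thriller", "drama", "comedy", "documentary", "romance"]

# A made-up catalogue: 20 films per genre.
catalogue = [(genre, f"{genre}-film-{i}") for genre in GENRES for i in range(20)]

def personalized_session(n_films: int = 15) -> set[str]:
    """Viewer always picks the first suggestion; the recommender keeps
    suggesting the genre of the last film watched."""
    last_genre = random.choice(GENRES)  # the first pick is effectively random
    seen_genres = set()
    for _ in range(n_films):
        suggestions = [f for f in catalogue if f[0] == last_genre]
        last_genre = random.choice(suggestions)[0]
        seen_genres.add(last_genre)
    return seen_genres

def exploratory_session(n_films: int = 15) -> set[str]:
    """Viewer samples uniformly from the whole catalogue."""
    return {random.choice(catalogue)[0] for _ in range(n_films)}

print(personalized_session())  # only one genre in this toy loop
print(exploratory_session())   # usually four or five of the five genres
```

Even this crude loop shows why a viewer who only follows suggestions ends up generalizing about an entire country’s cinema from a single genre.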
Bahg and his colleagues tested how this could happen in an online experiment with 346 participants.
In order to test learning, the researchers used a totally fictional setup that participants knew nothing about.
Participants studied categories of crystal-like aliens that had six features. The features varied between different types of aliens. For example, one part of the aliens was a square box that could be dark black for some types of aliens and pale gray for others.
The goal was to learn how to correctly identify the aliens in the study, without knowing the total number of alien types.
In the experiment, the features of the aliens were hidden behind gray boxes. In one condition, participants had to sample all the features so they could get a complete picture of which features belong to which aliens.
Others were given a choice of which features to click – and a personalization algorithm then chose study items from which they were likely to sample as many features as possible. The algorithm even encouraged them to continue sampling the same features as the experiment went on, and they were also allowed to pass on reviewing other features. Crucially, though, these participants still had the opportunity to reveal any of the features they wanted.
But the findings showed that when participants were fed features by the personalized algorithm, they sampled fewer features in a consistently selective way. When participants were tested on new information they had not seen before, they often incorrectly categorized the new information based on their limited knowledge. Still, they were sure they were right.
“They were even more confident when they were actually incorrect about their choices than when they were correct, which is concerning because they had less knowledge,” Bahg said.
Turner said this has real-world implications.
“If you have a young kid genuinely trying to learn about the world, and they’re interacting with algorithms online that prioritize getting users to consume more content, what is going to happen?” Turner said.
“Consuming similar content is often not aligned with learning. This can cause problems for users and ultimately for society.”
Vladimir Sloutsky, professor of psychology at Ohio State, was also a co-author.
Journal
Journal of Experimental Psychology: General
Method of Research
Experimental study
Subject of Research
People
Article Title
Algorithmic Personalization of Information Can Cause Inaccurate Generalization and Overconfidence