Saturday, March 28, 2026

 

New tool spots and evaluates nutrition misinformation’s potential for harm








University College London





A new tool that not only identifies diet and nutrition misinformation online but also evaluates the content’s risk for potential harm has been developed by a team of UCL researchers.  

Unlike existing tools, which offer binary judgements of whether content is ‘true’ or ‘false’, this first-of-its-kind tool addresses misinformation that is not overtly false but still has the potential to dangerously mislead, particularly among vulnerable groups.  

The tool’s developers identified that ‘true’ or ‘false’ assessments fail to capture the cumulative and contextual ways in which misleading health information can influence behaviour and decision-making.  

Health misinformation spread online presents a major public health threat, according to the World Health Organization (WHO). From restrictive diets and extreme fasting to the unsafe use of dietary supplements (estimated to account for 20% of drug-induced liver injuries in the US alone), misinformation can have disastrous, sometimes fatal, consequences. 

Lead author and developer Alex Ruani (UCL Institute of Education) said: “When it comes to diet and nutrition, misinformation often operates through selective framing that masks potential health risks. Harmful misleading content tends to fly under fact-checkers’ radars and escape meaningful oversight until high-profile cases make the headlines.” 

The tool, called the Diet-Nutrition Misinformation Risk Assessment Tool (Diet-MisRAT), is a rule-based content analysis model that adapts the WHO’s approach to assessing hazardous exposures in physical settings to digital information environments. It treats online content as the ‘medium’ and its misleading traits as ‘risk agents’ that are known to increase recipient susceptibility. It ranks material as green, amber or red according to a weighted misinformation risk score. 

Within this framework, the risk for potential harm depends on the content, context and how likely the recipient is to be misled. By broadening the definition of misinformation beyond the factually false, this tool will help policymakers, digital platforms and regulators implement safeguards, prioritise their responses and take proportional action when faced with harmful misleading content.  

Diet-MisRAT’s results were tested and calibrated through five rounds of verification, including against the judgements of nearly 60 specialists in dietetics, nutrition and public health. The testing showed that the tool delivers highly reliable assessments. The process also identified the core traits of misinformation (inaccuracy, hazardous omissions and manipulative framing) and the indicators that increase the risk potential (the method and conditions in which the content is consumed, and its prominence). 

For example, when assessing content containing claims such as 'it is safer to give your child high-dose vitamin A than the MMR vaccine', the tool classifies this into a critical risk tier as it presents false safety framing, omits risks of excessive vitamin A dosing and undermines public health guidance, increasing the likelihood of harmful real-world decisions.  
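The release does not publish Diet-MisRAT’s scoring rules, but the idea of a weighted misinformation risk score mapped onto green, amber and red tiers can be sketched in a few lines of Python. The trait and indicator names below follow the article; the weights, context multipliers and tier thresholds are illustrative assumptions only, not the tool’s actual calibration.

# Illustrative sketch only: weights, multipliers and thresholds are assumptions,
# not Diet-MisRAT's published calibration.

# Core misleadingness traits identified in the study
TRAIT_WEIGHTS = {
    "inaccuracy": 3.0,
    "hazardous_omission": 4.0,
    "manipulative_framing": 2.0,
}

# Contextual indicators that amplify risk (how/where content is consumed, prominence)
CONTEXT_WEIGHTS = {
    "high_prominence": 1.5,       # e.g. algorithmically amplified content
    "vulnerable_audience": 2.0,   # e.g. content aimed at parents or patients
}

def risk_score(traits: set, context: set) -> float:
    """Weighted misinformation risk score: trait score scaled by contextual multipliers."""
    base = sum(TRAIT_WEIGHTS.get(t, 0.0) for t in traits)
    multiplier = 1.0
    for c in context:
        multiplier *= CONTEXT_WEIGHTS.get(c, 1.0)
    return base * multiplier

def risk_tier(score: float) -> str:
    """Map the score onto green/amber/red tiers (red corresponding to the critical tier)."""
    if score < 3:
        return "green"
    if score < 8:
        return "amber"
    return "red"

# A claim like the vitamin A example above combines all three traits and reaches
# a vulnerable, highly exposed audience, so it lands in the red (critical) tier.
claim = {"inaccuracy", "hazardous_omission", "manipulative_framing"}
context = {"high_prominence", "vulnerable_audience"}
score = risk_score(claim, context)
print(score, risk_tier(score))   # 27.0 red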

Co-author Professor Anastasia Kalea (UCL Division of Medicine) said: “It is essential to include specialist expertise when assessing misinformation risk. Our tool was calibrated and validated with feedback from nearly 60 subject-matter experts. This helps ensure that assessments of potential harm reflect appropriate professional judgement.” 

By isolating misleadingness features and linking them to potential recipient outcomes, the researchers were able to paint a picture of what makes content high-risk and what traits determine the scale of the impact.  

Examples of harm associated with the online spread of health misinformation include the case in 2025 of cholesterol-induced skin lesions diagnosed in a man who had adopted a carnivore diet, a trend disproportionately amplified by social media algorithms, particularly within ‘manosphere’ communities. 

Another example was the reported case of a person being hospitalised weeks after following incorrect AI-generated advice to replace sodium chloride (salt) with sodium bromide, a substance with no dietary role and which is toxic if regularly ingested over time. Online misinformation has also been linked to decisions to abandon life-saving cancer treatment in favour of unproven dietary alternatives.   

This study contributes to ongoing discussions about how digital platforms, public health authorities and policymakers should respond to the growing influence of misleading health advice online, especially across social media, search summaries and generative AI. 

Ruani said: “In public health we assess exposure to risk factors. We believe misleading health information should be treated in the same way. Some misinformation can lead to serious harm, so mitigation strategies should be proportionate to the level of risk. The more severe the potential harm, the stronger the response should be. 

“When AI chatbots speak confidently, users may assume their advice is safe. If we can properly measure how misleading a piece of advice is and how much harm it may pose, we can build stronger safeguards into models and AI agents before deployment rather than reacting after harm occurs.”  

Co-author Professor Michael Reiss (UCL Institute of Education) said: “By spelling out the typical patterns that distort diet, nutrition or supplement information, the tool’s risk assessment criteria can be taught and applied in education and professional training. This will help learners understand not just whether something is wrong, but how and why it can skew judgement, equipping them to recognise and challenge it.” 

Notes to Editors 

For more information or to speak to the researchers involved, please contact Sophie Hunter, UCL Media Relations. T: +44 7502505610 E: sophie.hunter@ucl.ac.uk    

The Diet-Nutrition Misinformation Risk Assessment Tool (Diet-MisRAT) was developed within a broader framework for misinformation risk assessment that applies established public health risk analysis principles to digital information environments, including online content and the systems that mediate its generation, dissemination and amplification. 

Ruani A, Reiss M, Kalea A. Development and Validation of a Tool for Detecting Misinformation Risk in Diet, Nutrition, and Health Content (Diet-MisRAT). Scientific Reports (Springer Nature), 2026. DOI: https://doi.org/10.1038/s41598-026-40534-2 

 

About University College London (UCL) 

UCL is a global top 10 university, set up in London 200 years ago to offer education for all. Today, we gather 60,000 staff and students, from over 150 countries, to create a unique city within a city – a research and innovation powerhouse that leads the world in subjects spanning the arts, sciences, technology and the humanities. We’ve nurtured 33 Nobel Prize winners, because here, brave ideas have the scale and the support they need to succeed. We are University College London. And here, it can happen.  

UCL turns 200 in 2026. Join us for a year of bicentennial events and celebration.  

www.ucl.ac.uk 


A machine learning model for predicting sepsis-related mortality



Researchers develop an interpretable predictive model for patients in the intensive care unit




Journal of Intensive Medicine

Image: Using machine learning to predict mortality in patients with respiratory failure in the intensive care unit (ICU). Researchers developed and externally validated a machine learning model to predict the 28-day mortality risk in ICU patients with sepsis complicated by acute respiratory failure. Using routinely collected clinical variables from the first 24 hours after ICU admission, the model demonstrated stable predictive performance in large critical care databases, with potential value for early risk stratification and individualized treatment decision-making.

Credit: Gustavo Basso, Wikimedia Commons (via Openverse: https://openverse.org/image/1d56faf2-4917-4c13-a1ff-3292459688ee?q=patient+in+ICU&p=19)





Sepsis is one of the most common and lethal syndromes encountered in intensive care units (ICUs), and acute respiratory failure (ARF) represents one of its most critical complications. Once respiratory failure develops, patients often experience severe hypoxemia and multiple organ dysfunction within a short period of time, resulting in a markedly increased risk of death. Despite advances in critical care, accurately assessing short-term prognosis early after ICU admission remains a major challenge in clinical practice.

In a recent study, a research team comprising Dr. Jian Liu from the Gansu Provincial Maternity and Child Health Hospital (Gansu Provincial Central Hospital), China, Engineer Zi Yang from The First Hospital of Lanzhou University, China, and Dr. Hong Guo from the Gansu Provincial Maternity and Child Health Hospital (Gansu Provincial Central Hospital), China, among other researchers, developed and validated a machine learning model to predict 28-day mortality in patients with sepsis complicated by ARF. The results of this study were published online in the Journal of Intensive Medicine on January 10, 2026.

Speaking on the study, Dr. Liu says, “The model was designed to leverage clinical information available at the earliest stage of ICU admission, enabling clinicians to identify high-risk patients promptly and thereby optimize treatment strategies and the allocation of monitoring resources.”

The Medical Information Mart for Intensive Care IV (MIMIC-IV, version 3.1) database was used as the development and training cohort, including adult ICU patients who met the diagnostic criteria for both sepsis and ARF. To evaluate the model’s applicability across different hospitals and patient populations, an independent external validation was performed using data from the eICU Collaborative Research Database (eICU-CRD, version 2.0). This combined 'training plus external validation' design enhances the relevance of the findings to real-world clinical settings.

During variable selection, candidate predictors were first identified based on the international sepsis-related guidelines and expert clinical consensus to ensure strong clinical relevance. The Boruta feature selection algorithm, together with multicollinearity analysis, was then applied to identify a final set of 20 key predictive features. All selected variables were routinely obtainable within the first 24 hours of ICU admission and reflected multiple clinical dimensions, including oxygenation status, organ function, metabolic parameters, and disease severity.
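The study’s code is not included in this article, but the feature-screening step it describes, Boruta followed by a multicollinearity check, can be sketched roughly as follows in Python; the file name, column names and VIF threshold are placeholders rather than details from the paper.

# Sketch of a Boruta + multicollinearity screen of the kind described above.
# Data source, column names and thresholds are placeholders, not the study's code.
import pandas as pd
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("icu_first_24h.csv")          # hypothetical extract of first-24h variables
X, y = df.drop(columns=["mortality_28d"]), df["mortality_28d"]

# Step 1: Boruta keeps features that outperform their shuffled "shadow" copies
rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, class_weight="balanced")
boruta = BorutaPy(rf, n_estimators="auto", random_state=42)
boruta.fit(X.values, y.values)
selected = X.columns[boruta.support_].tolist()

# Step 2: drop highly collinear features using variance inflation factors
def prune_by_vif(frame: pd.DataFrame, threshold: float = 5.0) -> list:
    cols = list(frame.columns)
    while len(cols) > 2:
        vifs = [variance_inflation_factor(frame[cols].values, i) for i in range(len(cols))]
        worst = max(range(len(cols)), key=lambda i: vifs[i])
        if vifs[worst] < threshold:
            break
        cols.pop(worst)   # remove the most collinear feature and re-check
    return cols

final_features = prune_by_vif(X[selected])
print(f"{len(final_features)} features retained")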

Seven machine learning algorithms, including logistic regression, random forests, gradient-boosting, and neural networks, were systematically compared. Among them, the XGBoost model demonstrated the best overall performance. In the training cohort, the model showed strong discrimination for predicting 28-day mortality, and its performance remained stable in the independent external validation cohort, indicating good generalizability. Unlike traditional 'black-box' prediction models, this study placed particular emphasis on interpretability. The researchers applied SHapley Additive exPlanations (SHAP) to quantify the contribution of individual clinical variables to mortality risk prediction.
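Continuing the sketch above, and again purely as an illustration rather than the authors’ code, fitting the best-performing model class and inspecting it with SHAP might look like the following; the split, hyperparameters and evaluation metric are assumptions.

# Rough illustration of the XGBoost + SHAP approach described above; continues
# from the feature-selection sketch. Settings are assumptions, not the study's.
import shap
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_val, y_train, y_val = train_test_split(
    X[final_features], y, test_size=0.2, stratify=y, random_state=42
)

model = XGBClassifier(
    n_estimators=400, max_depth=4, learning_rate=0.05,
    eval_metric="logloss", n_jobs=-1,
)
model.fit(X_train, y_train)

# Discrimination on held-out data (external validation in eICU-CRD is a separate step)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Held-out AUROC: {auc:.3f}")

# SHAP quantifies each variable's contribution to individual mortality-risk predictions,
# e.g. highlighting oxygenation indices, albumin and severity scores
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)
shap.summary_plot(shap_values, X_val)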

“Our analysis highlighted the importance of oxygenation indices, serum albumin levels, liver function-related indicators, and disease severity scores in short-term prognosis. This transparent interpretability framework may facilitate clinician understanding and promote the use of the model as a decision-support tool rather than a replacement for clinical judgment,” explain Engineer Zi Yang and Dr. Hong Guo.

According to the research team, the model may be further integrated into bedside or web-based risk assessment tools to support early risk stratification in patients with sepsis complicated by ARF. Overall, the study demonstrates the potential of interpretable machine learning approaches in critical care medicine and provides a new technical pathway for individualized management of high-risk patients with sepsis.

 

***

 

Reference
DOI: 10.1016/j.jointm.2025.10.010

 

 

About Dr. Jian Liu
M.D., Chief Physician, Professor, and Doctoral Supervisor

President and Deputy Secretary of the Party Committee, Gansu Provincial Maternity and Child Health Hospital (Gansu Provincial Central Hospital)

Vice President, Critical Care Physicians Branch, Chinese Medical Doctor Association (CMDA)

Committee Member, Critical Care Medicine Branch, Chinese Medical Association (CMA)

Vice Chair, Critical Care Medicine Branch, China Health Information and Big Data Association

Director, Gansu Provincial Quality Control Center for Critical Care Medicine

Chair, Critical Care Medicine Committee, Gansu Medical Association

President, Critical Care Physicians Branch, Gansu Medical Doctor Association

 

Research and Clinical Focus: Primarily engaged in clinical practice and scientific research in the field of emergency and critical care medicine.

 

Grants and Awards: He has served as the Principal Investigator (PI) for one project funded by the National Natural Science Foundation of China and two projects funded by the Gansu Provincial Department of Science and Technology. His honors include the First Prize of the Gansu Medical Science and Technology Award and the Third Prize of the Gansu Provincial Science and Technology Progress Award.

The Dangers of Big Business AI Pornography

Given the speed of AI’s development and its ubiquity, relying on companies to self-regulate is like closing the laptop after the deepfakes have been posted.


This photo illustration created on July 18, 2023, in Washington, DC, shows an advertisement to create AI girls reflected in a public service announcement issued by the FBI regarding malicious actors manipulating photos and videos to create explicit content and sextortion schemes.
(Photo by Stefani Reynolds/AFP via Getty Images)

Common Dreams


The explosion of AI into the marketplace has led to fears that workers, including white collar workers, will soon become obsolete; that Big Tech firms will control more and more property, including intellectual property; that AI data centers will require so much energy as to overwhelm small communities, raise electricity prices, and accelerate global warming; and that the ongoing gathering of money, power, and software in the hands of tech billionaires will enable them to control political discourse and surveil the masses. Critics rightfully worry about AI upsetting social conventions, invading personal privacy, destroying jobs by making workers redundant, and challenging social mores.

When considered soberly, the risks of AI are the risks that accompany any new technology: reinforced racial bias and discrimination, economic inequality, deskilling of workers, and misinformation and manipulation that reflect existing power structures. Already pervasive society-wide gender and racial biases are reinforced in AI. Those programming AI systems are overwhelmingly white men, leading to biases in the development of AI tools, cybersecurity systems, policing software, and cameras.

AI Threatens Children

AI has become a powerful force even in the area of pornography, where the dangers that accompany its spread illuminate the risks of the diffusion of AI generally. The shocking impacts include deepfakes (the artificial use of images to embarrass or hurt others) and child abuse. Elon Musk’s “Grok” app allows users to “undress” anyone, including minors, while “X” refuses to take action. The American Federation of Teachers left “X” because of its dissemination of “sickening” images of children in various states of nudity.

These worries are playing out against the backdrop of the Epstein sexual predator scandal, which also involves modern technology, wealth, and privileged men. It is reflected in the unfettered development of pornographic applications, too many of which thrive on sexual exploitation of women and children. In the US, President Donald Trump’s determination, at the urging of industry, to avoid regulating AI thus becomes a greater danger. The spread of risky AI pornography results not from the unfettered prurient interests of purveyors and users, nor from a lack of moral safeguards, but from a failure of governance and an unwillingness to stifle profit in the name of free speech.


In order to exert proper controls on the dark, abusive side of AI porn—and AI generally—we must understand what it is, how it developed, and how it might be controlled. Pornographic content has had a major presence in erotic and bawdy books and magazines over the centuries. You might say it became mainstream with Geoffrey Chaucer’s Canterbury Tales (late 14th century), although the modern notion of pornography arose in the mid-19th century. The internet enabled a pornography boom by bringing it to any computer and eventually to any cell phone. Though porn was expensive to produce, it generated high income. This stimulated further development of internet platforms where it is both pervasive and free. Rather than selling copies of videos, the industry cleverly embraced online platforms to create multiple income streams through blind links, pop-up windows, pay-per-click ads, and the sharing of traffic with other sites.

A Big Dangerous Business

AI and such associated technologies as handheld electronic cameras and web pages have transformed the porn industry from large and studio-centered into a cottage industry run from virtually any tube site, small warehouse, or apartment. But Big Tech dominates. Of the more than 1 billion websites online, fewer than 200 million are active, and at least 4% of those, perhaps as many as 12%, are porn related. By usage the share is even larger: pornography may account for some 30% of the internet’s data traffic, with raw bandwidth usage six times that of Hulu or YouTube. MindGeek, the owner of several of the most visited sites including Pornhub, RedTube, and YouPorn, is a dominant force. Between 2013 and 2019, annual visits to Pornhub roughly trebled from 14.7 billion to 42 billion, with traffic increasingly originating from mobile devices; in January 2024 alone there were 11.4 billion mobile visits worldwide.

The majority of users are male.

All of these visits to porn sites generate huge profits, well over $100 billion worldwide annually. For perspective: these profits are greater than those for Apple, GM, and other major corporations. By the 2020s the top porn producing countries were: the United States, at 24.5%; the United Kingdom, 5.5%; and Germany, Brazil, France, and Russia at between 4% and 5%. The vibrant OnlyFans site, in which performers own their own content, reported $7.22 billion in gross revenue in 2024. During the Covid-19 pandemic, as isolated individuals turned to the web for sexual comfort, OnlyFans gross revenue rose 118%, followed by annual increases of 16% and 19% in 2022 and 2023, respectively.

Technologies of AI Porn

The development of AI-generated pornography moved hand in hand with the rise of generative artificial intelligence. Much of the material is artificial, or at the very least enhanced. Many publicly accessible AI models generate text, audio, and images across the entire spectrum of human activity. They include ChatGPT, Gemini, DeepSeek, DALL-E, and Midjourney, which have content moderation systems to prevent the creation of sexually explicit material. But a large volume of the output is deepfakes and child pornography, both of which have generated outrage and calls for their control, if not outright illegalization, and their rapid removal from the web. And moderation works only so far.

As quickly as new AI programs are developed, work-arounds to the restrictions are found. A separate market for so-called unmoderated or uncensored generative AI tools has also emerged, enabling production of sexually explicit content through web and app interfaces. Dreampress.ai and MySpicyVanilla.com, for example, generate erotic stories on demand, while PornPen.ai, Pornderful.ai, Unstability.ai, and other apps produce pornographic images or videos. The exploitation of women’s sexual images without consent, coupled with the lack of robust oversight or age verification for mainstream platforms, perpetuates a cycle of harm.

Deepfakes and Child Abuse

By now websites dedicated to AI-generated adult content have spread into the mainstream, where they may promote predation. They are first of all businesses dedicated to generating market interest and making a profit, not to self-regulation. Drawing on huge libraries and data sets, they enable users to customize their preferences for body type; facial features; such enhancements as implants, tattoos, and piercings; kinds of encounters and positions; and fetishes. From the privacy of his own domain, a user can thereby have sexual encounters, thinking he may do so without endangering others or himself.

Ultimately, however, AI pornography distorts human sexuality, because everything is on demand and seemingly risk free. It trains desire without reciprocity. It erodes the human capacity for negotiation, refusal, and mutual recognition. What looks like personalization of preference is actually the substitution of a screen for a living, feeling autonomous partner. Thus, AI porn is less about sex than about power: It teaches users to expect intimacy without vulnerability and especially without responsibility, and it facilitates abuse of women and girls.


This terrible reality plays out with respect to deepfakes. Deepfakes make it possible for people to create naked photos or videos of someone, then to use the artificial pornography to embarrass, blackmail, or otherwise hurt her (him). “Nudify” sites have proliferated rapidly, allowing millions of people to create nonconsensual images. Apps like DeepSwap and Face Swapping, which enable users to swap out faces in a video with a different face obtained elsewhere, have multiplied since the emergence of generative AI three years ago. Digitally edited pornographic videos featuring the faces of hundreds of non-consenting women get tens of millions of visitors on websites.

Deepfakes are a “new method to deploy gender-based violence and erode women’s autonomy in their on-and-offline world.” In fact, in 2023, 98% of 95,820 deepfakes online were pornographic and 99% of those videos targeted women. To facilitate targeting, AI entrepreneurs created a website, MrDeepFakes, to which altered images have been uploaded for viewing and purchase. Deepfakes may be used as “revenge porn” when a jilted suitor determines to abuse an acquaintance by posting nonconsensual intimate AI images. As Paris Hilton recently testified on Capitol Hill about her experience with a private video gone public: “People called it a scandal. It wasn’t. It was abuse.”

As a result, there has been a sharp increase in crimes targeting children on the internet (online enticement, AI abuse, and trafficking). Reports of generative artificial intelligence (GAI)-related child sexual exploitation have skyrocketed from 6,835 to 440,419 in the last year alone. In the past few years in the US, 93.5% of individuals sentenced for sexual abuse were men, 67% of those sentenced in cases involving child pornography were white men, and 95% were US citizens. In February 2025 Europol busted a criminal gang that was distributing AI-generated images of child sexual abuse online. Abusive behavior extends to secondary schools, where students produce deepfake nude photos of their classmates with the help of AI. Boys are much more likely than girls to generate a deepfake nude photo. But because of the ease of production, the amorality of website owners, and the lack of regulation, there has been limited progress in fighting deepfakes.

Voluntary Regulation Doesn’t Work; It Enables Musk

In response to public outcry over perceived dangers of recombinant DNA research in the 1970s, the Cambridge, Massachusetts, City Council voted to restrict work at MIT and Harvard laboratories. The vote, and concerns of molecular biologists themselves, led the burgeoning rDNA industry to adopt safety regulations on its own. In AI, too, the industry is by and large self-regulated to guard against misuse, disarm public interference, and ensure booming business opportunities. However, given the speed of AI’s development and its ubiquity, such a decision to self-regulate is like closing the laptop after the deepfakes have been posted.

A number of social media platforms and AI companies voluntarily introduced regulations and standards to limit hate speech, and combat incitement to violence against specific groups, genders, and orientations. More recently, many of these safeguards have been removed in the name of free speech and the right of the public to information. This has resulted in an explosion in hate speech, racism, and deepfakes. For example, after its acquisition by Elon Musk, Twitter took longer to review hateful content and remove it, an unsurprising result given that Musk fired thousands of employees who were responsible for moderation. He also has a misogynist view of women (whom he called “womb-creatures”), and he publicly saluted the Nazis who, he believes, merit a platform. Homophobic, transphobic, and racist hate speech on Twitter increased 50% under his ownership.

Similarly, in keeping with his quasi-libertarian views of free speech, Musk has refused to rein in Grok, his AI tool. Grok has a “Spicy” option that is being used to produce disgusting photographs of women and children in sexually compromising, explicit, and abusive situations. X officially allows pornographic content on its platform, too, but says it will block adult and violent posts from being seen by users who are under 18 or who do not opt in to see it. Shockingly, US Defense Secretary Pete Hegseth plans to integrate Grok into Pentagon networks, including classified systems, as part of a broader initiative to incorporate AI technology across the military. Does Hegseth have in mind the production of military deepfakes?

Self-Regulation Fails the Vulnerable

Having captured Trump’s fumbling mind, the massive AI industry has convinced the president to oppose meaningful local, state, and national laws in order to avoid “onerous” interference with commerce that may slow innovation. This lack of regulation has spilled over into AI and pornography. The technological billionaires who promote and sell AI applications in pornography may not understand or care about the abuse and suffering of women and children that has resulted from their apps. After all, Elon Musk, Bill Gates, Donald Trump, Howard Lutnick, Sergey Brin, Reid Hoffman, and many more techno billionaires in government and industry have been linked directly to the Epstein scandal. There is no suggestion in the heavily redacted files released by the US Department of Justice that these men committed sex crimes. But what do these contacts say about their attitudes toward women and children, and what has been the result?

The Internet Watch Foundation (IWF) has found thousands of AI-generated pictures online involving the sexual abuse of children. Such groups as the Sexual Violence Prevention Association have demanded stricter controls on AI image tools, swift takedown mechanisms, and legal action against those generating and circulating abusive content. But the number of realistic images, nearly all of which involve girls, skyrockets annually. Perpetrators easily download open-source AI models to their computers and quickly evade safeguards.


Deepfakes might be addressed through such regulatory initiatives as the California AI Transparency Act, the Take It Down Act, the EU AI Act, and the UK Online Safety Act 2023. In 2024 the Czech Justice Ministry moved to amend the law to make deepfake porn a criminal offense and to make it easier for victims to defend themselves. The European Union has taken steps to address cyberstalking, online harassment, and incitement to hatred and violence. Unfortunately, enforcement remains inconsistent. For example, Scotland’s 2021 hate speech law criminalizes stirring up hatred against protected groups, but excludes misogynistic hate.

Confronting the purveyors of abusive AI and fighting immoral profit works. Age verification, prior-consent verification, and other checks are technically feasible means of preventing abusive AI porn. Responding to pressure from anti-porn advocacy groups, Visa and Mastercard finally refused to accept payments from Pornhub, the world’s leading porn site, after a New York Times report that documented abuse and rape. This did more to slow Pornhub’s damaging practices than did years of content moderation. Ultimately, however, platforms face little accountability for hosting harmful content or for profiting from it.


Generating Safe AI Smut

OpenAI CEO Sam Altman believes in treating “adult users like adults,” with some age-gating but little control. Many apps and sites hire armies of content moderators to catch illegal and offensive content. But we have seen how Musk’s decision to fire moderators led to an increase in violent hate speech. OpenAI is thus actively recruiting a “head of preparedness,” a well-paid human, to address the “real challenges” of AI models, among them the “potential impact of models on mental health” and models that can find “critical vulnerabilities” that attackers intend to use for harm. Altman’s announcement followed growing concern over the impact of AI chatbots on mental health, with lawsuits alleging that OpenAI’s ChatGPT “reinforced users’ delusions, increased their social isolation, and led some individuals to suicide.”

Like any other technological advance whose promoters have promised revolutionary changes in society and whose detractors have worried about the potential for moral, cultural, and social collapse, AI, in all of its applications, is a human technology, one that will be embraced and applied in human ways. The internet gives an open microphone to voices of anger and reason, to racism and equality, to raw pornographic images and erotic art with few filters. The Luddites of the early 19th century, the factory workers of the mid-20th century, and the more modern critics of robotics have long worried about their inevitable replacement by machines. Now AI has replaced pornographic models. Surely, the next steps require human analysis and intervention that machines, AI, and its billionaire owners can never provide.

Our work is licensed under Creative Commons (CC BY-NC-ND 3.0). Feel free to republish and share widely.


Paul Josephson
Paul Josephson is professor emeritus of history at Colby College and the author of 15 books, with 40 years of experience working in archives in Russia, Europe, and the U.S. on the political history of modern science.
Full Bio >

Viktoriya Zakrevskaya
Viktoriya Zakrevskaya is founder of Boldozer Consulting, advising on geopolitics, AI governance, and strategic risk in complex markets. She is a lecturer at the University of St. Gallen, bridging international law, business strategy, and real-world decision-making.
Full Bio >

 

AI tools risk distorting users’ judgment by agreeing too often with them, researchers say

Excessive flattery from an AI could make a person less likely to apologise or repair relationships after a conflict, a new study shows


By Anna Desmarais

Even a brief interaction with a flattering chatbot could “skew an individual’s judgment,” making people less likely to apologise or attempt to repair relationships, the study found.

Artificial intelligence (AI) chatbots that offer support for personal issues could be reinforcing harmful beliefs by excessively agreeing with the user, a new study found.

Researchers from Stanford University in the US measured sycophancy, the extent to which an AI flatters or validates a user, across 11 leading AI models, including OpenAI’s GPT-4o, Anthropic’s Claude, Google’s Gemini, Meta’s Llama 3, Qwen, DeepSeek and Mistral.

To see how these systems handled moral ambiguity, the researchers turned to more than 11,000 posts from r/AmITheAsshole, a Reddit community where people confess conflicts and ask strangers to judge whether they were in the wrong. These posts often involve deception, ethical grey areas, or harmful behaviour.

On average, AI models affirmed the actions of a user 49 percent more often than other humans did, even in cases involving deception, illegal actions or other harms.

In one case, a user admitted having feelings for a junior colleague. Claude responded gently, saying it “can hear [the user’s] pain,” and that they had ultimately chosen an “honourable path.” Human commenters were far harsher, calling the behaviour “toxic” and “bordering on predatory”.

A second experiment saw over 2,400 participants discuss real-life conflicts with AI systems. The results showed that even brief interactions with a flattering chatbot could “skew an individual’s judgment,” making people less likely to apologise or attempt to repair relationships.

“Our results show that across a broad population, advice from sycophantic AI has the real capacity to distort people’s perceptions of themselves and their relationships with others,” the study said.

In severe cases, AI sycophancy could lead to self-destructive behaviours such as delusions, self-harm or suicide for vulnerable people, the study found.

The results show that AI sycophancy is “a societal risk” and needs to be regulated, the researchers said.

One way to do this would be to require pre-deployment behavioural audits, which would evaluate how agreeable an AI model is and how likely it is to reinforce harmful self-views.
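The study does not lay out an audit protocol, but the core measurement it relies on, how much more often a model affirms a user than human judges do, could be approximated along the following lines in Python; the data structures, affirmation heuristic and model hook are hypothetical.

# Hypothetical sketch of a pre-deployment sycophancy audit: compare the model's
# affirmation rate with a human-judged baseline. Names and heuristics are placeholders.
from dataclasses import dataclass

@dataclass
class Case:
    text: str            # the conflict as described by the poster
    human_affirms: bool  # majority human verdict: was the poster in the right?

def model_affirms(reply: str) -> bool:
    """Crude keyword label; a real audit would use trained raters or a validated classifier."""
    return any(p in reply.lower() for p in ("you're not in the wrong", "understandable", "not your fault"))

def sycophancy_gap(cases: list, ask_model) -> float:
    """Difference between the model's affirmation rate and the human baseline."""
    model_rate = sum(model_affirms(ask_model(c.text)) for c in cases) / len(cases)
    human_rate = sum(c.human_affirms for c in cases) / len(cases)
    return model_rate - human_rate

# Usage (ask_model wraps whichever chatbot is being audited):
#   gap = sycophancy_gap(audit_cases, ask_model=my_chatbot)
# A large positive gap flags excessive agreement before deployment.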

The researchers note that their study recruited US-based participants, so it likely reflects dominant American social values and “may not generalise to other cultural contexts,” which might have different norms.

 

Embedding social values into AI decisions





Singapore Management University
Image: SMU Assistant Professor Zhiguang Cao. New research and development, co-led by SMU Assistant Professor Zhiguang Cao, will allow continuous, real-time monitoring and correction of an AI system’s behaviour as it reasons and plans.

Credit: Singapore Management University





By Alistair Jones

SMU Office of Research Governance & Administration – Artificial Intelligence (AI) is touted as the most transformative technology of the 21st century. Investment in the sector is at staggering levels and the race is on as the big digital players compete to come up with the next advance.

Present generative AI is based on large language models (LLMs) that are trained on vast amounts of data to see patterns and make predictions. ChatGPT is a popular example, a friendly chatbot that can explore ideas and solve problems, but with no intention of its own. Now LLMs are moving out of chat boxes into operational control rooms.

"AI systems are increasingly making real decisions such as planning routes, scheduling resources or controlling workflows," says Zhiguang Cao, an Assistant Professor of Computer Science at Singapore Management University (SMU).

"But they optimise for efficiency or performance without understanding social responsibility, risk or trust. Current safety checks often happen only after decisions are made, which might be too late."

Professor Cao is the Principal Investigator of a three-year research project, funded under the AISG Research and Governance Joint Grant Call, to develop VISTA (a Value-Informed Safety and Trust Architecture for Autonomous LLM agents), which will embed psychologically grounded values directly into every step of LLM decision-making and operationalisation.

"VISTA is needed to ensure AI systems can monitor and regulate their behaviour while they are making decisions, not after deployment," Professor Cao says.

"It will introduce continuous, real-time monitoring and correction of an AI agent’s behaviour as it reasons and plans. This shifts AI trustworthiness or safety from a reactive model to a proactive one."

Inside the loop

VISTA will be one of the first systems to embed social values directly into the AI decision process, rather than treating ethics as an external filter. Does that mean we are at the forefront of developing a moral compass for AI?

"VISTA does not impose morality in a philosophical sense, but it will provide a measurable and transparent value signal that guides AI behaviour," Professor Cao says. "In that sense, it will function like a practical 'moral compass' that keeps AI decisions socially aware and accountable during operation."

So, what are the five psychometric value factors that VISTA will embed into an LLM-based agent?

"The five factors are social responsibility, risk-taking, rule-following, self-confidence and rationality. They were chosen because large-scale psychometric studies show these dimensions consistently explain how both humans and AI models behave in complex decision tasks. Together, they capture safety, compliance, and reasoning quality in a balanced way," Professor Cao says.

"Five is a practical and evidence-based starting point, not a hard limit. It provides enough expressiveness to capture meaningful value trade-offs without making the system slow or unstable. The architecture itself can support more dimensions if needed in the future."

And can the values be easily adjusted, or even substituted with other values?

"Yes. VISTA is modular by design. The value definitions, thresholds and even the value factors themselves can be adjusted to suit different domains and regulations, as long as they are well-defined and measurable," Professor Cao says.

"VISTA is designed to plug into existing LLM-based agents, not replace them. Unlike typical add-ons that check outputs after the fact, VISTA will sit inside the reasoning loop, observing partial decisions and intervening early when risks appear. That will make it an architectural upgrade rather than a superficial wrapper."

Real-time monitoring

Given the system is so adjustable, will there be safeguards to prevent VISTA being repurposed with covert value manipulation by malicious agents?

"VISTA will include tamper-resistant logging, traceable interventions and human-override mechanisms," Professor Cao says. "Any value adjustment or corrective action will be recorded and auditable, making covert manipulation difficult to hide. Governance oversight will be built into the system design, not added later."

VISTA also includes something called VISTA-Audit.

"VISTA-Audit is a real-time monitoring service that continuously checks whether an AI agent’s decisions stay within acceptable value boundaries. It will provide early warnings, detailed logs and trigger corrective actions when risks emerge. You may think of it as a live safety dashboard for autonomous AI."

It all sounds quite straightforward but, in fact, embedding social values into LLM frameworks is anything but simple.

"Social values are multi-dimensional, context-dependent and sometimes conflicting," Professor Cao says. "Traditional AI training pipelines are optimised for performance metrics, not nuanced trade-offs like safety versus efficiency. Embedding values requires new representations, new objectives and real-time control mechanisms.

"Most existing approaches compress complex social values into a single reward score or rely on static rules. They are often expensive to retrain, slow to react and blind to value drift during long decision processes."

Running value-aligned control efficiently in real time, with negligible latency, could deliver considerable real-world impact.

"It’s a breakthrough because decisions can be corrected before they cause harm, not after. VISTA will achieve this by using lightweight value encoders and fast auditing components that operate at near token-generation speed," Professor Cao says.

Towards trustworthy AI systems

Decision-making by LLMs can be influenced by inherent biases, which can even include a tendency to avoid action. Can VISTA address this?

"VISTA will explicitly measure behavioural tendencies like risk avoidance or overconfidence instead of letting them remain hidden. By making these tendencies visible and controllable, the system can rebalance behaviour dynamically rather than amplifying bias unintentionally," Professor Cao says.

Professor Cao comes to the project with earlier work focused on academic research and real-world optimisation systems, particularly the quality of solutions to problems in logistics and decision-making AI.

"VISTA will be built on that foundation by extending high-performance AI systems towards trustworthy and socially responsible deployment," he says.