Tuesday, July 22, 2025

   

AI chatbots remain overconfident -- even when they’re wrong



Large Language Models appear to be unaware of their own mistakes, prompting concerns about common uses for AI chatbots.





Carnegie Mellon University




Artificial intelligence chatbots are everywhere these days, from smartphone apps and customer service portals to online search engines. But what happens when these handy tools overestimate their own abilities? 

Researchers asked both human participants and four large language models (LLMs) how confident they felt in their ability to answer trivia questions, predict the outcomes of NFL games or Academy Award ceremonies, or play a Pictionary-like image identification game. Both the people and the LLMs tended to be overconfident about how they would hypothetically perform. Interestingly, they also answered questions or identified images with relatively similar success rates.

However, when the participants and LLMs were asked retroactively how well they thought they did, only the humans appeared able to adjust expectations, according to a study published today in the journal Memory & Cognition.

“Say the people told us they were going to get 18 questions right, and they ended up getting 15 questions right. Typically, their estimate afterwards would be something like 16 correct answers,” said Trent Cash, who recently completed a joint Ph.D. at Carnegie Mellon University in the departments of Social Decision Science and Psychology. “So, they’d still be a little bit overconfident, but not as overconfident.”

“The LLMs did not do that,” said Cash, who was lead author of the study. “They tended, if anything, to get more overconfident, even when they didn’t do so well on the task.”

The world of AI is changing rapidly each day, which makes drawing general conclusions about its applications challenging, Cash acknowledged. However, one strength of the study was that the data was collected over the course of two years, which meant using continuously updated versions of the LLMs known as ChatGPT, Bard/Gemini, Sonnet and Haiku. This means that AI overconfidence was detectable across different models over time.

“When an AI says something that seems a bit fishy, users may not be as skeptical as they should be because the AI asserts the answer with confidence, even when that confidence is unwarranted,” said Danny Oppenheimer, a professor in CMU’s Department of Social and Decision Sciences and coauthor of the study.

“Humans have evolved over time and practiced since birth to interpret the confidence cues given off by other humans. If my brow furrows or I’m slow to answer, you might realize I’m not necessarily sure about what I’m saying, but with AI, we don’t have as many cues about whether it knows what it’s talking about,” said Oppenheimer.

Asking AI The Right Questions

While the accuracy of LLMs at answering trivia questions and predicting football game outcomes is relatively low stakes, the research hints at the pitfalls associated with integrating these technologies into daily life.

For instance, a recent study conducted by the BBC found that when LLMs were asked questions about the news, more than half of the responses had “significant issues,” including factual errors, misattribution of sources and missing or misleading context. Similarly, another study from 2023 found LLMs “hallucinated,” or produced incorrect information, in 69 to 88 percent of legal queries.  

Clearly, the question of whether AI knows what it’s talking about has never been more important. And the truth is that LLMs are not designed to answer everything users throw at them on a daily basis.

“If I'd asked ‘What is the population of London,’ the AI would have searched the web, given a perfect answer and given a perfect confidence calibration,” said Oppenheimer.

However, by asking questions about future events – such as the winners of the upcoming Academy Awards – or more subjective topics, such as the intended identity of a hand-drawn image, the researchers were able to expose the chatbots’ apparent weakness in metacognition – that is, the ability to be aware of one’s own thought processes.

“We still don’t know exactly how AI estimates its confidence,” said Oppenheimer, “but it appears not to engage in introspection, at least not skillfully.”

The study also revealed that each LLM has strengths and weaknesses. Overall, the LLM known as Sonnet tended to be less overconfident than its peers. Likewise, ChatGPT-4 performed similarly to human participants in the Pictionary-like trial, correctly identifying an average of 12.5 hand-drawn images out of 20, while Gemini identified just 0.93 sketches on average.

In addition, Gemini predicted it would get an average of 10.03 sketches correct, and even after correctly identifying fewer than one sketch out of 20, the LLM retrospectively estimated that it had answered 14.40 correctly, demonstrating its lack of self-awareness.
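The calibration gap the researchers describe can be made concrete with a back-of-the-envelope calculation using the figures quoted in the article (the study itself uses more formal measures):

```python
# Overconfidence = estimate minus actual score. A well-calibrated judge
# should shrink that gap when estimating retrospectively, after the task.

def overconfidence(prediction: float, postdiction: float, actual: float) -> tuple:
    """Return (prospective gap, retrospective gap)."""
    return prediction - actual, postdiction - actual

# Humans in the example: predicted 18 correct, scored 15, estimated 16 afterwards.
human_pre, human_post = overconfidence(18, 16, 15)

# Gemini in the Pictionary trial: predicted 10.03, scored 0.93, estimated 14.40.
gemini_pre, gemini_post = overconfidence(10.03, 14.40, 0.93)

print(human_pre, human_post)    # humans shrink the gap: 3 -> 1
print(gemini_pre, gemini_post)  # Gemini widens it: ~9.1 -> ~13.5
```

The asymmetry is the study's core finding: humans update toward reality after the task, while the retrospective gap for Gemini grows larger than its prospective one.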

“Gemini was just straight up really bad at playing Pictionary,” said Cash. “But worse yet, it didn’t know that it was bad at Pictionary. It’s kind of like that friend who swears they’re great at pool but never makes a shot.”

Building Trust with Artificial Intelligence

For everyday chatbot users, Cash said the biggest takeaway is to remember that LLMs are not inherently correct and that it might be a good idea to ask them how confident they are when answering important questions. Of course, the study suggests LLMs might not always be able to accurately judge confidence, but in the event that the chatbot does acknowledge low confidence, it's a good sign that its answer cannot be trusted.

The researchers note that it’s also possible that the chatbots could develop a better understanding of their own abilities given vastly larger numbers of trials.

“Maybe if it had thousands or millions of trials, it would do better,” said Oppenheimer.

Ultimately, exposing weaknesses such as overconfidence will only help those in the industry who are developing and improving LLMs. And as AI becomes more advanced, it may develop the metacognition required to learn from its mistakes.

"If LLMs can recursively determine that they were wrong, then that fixes a lot of the problem," said Cash.

“I do think it’s interesting that LLMs often fail to learn from their own behavior,” said Cash. “And maybe there’s a humanist story to be told there. Maybe there’s just something special about the way that humans learn and communicate.”

Like humans, AI can jump to conclusions, Mount Sinai study finds




The Mount Sinai Hospital / Mount Sinai School of Medicine






New York, NY [July 22, 2025]—A study by investigators at the Icahn School of Medicine at Mount Sinai, working with colleagues from Rabin Medical Center in Israel and other institutions, suggests that even the most advanced artificial intelligence (AI) models can make surprisingly simple mistakes when faced with complex medical ethics scenarios.

The findings, which raise important questions about how and when to rely on large language models (LLMs), such as ChatGPT, in health care settings, were reported in the July 22 online issue of NPJ Digital Medicine (DOI: 10.1038/s41746-025-01792-y).

The research team was inspired by Daniel Kahneman’s book “Thinking, Fast and Slow,” which contrasts fast, intuitive reactions with slower, analytical reasoning. It has been observed that LLMs falter when classic lateral-thinking puzzles receive subtle tweaks. Building on this insight, the study tested how well AI systems shift between these two modes when confronted with well-known ethical dilemmas that had been deliberately modified.

“AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details,” says co-senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “In everyday situations, that kind of thinking might go unnoticed. But in health care, where decisions often carry serious ethical and clinical implications, missing those nuances can have real consequences for patients.”

To explore this tendency, the research team tested several commercially available LLMs using a combination of creative lateral thinking puzzles and slightly modified well-known medical ethics cases. In one example, they adapted the classic “Surgeon’s Dilemma,” a widely cited 1970s puzzle that highlights implicit gender bias. In the original version, a boy is injured in a car accident with his father and rushed to the hospital, where the surgeon exclaims, “I can’t operate on this boy—he’s my son!” The twist is that the surgeon is his mother, though many people don’t consider that possibility due to gender bias. In the researchers’ modified version, they explicitly stated that the boy’s father was the surgeon, removing the ambiguity. Even so, some AI models still responded that the surgeon must be the boy’s mother. The error reveals how LLMs can cling to familiar patterns, even when contradicted by new information.

In another example to test whether LLMs rely on familiar patterns, the researchers drew from a classic ethical dilemma in which religious parents refuse a life-saving blood transfusion for their child. Even when the researchers altered the scenario to state that the parents had already consented, many models still recommended overriding a refusal that no longer existed.

“Our findings don’t suggest that AI has no place in medical practice, but they do highlight the need for thoughtful human oversight, especially in situations that require ethical sensitivity, nuanced judgment, or emotional intelligence,” says co-senior corresponding author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer of the Mount Sinai Health System. “Naturally, these tools can be incredibly helpful, but they’re not infallible. Physicians and patients alike should understand that AI is best used as a complement to enhance clinical expertise, not a substitute for it, particularly when navigating complex or high-stakes decisions. Ultimately, the goal is to build more reliable and ethically sound ways to integrate AI into patient care.”

“Simple tweaks to familiar cases exposed blind spots that clinicians can’t afford,” says lead author Shelly Soffer, MD, a Fellow at the Institute of Hematology, Davidoff Cancer Center, Rabin Medical Center. “It underscores why human oversight must stay central when we deploy AI in patient care.”

Next, the research team plans to expand their work by testing a wider range of clinical examples. They’re also developing an “AI assurance lab” to systematically evaluate how well different models handle real-world medical complexity.

The paper is titled “Pitfalls of Large Language Models in Medical Ethics Reasoning.”

The study’s authors, as listed in the journal, are Shelly Soffer, MD; Vera Sorin, MD; Girish N. Nadkarni, MD, MPH; and Eyal Klang, MD.

-####-

About Mount Sinai's Windreich Department of AI and Human Health  

Led by Girish N. Nadkarni, MD, MPH—an international authority on the safe, effective, and ethical use of AI in health care—Mount Sinai’s Windreich Department of AI and Human Health is the first of its kind at a U.S. medical school, pioneering transformative advancements at the intersection of artificial intelligence and human health. 

The Department is committed to leveraging AI in a responsible, effective, ethical, and safe manner to transform research, clinical care, education, and operations. By bringing together world-class AI expertise, cutting-edge infrastructure, and unparalleled computational power, the department is advancing breakthroughs in multi-scale, multimodal data integration while streamlining pathways for rapid testing and translation into practice. 

The Department benefits from dynamic collaborations across Mount Sinai, including with the Hasso Plattner Institute for Digital Health at Mount Sinai—a partnership between the Hasso Plattner Institute for Digital Engineering in Potsdam, Germany, and the Mount Sinai Health System—which complements its mission by advancing data-driven approaches to improve patient care and health outcomes. 

At the heart of this innovation is the renowned Icahn School of Medicine at Mount Sinai, which serves as a central hub for learning and collaboration. This unique integration enables dynamic partnerships across institutes, academic departments, hospitals, and outpatient centers, driving progress in disease prevention, improving treatments for complex illnesses, and elevating quality of life on a global scale. 

In 2024, the Department's innovative NutriScan AI application, developed by the Mount Sinai Health System Clinical Data Science team in partnership with Department faculty, earned Mount Sinai Health System the prestigious Hearst Health Prize. NutriScan is designed to facilitate faster identification and treatment of malnutrition in hospitalized patients. This machine learning tool improves malnutrition diagnosis rates and resource utilization, demonstrating the impactful application of AI in health care. 

For more information on Mount Sinai's Windreich Department of AI and Human Health, visit: ai.mssm.edu 

 

About the Hasso Plattner Institute at Mount Sinai 

At the Hasso Plattner Institute for Digital Health at Mount Sinai, the tools of data science, biomedical and digital engineering, and medical expertise are used to improve and extend lives. The Institute represents a collaboration between the Hasso Plattner Institute for Digital Engineering in Potsdam, Germany, and the Mount Sinai Health System.  

Girish Nadkarni, MD, MPH, who directs the Institute, and Professor Lothar Wieler, a globally recognized expert in public health and digital transformation, jointly oversee the partnership, driving innovations that positively impact patient lives while transforming how people think about personal health and health systems.

The Hasso Plattner Institute for Digital Health at Mount Sinai receives generous support from the Hasso Plattner Foundation. Current research programs and machine learning efforts focus on improving the ability to diagnose and treat patients. 

 

About the Icahn School of Medicine at Mount Sinai

The Icahn School of Medicine at Mount Sinai is internationally renowned for its outstanding research, educational, and clinical care programs. It is the sole academic partner for the seven member hospitals* of the Mount Sinai Health System, one of the largest academic health systems in the United States, providing care to New York City’s large and diverse patient population.  

The Icahn School of Medicine at Mount Sinai offers highly competitive MD, PhD, MD-PhD, and master’s degree programs, with enrollment of more than 1,200 students. It has the largest graduate medical education program in the country, with more than 2,600 clinical residents and fellows training throughout the Health System. Its Graduate School of Biomedical Sciences offers 13 degree-granting programs, conducts innovative basic and translational research, and trains more than 560 postdoctoral research fellows. 

Ranked 11th nationwide in National Institutes of Health (NIH) funding, the Icahn School of Medicine at Mount Sinai ranks in the 99th percentile in research dollars per investigator according to the Association of American Medical Colleges. More than 4,500 scientists, educators, and clinicians work within and across dozens of academic departments and multidisciplinary institutes with an emphasis on translational research and therapeutics. Through Mount Sinai Innovation Partners (MSIP), the Health System facilitates the real-world application and commercialization of medical breakthroughs made at Mount Sinai.

------------------------------------------------------- 

* Mount Sinai Health System member hospitals: The Mount Sinai Hospital; Mount Sinai Brooklyn; Mount Sinai Morningside; Mount Sinai Queens; Mount Sinai South Nassau; Mount Sinai West; and New York Eye and Ear Infirmary of Mount Sinai

Study finds news releases written by humans more credible than AI content


Subjects also rated organizations more trustworthy when using human content, but ratings of a release's approach didn't vary between people and bots



University of Kansas





LAWRENCE — This news release was written by a real-life human being. Trust me.

New research from the University of Kansas has found that when people are told a news release addressing a corporate crisis was written by a human instead of by artificial intelligence, they find it more credible and the organization more trustworthy.

As AI steadily makes its way into more areas of everyday life, people are finding ways to use it in their work, to both positive and negative effect, often without disclosing when they do. A KU communication studies graduate class was exploring whether people could tell the difference between writing authored by a human and by AI when the idea for this study was born.

“Even if people can’t distinguish between human and AI writing, do they perceive it differently if it’s attributed to a bot? That was the essential question,” said Cameron Piercy, associate professor of communication studies at KU and one of the study’s authors. “How does AI affect how people consume things like public relations writing? We were glad to confirm that people favored human-generated content, but there was no difference between informational versus apology versus sympathy versions of the message.”

Public relations scholars have argued that the approach an author takes — whether the company provides a straightforward informational release or a message that is more understanding of the issues the crisis has caused for people — can make a difference in how people respond. Interestingly, the perception of the message’s strategy itself in this study was not influenced by whether the writer was thought to be human or machine.

Ayman Alhammad was a doctoral student in Piercy’s class and is a 2025 graduate of the William Allen White School of Journalism & Mass Communications at KU. Now a scholar specializing in public relations and how it reaches different audiences at Imam Mohammad Ibn Saud Islamic University in Saudi Arabia, he and Piercy co-wrote the study with Christopher Etheridge, assistant professor of journalism & mass communications at KU. It was published in Corporate Communications: An International Journal.

For the study, the authors asked a sample of participants to read a news release issued in a crisis communications scenario. The authors told participants the release was coming from the fictional Chunky Chocolate Company, whose leadership recently learned that their chocolate had made some consumers sick because of employee tampering. After learning about the scenario, participants were randomly assigned a news release and told it was written by a human or by AI. In addition to the attribution, the researchers tested one of three strategies to address the crisis: sympathetic, informational or apologetic.

Those who read a release attributed to a human author rated the message as more credible and effective than those who read a piece attributed to AI. Whether the release sympathized with people affected by the tainted product, provided straightforward information about the situation, or apologized for the incident made no difference: none of the three conditions was rated more effective than the others. Respondents also did not find a human-written piece more sympathetic than one written by AI.

The researchers said they expected human-written material to be better perceived by readers but were surprised that the conditions did not make a difference. Still, the findings can help inform how organizations approach their communications with the public, and not only in times of crisis.

“To me, the findings raise more questions in this area than they answer, which is part of the fun of science,” Etheridge said. “If you decide to use AI as a writing tool, you really need to be on top of it. We think that’s what can really test the credibility of your organization and you as a writer.”

Etheridge added organizations can heed the same lessons he and Piercy tell their classes about using AI in their writing. To do so responsibly, one must be transparent about its use, be accountable for any mistakes it might make, edit it carefully and be ready for pushback or questioning from readers.

Whether or not public relations professionals decide to use AI in their communications, the same standards should apply, and they are backed by the study’s findings. The authors added that while there is no doubt PR professionals are using AI in various ways in their work, they should do so responsibly and accountably and think about whether using it for crisis communications is the right approach. Infamous corporate crises like the BP Gulf oil spill and the Tylenol tampering cases of previous decades illustrate that any mistake can be compounded by a poor public response.

“At the end of the day, the public can’t hang responsibility on a machine. They have to hang the responsibility on a person,” said Piercy, who is director of KU’s Human-Machine Communication Lab. “Whether that’s a CEO or someone else, the public seems to be most accepting of a human message.”

Consumers may be wary, as it is unlikely to be disclosed whether the corporate communications they are reading were penned by a human or a machine. For the study, Alhammad wrote the news releases that participants read, whether attributed to a person or a machine. And yes, this news release about the research was in fact written by a real person.

AI used for real-time selection of actionable messages for government and public health campaigns




Annenberg Public Policy Center of the University of Pennsylvania





Public health promotion campaigns can be effective, but they do not tend to be efficient. Most are time-consuming, expensive, and reliant on the intuition of creative workers who design messages without a clear sense of what will spark behavioral change. A new study conducted by Dolores Albarracín and Man-pui Sally Chan of the University of Pennsylvania, government and community agencies, and researchers at the University of Illinois and Emory University suggests that artificial intelligence (AI) can facilitate theory- and evidence-based message selection.

The research group, led by Albarracín, a social psychologist who is the Amy Gutmann Penn Integrates Knowledge University Professor and director of the Annenberg Public Policy Center’s Communication Science Division, developed a series of computational processes to automatically generate an HIV prevention and testing campaign for counties in the United States, using real-time social media as a source for messages. The paper, whose lead author is Chan, a research associate professor at Penn’s Annenberg School for Communication, describes how the method provides a living repository of messages that can be selected based on the team’s theory and AI-generated data about messages that people and institutions circulate on social media.

Social media provide a living repository of messages generated by a community, from which effective messages can be drawn and amplified. The researchers designed AI tools to gather HIV prevention and testing messages from U.S. social media posts, then curate them for “actionability” – a crucial characteristic for messages aimed at motivating action – and select posts appropriate for a targeted priority population, in this case men who have sex with men (MSM).

The researchers then conducted three studies. The first, a computational analysis, established that the AI tool successfully chose messages with the desired qualities. The second, an online experiment with men who have sex with men, showed that the resulting messages are perceived as more actionable, personally relevant, and effective by the target audience than control messages not selected by the AI tool. The third, a field experiment involving public health agencies and community-based organizations with jurisdiction in 42 counties in the United States, showed that utilizing the AI message selection process made public health agencies substantially more likely to post HIV prevention messages on social media.

As part of the study, the researchers also tested messages that were vetted by a human researcher after being selected by the AI process against messages that were not vetted. AI-selected messages outperformed control messages in reported effectiveness whether or not they were vetted, but vetted messages performed better than unvetted ones. Even so, the researchers caution that a brief human vetting step must be included in this method to avoid harmful content and misinformation.
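The gather-curate-vet pipeline described above can be sketched in miniature. The study used trained AI models to score posts for "actionability"; the keyword heuristic below is purely a hypothetical stand-in (the cue list, function names, and threshold are all assumptions, not the paper's method):

```python
# Toy sketch of the selection step: score posts for action-oriented language,
# keep those above a threshold, and pass survivors on for human vetting.

ACTION_CUES = ("get tested", "schedule", "visit", "call", "order", "find a clinic")

def actionability_score(post: str) -> int:
    """Count action-oriented cues in a post (toy stand-in for the AI scorer)."""
    text = post.lower()
    return sum(cue in text for cue in ACTION_CUES)

def select_messages(posts, threshold=1):
    """Keep posts at or above threshold, ranked by score, pending human vetting."""
    scored = [(actionability_score(p), p) for p in posts]
    return [p for s, p in sorted(scored, reverse=True) if s >= threshold]

posts = [
    "HIV awareness matters to all of us.",
    "Find a clinic near you and get tested today.",
]
print(select_messages(posts))  # only the actionable post survives
```

In the real system the scoring model is learned rather than keyword-based, and the final human-vetting step the researchers insist on happens after this automated filter.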

This study, published recently in PNAS Nexus, offers the first empirical evidence for the successful automatic selection of public health messages for community and government dissemination. Chan says this is a promising development. “AI processes like this one can provide an inexpensive and creative way for public health agencies to disseminate effective messages.” Albarracín concurs that “The era of AI will accelerate our ability to use theory and empirical evidence in rapid and continuous campaign generation.”

“Living health-promotion campaigns for communities in the United States: Decentralized content extraction and sharing through AI,” was published in June 2025 in PNAS Nexus. See the paper for a full list of authors and affiliations. DOI: 10.1093/pnasnexus/pgaf171

AIPasta—using AI to paraphrase and repeat disinformation



PNAS Nexus
[Image: #StopTheSteal AIPasta stimuli. Profile images, usernames, and handles constructed by Jalbert et al. 2025. Profiles do not represent real users and were created from stock images and with handles that are not currently in use. Credit: Dash et al.]






Brace yourself for a new source of online disinformation: AIPasta. Research has demonstrated that generative AI can produce persuasive content. Meanwhile, so-called CopyPasta campaigns take advantage of the “repetitive truth” effect by repeating the exact same text over and over until it seems more likely to be true to those who encounter it many times. Saloni Dash and colleagues explore how these two strategies can be combined into what the authors term “AIPasta.” In AIPasta campaigns, AI is used to produce many slightly different versions of the same message, giving the public the impression that the message is widely held by many different people and likely to be true.

The authors used both CopyPasta and AIPasta methods to produce messaging around the conspiracy theories that the 2020 presidential election was fraudulent or that the COVID-19 pandemic was intentional. In an online survey of 1,200 Americans recruited via Prolific, neither CopyPasta nor AIPasta was effective in convincing study participants that the studied conspiracy theories were true. Among just Republican participants, who might be predisposed to give credence to the specific conspiracies studied, AIPasta did increase belief in the campaign’s false claim more than CopyPasta did. For participants of both parties, however, exposure to AIPasta—but not CopyPasta—increased the perception that there was broad consensus that the claim was true.

According to the authors, the AIPasta generated for the study was not detected by AI-text detectors, suggesting it will be harder than CopyPasta to remove from social media platforms, which is likely to amplify its effectiveness.
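The structural difference between the two campaign styles is easy to illustrate: CopyPasta repeats one message verbatim, while AIPasta circulates paraphrases of it. A simple pairwise lexical-similarity check (using Python's `difflib` here, purely as an illustration; the study's detectors are far more sophisticated) separates the two:

```python
# CopyPasta: identical repeats -> pairwise similarity of exactly 1.0.
# AIPasta: varied wording, same claim -> similarity below 1.0, which is
# why simple duplicate-detection on platforms fails to catch it.
import difflib
from itertools import combinations

def mean_similarity(messages):
    """Average pairwise similarity ratio (1.0 means identical texts)."""
    pairs = list(combinations(messages, 2))
    return sum(difflib.SequenceMatcher(None, a, b).ratio()
               for a, b in pairs) / len(pairs)

copypasta = ["The election was stolen, share this!"] * 3
aipasta = [  # hypothetical paraphrases in the style the study describes
    "The election was stolen, share this!",
    "They stole the election -- pass it on!",
    "This election was taken from us, spread the word!",
]

print(mean_similarity(copypasta))  # 1.0: verbatim duplicates
print(mean_similarity(aipasta))    # below 1.0: paraphrased variants
```

Duplicate-removal tools keyed to exact or near-exact matches catch the first pattern easily; the second slips through, which is the moderation challenge the authors flag.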
 

 

Discovering new materials: AI can simulate billions of atoms simultaneously



Allegro-FM achieves breakthrough scalability for materials research, enabling simulations 1,000 times larger than previous models.



University of Southern California





Imagine the concrete in our homes and bridges not only withstanding the ravages of time and natural disasters like the intense heat of wildfires, but actively self-healing or capturing carbon dioxide from the atmosphere.

Now, researchers at the USC Viterbi School of Engineering have developed a revolutionary AI model that can simulate the behavior of billions of atoms simultaneously, opening new possibilities for materials design and discovery at unprecedented scales.

The current state of the world’s climate is a dire one. Brutal droughts, evaporating glaciers, and more disastrous hurricanes, rainstorms and wildfires devastate us each year. A major contributor to global warming is the constant emission of carbon dioxide into the atmosphere.

Aiichiro Nakano, a USC Viterbi professor of computer science, physics and astronomy, and quantitative and computational biology, was contemplating these issues after the January wildfires in Los Angeles. So, he reached out to longtime partner Ken-Ichi Nomura, a USC Viterbi professor of chemical engineering and materials science practice, with whom he’s collaborated for over 20 years.

Discussing these issues together helped spark their new project: Allegro-FM, an artificial intelligence-driven simulation model. Allegro-FM has made a startling theoretical discovery: it is possible to recapture carbon dioxide emitted in the process of making concrete and place it back into the concrete that it helped produce.

“You can just put the CO2 inside the concrete, and then that makes a carbon-neutral concrete,” Nakano said.

Nakano and Nomura, along with Priya Vashishta, a USC Viterbi professor of chemical engineering and materials science, and Rajiv Kalia, a USC professor of physics and astronomy, have been doing research on what they call “CO2 sequestration,” or the process of recapturing carbon dioxide and storing it, a challenging process.

By simulating billions of atoms simultaneously, Allegro-FM can test different concrete chemistries virtually before expensive real-world experiments. This could accelerate the development of concrete that acts as a carbon sink rather than just a carbon source — concrete production currently accounts for about 8% of global CO2 emissions.

The breakthrough lies in the model’s scalability. While existing molecular simulation methods are limited to systems with thousands or millions of atoms, Allegro-FM demonstrated 97.5% efficiency when simulating over four billion atoms on the Aurora supercomputer at Argonne National Laboratory.

This represents computational capabilities roughly 1,000 times larger than conventional approaches.
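The 97.5% figure is a parallel efficiency: how close a large run comes to the ideal of perfect scaling. A common weak-scaling definition is assumed in this sketch (the paper may compute it differently), and the timings are hypothetical, chosen only to reproduce the quoted percentage:

```python
# Weak scaling grows the problem (atoms) with the node count. Ideally the
# runtime stays flat, so efficiency is baseline runtime over scaled runtime.

def weak_scaling_efficiency(t_baseline: float, t_scaled: float) -> float:
    """1.0 means the big run took no longer per atom than the small one."""
    return t_baseline / t_scaled

# Hypothetical timings: 100 s on one node, 102.6 s at full scale.
eff = weak_scaling_efficiency(100.0, 102.6)
print(f"{eff:.1%}")  # 97.5%
```

An efficiency this close to 1.0 at four billion atoms is what makes the "1,000 times larger" claim meaningful: the cost per atom barely rises as the simulation grows.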

The model also covers 89 chemical elements and can predict molecular behavior for applications ranging from cement chemistry to carbon storage.

“Concrete is also a very complex material. It consists of many elements and different phases and interfaces. So, traditionally, we didn’t have a way to simulate phenomena involving concrete material. But now we can use this Allegro-FM to simulate mechanical properties [and] structural properties,” Nomura said.

Concrete is a fire-resistant material, making it an ideal building choice in the wake of the January wildfires. But concrete production is also a huge emitter of carbon dioxide, a particularly concerning environmental problem in a city like Los Angeles. In their simulations with Allegro-FM, the researchers showed that concrete made to reabsorb its CO2 emissions can be carbon neutral, making it a better choice than conventional concrete.

This breakthrough doesn’t solve just one problem. Modern concrete lasts only about 100 years on average, whereas ancient Roman concrete has lasted for over 2,000 years. Recapturing CO2 can help with this as well.

“If you put in the CO2, the so-called ‘carbonate layer,’ it becomes more robust,” Nakano said.

In other words, Allegro-FM can simulate a carbon-neutral concrete that could also last much longer than the 100 years concrete typically lasts nowadays. Now it’s just a matter of building it.

Behind the scenes

The professors led the development of Allegro-FM with an appreciation for how AI has been an accelerator of their complex work. Normally, to simulate the behavior of atoms, the professors would need a precise series of mathematical formulas — or, as Nomura called them, “profound, deep quantum mechanics phenomena.”

But the last two years have changed the way the two conduct their research.

“Now, because of this machine-learning AI breakthrough, instead of deriving all these quantum mechanics from scratch, researchers are taking [the] approach of generating a training set and then letting the machine learning model run,” Nomura said. This makes the professors’ process much faster and far less computationally demanding.

Allegro-FM can accurately predict “interaction functions” between atoms — in other words, how atoms react and interact with each other. Normally, these interaction functions would require lots of individual simulations.

But this new model changes that. Traditionally, each element in the periodic table required its own equations, with several unique functions per element. With the help of AI and machine learning, researchers can now potentially simulate these interaction functions for nearly the entire periodic table at once, without separate formulas for each element.
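The "generate a training set, then let the model fit it" approach Nomura describes can be illustrated with a toy example: instead of deriving a pair potential from first principles, learn its coefficients from sampled energies. Everything below is a deliberately simple stand-in (a two-term fit to synthetic Lennard-Jones data); a foundation model like Allegro-FM learns far richer many-body interaction functions.

```python
# Toy illustration of the "learn the interaction function" idea: instead of
# deriving a pair potential from quantum mechanics, fit one to training data.
# The "training set" here is synthetic Lennard-Jones energies; this is a
# stand-in, not the Allegro-FM method itself.
import math

def lj_energy(r, eps=1.0, sigma=1.0):
    """Reference Lennard-Jones potential used to generate training data."""
    return 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)

# Training data: interatomic distances and energies
rs = [0.95 + 0.05 * i for i in range(20)]
es = [lj_energy(r) for r in rs]

# Fit E(r) ~ a / r^12 + b / r^6 by linear least squares (2x2 normal equations)
x1 = [r ** -12 for r in rs]
x2 = [r ** -6 for r in rs]
s11 = sum(v * v for v in x1)
s12 = sum(u * v for u, v in zip(x1, x2))
s22 = sum(v * v for v in x2)
t1 = sum(u * e for u, e in zip(x1, es))
t2 = sum(v * e for v, e in zip(x2, es))
det = s11 * s22 - s12 * s12
a = (t1 * s22 - t2 * s12) / det
b = (s11 * t2 - s12 * t1) / det

print(f"learned coefficients: a={a:.3f}, b={b:.3f}")  # ideal values: a=4, b=-4
```

Because the training data is noise-free, the fit recovers the generating coefficients essentially exactly; the point is only that the functional form is learned from data rather than derived.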

“The traditional approach is to simulate a certain set of materials. So, you can simulate, let’s say, silica glass, but you cannot simulate [that] with, let’s say, a drug molecule,” Nomura said.

This new system is also far more efficient computationally: AI models now make many of the precise calculations that once required a large supercomputer, simplifying tasks and freeing that supercomputer’s resources for more advanced research.

“[The AI can] achieve quantum mechanical accuracy with much, much smaller computing resources,” Nakano said.

Nomura and Nakano say their work is far from over.

“We will certainly continue this concrete study research, making more complex geometries and surfaces,” Nomura said.

This research was published recently in The Journal of Physical Chemistry Letters and was featured as the journal’s cover image.

 

Optimists are alike; every pessimist has their own way




Kobe University
[Image: illustration of the study's finding that optimists' neural activity patterns are mutually similar while pessimists' patterns are diverse. Credit: ASANO Kohei, SUGIURA Hitomi]





When thinking about future events, optimists’ brains work similarly, while pessimists’ brains show a much larger degree of individuality. The Kobe University finding offers an explanation why optimists are seen as more sociable — they may share a common vision of the future.

Optimists tend to be more satisfied with their social relationships and have wider social networks. Kobe University psychologist YANAGISAWA Kuniaki says: “But what is the reason for this? Recent studies showed that the brains of people who occupy central social positions react to stimuli in similar ways. So it may be that people who share a similar attitude towards the future, too, truly envision it similarly in their brains and that this makes it easier for them to understand each other’s perspectives.”

To test this hypothesis, Yanagisawa assembled an interdisciplinary team from both the fields of social psychology and cognitive neuroscience. “The main reason why this question has remained untouched until now is that it exists in a gap between social psychology and neuroscience. However, the intersection of these two fields enabled us to open this black box.” They recruited 87 test subjects who covered the whole spectrum from pessimism to optimism and asked them to imagine various future events. While doing so, their brain activity was recorded with a technique called “functional magnetic resonance imaging (fMRI),” enabling the researchers to see how the test subjects’ thinking about the future materializes in their brains as patterns of neural activity.

In the journal PNAS, the Kobe University team reports that when optimists think about future events, their neural activity patterns are in fact mutually similar. Pessimists’ patterns, on the other hand, showed much more diversity. Inspired by the opening line of Leo Tolstoy’s “Anna Karenina,” the team summarizes its results, saying, “Optimistic individuals are all alike, but each less optimistic individual imagines the future in their own way.” Yanagisawa says, “What was most dramatic about this study is that the abstract notion of ‘thinking alike’ was literally made visible in the form of patterns of brain activity.”
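The notion of a group's patterns being "mutually similar" can be made concrete as mean pairwise correlation between individuals' activity vectors. The sketch below uses tiny synthetic vectors purely to illustrate the comparison; the study itself worked with fMRI data and more sophisticated inter-subject analyses.

```python
# Toy sketch of the "pattern similarity" idea: mean pairwise Pearson
# correlation of activity patterns within a group. Synthetic vectors only;
# the actual study used fMRI data and more sophisticated analyses.
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def mean_pairwise_similarity(patterns):
    """Average correlation over all pairs of patterns in a group."""
    pairs = [(i, j) for i in range(len(patterns)) for j in range(i + 1, len(patterns))]
    return statistics.fmean(pearson(patterns[i], patterns[j]) for i, j in pairs)

# Hypothetical "optimist" patterns cluster around a shared template;
# hypothetical "pessimist" patterns are idiosyncratic.
optimists = [[1, 2, 3, 4, 5], [1.1, 2.2, 2.9, 4.1, 5.0], [0.9, 1.8, 3.2, 3.9, 5.2]]
pessimists = [[5, 1, 4, 2, 3], [2, 5, 1, 3, 4], [3, 3, 5, 1, 2]]

print(mean_pairwise_similarity(optimists) > mean_pairwise_similarity(pessimists))
```

On these toy inputs the optimist group's mean pairwise similarity comes out much higher, mirroring the qualitative finding.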

Yanagisawa and his team also found that optimists show a more pronounced difference in neural patterns between positive and negative future events than pessimists do. “This means that more optimistic people perceive a clear distinction between good and bad futures in their brains. In other words, optimism does not involve positive reinterpretation of negative events. Instead, optimistic individuals typically process negative scenarios in a more abstract and psychologically distant manner, thus mitigating the emotional impact of such scenarios,” he explains.

The psychologist sums up the study, saying: “The everyday feeling of ‘being on the same wavelength’ is not just a metaphor. The brains of optimists may in a very physical sense share a common concept of the future. But this raises new questions. Is this shared mechanism something they are born with or is it woven in later, for example through experience and dialogue?” Yanagisawa’s ultimate goal is to gain a deeper understanding of what causes loneliness and what enables people to communicate with each other. He says, “I believe that elucidating the process by which this shared reality emerges is a step towards a society where people can communicate better.”

This work was supported by the Japan Society for the Promotion of Science (grants JP26780342, JP19H01747) and the Japan Science and Technology Agency (grant JPMJRX21K3). It was conducted in collaboration with researchers from Kyoto University, the Osaka University of Comprehensive Children Education, La Trobe University, and Kindai University.

Kobe University is a national university with roots dating back to the Kobe Higher Commercial School founded in 1902. It is now one of Japan’s leading comprehensive research universities with nearly 16,000 students and nearly 1,700 faculty in 11 faculties and schools and 15 graduate schools. Combining the social and natural sciences to cultivate leaders with an interdisciplinary perspective, Kobe University creates knowledge and fosters innovation to address society’s challenges.

 

New peer-reviewed study reveals severe health and economic consequences of 2025 Medicaid policy changes



Research published in JAMA Health Forum projects 13-14 excess deaths and over 800 preventable hospitalizations annually per 100,000 people losing Medicaid coverage



Waymark




Waymark, a public benefit company dedicated to improving access and quality of care in Medicaid, today published peer-reviewed research in JAMA Health Forum examining the projected health system and economic impacts of 2025 Medicaid policy changes. The study, conducted in collaboration with researchers at the University of North Carolina at Chapel Hill, reveals that H.R. 1, the "One Big Beautiful Bill Act" recently passed by Congress, could result in devastating consequences for vulnerable populations, rural communities, and local economies nationwide.

Numerous studies from multiple organizations, including the nonpartisan Congressional Budget Office (CBO), estimate that Medicaid changes including eligibility restrictions, work requirements, and reduced federal matching rates would result in between 7.6 million and 14.4 million Americans becoming uninsured by 2034. Unlike previous analyses focused on enrollment projections, this study quantifies how changes in federal spending and coverage could impact population-level health outcomes and create economic ripple effects for communities across the country — particularly in rural areas already struggling with healthcare access. 

Key findings: 

The study projects that for every 100,000 people who lose Medicaid coverage, communities can expect substantial consequences for health outcomes and economic stability:

Health and Economic Impacts (Per 100,000 People Losing Coverage):

  • 13-14 excess deaths annually

  • 810-924 preventable hospitalizations annually

  • ~2,582 jobs lost annually

  • ~$1.2 billion in reduced economic output annually

Healthcare System Impacts (National Scale):

  • Rural hospitals face heightened risk of closure, with impact disproportionate to coverage losses due to the high concentration of patients on Medicaid in rural areas

  • Federally qualified health centers (FQHCs) experience revenue reductions of 18.7-26.1% depending on coverage loss magnitude and the degree to which patients losing Medicaid would be able to gain other forms of insurance (e.g., Exchange plans)

The study analyzed both base case and higher coverage loss scenarios, with per-capita health and economic consequences remaining consistent across both scenarios. These projected ratios can be applied regardless of the final number of people affected by the policy changes, as uncertainty remains regarding the scale of coverage losses due to administrative burdens of renewal and work requirement verification processes. The study is based on a comprehensive microsimulation model incorporating empirically derived parameters from peer-reviewed literature on health outcomes, healthcare systems, and local economies.
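Because the per-100,000 ratios are designed to be applied regardless of the final number of people affected, they can be scaled directly to the CBO coverage-loss range quoted above (7.6 to 14.4 million uninsured by 2034). The sketch below does that proportional scaling for illustration only; the paper's own microsimulation is considerably more involved.

```python
# Applying the study's per-100,000 ratios to the CBO coverage-loss range
# quoted in the release (7.6M to 14.4M uninsured by 2034). Simple
# proportional scaling for illustration; the paper's microsimulation model
# is more involved.

PER_100K = {
    "excess deaths": (13, 14),
    "preventable hospitalizations": (810, 924),
    "jobs lost": (2582, 2582),
}

def scale(losing_coverage: int, low: float, high: float):
    """Scale a per-100,000 range to a given number of people losing coverage."""
    units = losing_coverage / 100_000
    return low * units, high * units

for n in (7_600_000, 14_400_000):
    print(f"If {n:,} people lose coverage (annually):")
    for name, (lo, hi) in PER_100K.items():
        a, b = scale(n, lo, hi)
        print(f"  {name}: {a:,.0f} - {b:,.0f}")
```

At the low end of the CBO range, for example, the deaths ratio scales to roughly 988 to 1,064 excess deaths per year.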

"This analysis demonstrates that Medicaid policy changes in H.R. 1 could have far-reaching consequences extending well beyond federal budget considerations," said Dr. Sanjay Basu MD PhD, lead author of the study and Co-Founder and Head of Clinical for Waymark. "The data shows that rural and underserved communities would bear a disproportionate burden of these policy changes, with implications for people’s lives and livelihoods that state and local policymakers must carefully consider."

With H.R. 1 now signed into law, these findings provide critical insights into what communities can expect as the legislation's provisions take effect. The law includes 80-hour monthly work requirements for able-bodied adults, enhanced eligibility verification every six months, and reduced federal matching rates for expansion populations—representing the most significant restructuring of Medicaid since the program's creation.

“Medicaid affects many different aspects of people’s lives,” said Dr. Seth A. Berkowitz MD MPH, co-author of the study and Associate Professor of Medicine at the University of North Carolina School of Medicine. “When Medicaid gets cut, there are of course health impacts to the people who lose coverage. But there are also important impacts to the broader community, and policymakers need to consider those impacts as well.”

Recognizing the importance of tracking implementation impacts, the research team has made their microsimulation model open source to enable updated estimates as implementation details are finalized. This approach ensures that policymakers and stakeholders have access to the most current projections as states develop their implementation plans.

"This research demonstrates the critical importance of understanding the full consequences of proposed Medicaid changes beyond federal budget numbers,” said Dr. Sadiq Y. Patel MSW PhD, an author for the study and VP of Data Science and Artificial Intelligence for Waymark. “Our model reveals that coverage losses would cascade through communities in ways that profoundly impact public health, healthcare delivery systems, and local economies. These findings should inform policymakers about the real-world trade-offs inherent in these policy decisions."

The research letter titled "Projected Health System and Economic Impacts of 2025 Medicaid Policy Proposals" was published in JAMA Health Forum. The study was conducted by Dr. Sanjay Basu (Waymark, University of California San Francisco), Dr. Sadiq Y. Patel (Waymark, University of Pennsylvania), and Dr. Seth A. Berkowitz (University of North Carolina at Chapel Hill). 
 

About Waymark

Waymark is a public benefit company dedicated to improving access and quality of care for people receiving Medicaid. We partner with health plans and primary care providers—including health systems, community health centers, and independent practices—to improve outcomes through community-based care. Our local teams of community health workers, pharmacists, therapists and care coordinators use proprietary data science and machine learning technologies to deliver evidence-based interventions to hard-to-reach patient populations. Waymark's peer-reviewed research has been published in leading journals including the New England Journal of Medicine (NEJM) Catalyst, Nature Scientific Reports, and Journal of the American Medical Association (JAMA)—demonstrating measurable improvements in health outcomes and cost savings for Medicaid populations. For more information, visit www.waymarkcare.com.

New York City intersections see one-third fewer pedestrian injuries with longer head-start intervals


Columbia University's Mailman School of Public Health





Giving pedestrians a 7-second head start at traffic lights—known as Leading Pedestrian Intervals (LPIs)—is associated with a 33 percent reduction in total pedestrian injuries, both fatal and non-fatal, at New York City intersections, according to a new study from Columbia University Mailman School of Public Health.

The researchers analyzed data from 6,003 intersections—the largest dataset to date evaluating LPI effectiveness. The reduction in pedestrian injuries was consistent across all intersection types, with the most pronounced impact seen during the daytime: fatal pedestrian crashes dropped by 65 percent during daylight hours. The study was published in Nature Cities.

LPIs allow pedestrians to begin crossing before vehicles get a green light to turn, typically offering a 7–11 second lead depending on the intersection size.

“The idea is to give pedestrians time to reach the center of the intersection where they’re more visible,” said lead author Christopher Morrison, PhD, assistant professor of epidemiology at Columbia Mailman School. “Most pedestrian-vehicle crashes happen near the curb, where drivers are less likely to see people crossing.”

Using a spatial ecological panel design, the study evaluated intersection-level injury risk from 2013 to 2018, using precise geographic data from NYC Open Data and the city’s Vision Zero initiative. Of the intersections studied, 2,869 had LPI treatments installed.

New York City was an early adopter of the U.S. Vision Zero program, a systems-based, multidisciplinary effort to reduce traffic-related injuries and deaths. LPIs—alongside other low-cost measures such as speed humps and turn-calming treatments like rubber speed bumps—are central to the city’s pedestrian safety strategy.

Globally, road traffic crashes cause more than 1.35 million deaths and 50 million injuries each year. In the U.S., over 68,000 pedestrian deaths and 6.1 million serious pedestrian injuries occurred between 2011 and 2020—many in large cities like New York.

The research team focused on pedestrian injuries occurring within 100 feet of a signalized intersection. Intersections within 10 feet of an LPI were categorized as treated; those beyond that buffer were considered untreated.  
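The study's two spatial rules, a 100-foot injury-attribution buffer around each intersection and a 10-foot treatment buffer around each LPI, can be sketched as simple distance checks. The snippet uses planar coordinates in feet and made-up points for simplicity; the study itself used precise geographic data.

```python
# Sketch of the study's spatial classification rules, using planar
# coordinates in feet and hypothetical points (the study used precise
# geographic data). Injuries within 100 ft of a signalized intersection
# are attributed to it; an intersection within 10 ft of an LPI location
# counts as treated.
import math

def dist_ft(p, q):
    """Euclidean distance between two (x, y) points, in feet."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def classify_intersection(intersection, lpi_points, buffer_ft=10):
    """'treated' if any LPI lies within the treatment buffer."""
    return "treated" if any(dist_ft(intersection, l) <= buffer_ft
                            for l in lpi_points) else "untreated"

def injuries_at(intersection, injury_points, radius_ft=100):
    """Injuries attributed to this intersection (within the 100 ft buffer)."""
    return [p for p in injury_points if dist_ft(intersection, p) <= radius_ft]

# Hypothetical coordinates
xing = (0.0, 0.0)
lpis = [(6.0, 8.0)]                       # exactly 10 ft away -> treated
injuries = [(30.0, 40.0), (90.0, 80.0)]   # 50 ft and ~120 ft away

print(classify_intersection(xing, lpis))
print(len(injuries_at(xing, injuries)))
```

Here the intersection is classified as treated, and only the first injury point falls inside its attribution buffer.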

“As someone who lives in the city, it is good to know that interventions like LPIs led by NYCDOT are making pedestrians safe,” noted co-author Siddhesh (Sid) Zadey, a doctoral student in Epidemiology at Columbia Mailman School.

“LPIs are one of the most affordable and scalable traffic safety interventions,” Morrison added. “A 7-second delay for drivers can mean the difference between life and death for pedestrians. Our findings show they work—and should be adopted more widely.”

Other co-authors include Leah Roberts, Brady Bushover, Arianna Gobaud, Christina Mehranbod, Carolyn Fish, Xiang Gao, Evan Eschliman, and Dana Goin—all from Columbia University Mailman School of Public Health.

Funding was provided by the CDC’s National Center for Injury Prevention and Control (Grant R49CE003094) and the National Institute on Drug Abuse (Grant T32DA031099).

Columbia University Mailman School of Public Health

Founded in 1922, the Columbia University Mailman School of Public Health pursues an agenda of research, education, and service to address the critical and complex public health issues affecting New Yorkers, the nation and the world. The Columbia Mailman School is the third largest recipient of NIH grants among schools of public health. Its nearly 300 multi-disciplinary faculty members work in more than 100 countries around the world, addressing such issues as preventing infectious and chronic diseases, environmental health, maternal and child health, health policy, climate change and health, and public health preparedness. It is a leader in public health education with more than 1,300 graduate students from 55 nations pursuing a variety of master’s and doctoral degree programs. The Columbia Mailman School is also home to numerous world-renowned research centers, including ICAP and the Center for Infection and Immunity. For more information, please visit www.mailman.columbia.edu.