Thursday, March 26, 2026

 

Evaluation of large language model chatbot responses to psychotic prompts



JAMA Psychiatry





About The Study

All 3 versions of ChatGPT tested in this study showed high rates of inappropriate or partially appropriate responses to psychotic prompts with overlapping confidence intervals. In the across-version analysis, GPT-5 did perform better than the free product that most users interact with. This is notable because individuals with psychosis risk may be overrepresented among the economically disadvantaged. 




Corresponding Author: To contact the corresponding author, Amandeep Jutla, MD, email aj2843@cumc.columbia.edu.

To access the embargoed study: Visit our For The Media website at this link https://media.jamanetwork.com/

(doi:10.1001/jamapsychiatry.2026.0249)

Editor’s Note: Please see the article for additional information, including other authors, author contributions and affiliations, conflict of interest and financial disclosures, and funding and support.

Embed this link to provide your readers free access to the full-text article 

https://jamanetwork.com/journals/jamapsychiatry/fullarticle/10.1001/jamapsychiatry.2026.0249?guestAccessKey=750a37d4-b30c-4576-ae27-58280d483466&utm_source=for_the_media&utm_medium=referral&utm_campaign=ftm_links&utm_content=tfl&utm_term=032526

Large language models and creativity





PNAS Nexus

Image: LLM scatterplot. LLM responses cluster together in feature space more than human responses do. Each dot is a response from a human or LLM, and closer dots are more similar.

Credit: Emily Wenger and Yoed N. Kenett




Can using a large language model (LLM) make a person more creative? Prior work has shown that using LLMs can make creative outputs more homogeneous, but this homogenization could stem from the specific LLM used or from widespread use of the same model. 


Emily Wenger and Yoed N. Kenett asked humans recruited from the Prolific platform and a broad range of LLMs to complete multiple tasks designed to measure different facets of creativity. For example, one task asked participants to come up with as many uses as possible for an item like a fork or a pair of pants. Another task asked participants to think of 10 nouns that are as different from one another as possible. Across the board, the authors found that LLM responses were significantly more similar to each other than human responses were. In isolation, a single LLM response to a task was typically rated as roughly as creative as, or more creative than, the average human response. However, when compared with outputs from other LLMs, whether Gemini, GPT, or Llama, similar ideas and responses emerged again and again.

Increasing model temperature, a setting that controls the level of randomness in model outputs, made the responses more variable, but higher temperatures also quickly turned the outputs into gibberish that did not fulfill task requirements. According to the authors, it is likely the use of LLMs in general, rather than the use of any specific LLM, that causes outputs to be homogeneous. Whether LLMs can be improved to reach or surpass human creativity is an open question, given that by their nature they lack bodies, experiences, intentions, individuality, and understanding, some or all of which may be necessary to simulate human creativity. According to the authors, relying on LLMs for brainstorming, problem solving, or making art risks harming human thinking.
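
The temperature trade-off described above can be illustrated with a minimal sketch of temperature-scaled softmax, the standard way a sampling temperature reshapes a model's next-word probabilities. This is not the authors' code: the function names and the four toy scores are assumptions made for illustration only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in bits; larger values mean more spread-out choices."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical scores for four candidate next words (made-up numbers).
toy_logits = [4.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0, 5.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: top-word probability={probs[0]:.2f}, "
          f"entropy={entropy(probs):.2f} bits")
```

As the temperature rises, the top candidate loses probability mass to unlikely candidates. That makes outputs more varied, but it also makes implausible words more likely to be sampled, which is consistent with the drift toward gibberish that the authors report.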

 

Fragmented phone use — not total screen time — is the main driver of information overload, study finds



Frequent micro checks and bursts of messaging are most strongly linked to feeling overloaded — and these habits are the hardest to change, says research from Aalto University



Aalto University

Image: Stop fiddling with your phone. The research shows that fragmented use occurs most often on mobile devices and especially in messaging.

Credit: Matti Ahlgren / Aalto University





Amid heated discussion on screen time, social media use and the impact of digital devices on our well-being, a seven-month study from Aalto University in Finland sheds new light on what overwhelms users the most, and the results aren’t what you might think.

‘Screen time does matter, but the heaviest users aren’t the most overloaded,’ says doctoral researcher Henrik Lassila. ‘Those who feel most overwhelmed are the ones who return to their phone again and again for brief moments and then put it down shortly after.’

The seven-month study followed the digital behaviour of nearly 300 adults in Germany across smartphones and computers. Participants completed repeated surveys about information overload, while all apps and websites used were logged, creating a rich longitudinal dataset of real-world device use.

The findings show that fragmented use occurs most often on mobile devices and especially in messaging. For example, a user might watch a short clip, lock the screen, then return a few minutes later, a pattern that creates gaps and constant task switching. These ‘bursty’ routines were most strongly associated with feeling overwhelmed, even when total time spent on devices was similar.

‘We feel overloaded when we can’t process all the incoming information and our minds feel “full” or stressed,’ Lassila says. ‘Information overload is linked with negative emotions, which can in turn drive more checking, a vicious cycle.’ While the study doesn’t directly address the question of why fragmented checking is so stressful, Lassila notes that other studies have identified task switching as particularly cognitively tiring.

Interestingly, although fragmented use often includes messaging, the study found that more time spent messaging did not by itself correspond to higher digital overwhelm. Rather, it was the short, frequent returns to the device that mattered most.

Hard habits to break

Earlier surveys have suggested that people quit social media when they feel a sense of digital overwhelm. The new study found little evidence for that. ‘People find it hard to change their behaviour,’ says Professor Janne Lindqvist. ‘Surprisingly, highly overloaded and non-overloaded participants used their devices for roughly the same total time over the study period. Those at the highest levels of overload tended to stay there, and those not overloaded rarely became overloaded.’

According to the researchers, device use and the feeling of overload are tightly woven into daily routines, making them difficult to change. One practical idea is a ‘micro-check tracker’ that would show users how often they return to their phones in short bursts. ‘You don’t need to respond to every ping immediately. Do one thing at a time,’ Lindqvist advises. ‘Ideally, turn off non‑essential notifications and be present with whatever you’re doing.’
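
The ‘micro-check tracker’ idea can be sketched in a few lines. This is a hypothetical illustration, not the researchers’ tool: the 30-second cutoff, the function names and the toy session data are all assumptions, since this summary does not give the study’s definition of a micro check.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff; the study's own definition of a micro check may differ.
MICRO_CHECK_MAX = timedelta(seconds=30)


def count_micro_checks(sessions, max_duration=MICRO_CHECK_MAX):
    """Count sessions (start, end) that lasted no longer than max_duration."""
    return sum(1 for start, end in sessions if end - start <= max_duration)


def total_screen_time(sessions):
    """Add up the duration of every session."""
    return sum((end - start for start, end in sessions), timedelta())


def make_sessions(first_start, durations_s, gap_s):
    """Build sessions separated by a fixed gap (toy data only)."""
    sessions, start = [], first_start
    for seconds in durations_s:
        end = start + timedelta(seconds=seconds)
        sessions.append((start, end))
        start = end + timedelta(seconds=gap_s)
    return sessions


morning = datetime(2026, 1, 5, 9, 0, 0)
fragmented = make_sessions(morning, [20] * 30, gap_s=240)   # 30 checks of 20 s
sustained = make_sessions(morning, [300, 300], gap_s=3600)  # 2 sessions of 5 min

for label, sessions in (("fragmented", fragmented), ("sustained", sustained)):
    print(label, total_screen_time(sessions), count_micro_checks(sessions))
```

Both toy users spend the same ten minutes on the phone, but only the first accumulates micro checks. That is the distinction between total screen time and fragmented use that the study draws.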

In a follow-up study currently under peer review, the team also finds that overload correlates with psychological stress, negative emotions and anxiety.

‘These days many of us are on our phones repeatedly,’ Lindqvist says. ‘Try batching: check messages twice a day and reply in one session. Based on our findings, you may feel less stressed.’

The paper, ‘Stop Fiddling With Your Phone and Go Offline’, will be presented at CHI 2026, the leading conference on human–computer interaction, and is available online here: https://goodlife.aalto.fi/resources/pdfs/CHI26_IO.pdf

Excessive screen time signals health risk for young adults



High screen time and low physical activity are strongly linked with cardiovascular



American College of Cardiology




People who reported spending six or more hours on screens outside of school or work had worse blood pressure, cholesterol and body mass index (BMI) compared with those with more limited screen time, according to a study being presented at the American College of Cardiology’s Annual Scientific Session (ACC.26).

Screen time was independently associated with these markers of cardiovascular risk even after accounting for differences in daily physical activity. The findings point to excessive time spent playing video games, watching videos and scrolling social media as an emerging risk factor among young people, researchers said, suggesting that clinicians could assess screen use as an early indicator that patients may be on a trajectory to develop heart disease.  

“Traditionally, lifestyle counseling focuses mainly on encouraging exercise, but our findings suggest that reducing excessive screen exposure could be an additional and independent target for intervention,” said Zain Islam, MD, a cardiologist at Liaquat University of Medical & Health Sciences and Taqi Medical Center in Hyderabad, Pakistan, and the study’s lead author. “This may lead to more nuanced counseling—not only promoting physical activity but also addressing digital behavior patterns, digital wellness and structured limits on prolonged screen use.”

Researchers analyzed heart health markers and daily habits of 382 adults who were about 35 years old, on average, living in Hyderabad and Karachi, two cities in Pakistan. South Asians carry a disproportionately high burden of premature cardiovascular disease, which affects people at younger ages compared with Western populations. Pakistan and other countries in the region have also seen an uptick in screen use among young adults due to rapid urbanization and adoption of digital technologies in homes and workplaces. This study is the first to examine how these changing lifestyle patterns may affect heart health specifically in South Asia and is among the first to focus on screen time as a specific risk factor, according to the researchers.

“What makes this study different is that we looked at screen time as a specific, measurable digital behavior rather than just broadly labeling people as sedentary,” Islam said. “While sedentary lifestyle has been studied before, fewer studies have separated screen exposure from general physical inactivity or examined how these two factors interact with each other.”

Based on questionnaires, researchers grouped participants according to their screen habits (more or less than six hours on screens outside of school or work per day) and physical activity levels (more or less than 150 minutes of exercise per week). After adjusting for age, sex and baseline clinical characteristics, they found that people spending more than six hours a day on screens had, on average, about 18 mmHg higher systolic blood pressure, over 28 mg/dL higher low-density lipoprotein (LDL) cholesterol and over 3.9 mg/dL lower high-density lipoprotein (HDL) cholesterol compared with those spending less than six hours on screens per day. In addition, these participants had significantly higher BMI, waist circumference and waist-to-height ratio. Higher screen time was also associated with increased cigarette smoking and vaping, with over one-quarter of these participants reporting nicotine use compared with 12% among those with lower screen exposure.

Although these relationships were independent from physical activity levels, researchers also found a synergistic effect between screen use and exercise. High screen time and low physical activity together produced a greater adverse impact on blood pressure and BMI than either factor alone. “In other words, these behaviors don’t just add risk independently—they seem to amplify each other when they occur together,” Islam said.
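
One common way to express such a synergistic (more-than-additive) effect is an interaction contrast across the four groups formed by the two factors. The sketch below is an illustration only: the function name is an assumption, the blood pressure values are invented and are not the study’s data, and the researchers’ actual statistical model is not described in this summary.

```python
def interaction_contrast(means):
    """Return how far the joint effect exceeds the sum of the separate effects.

    `means` maps (high_screen_time, low_activity) flags to a group mean.
    A positive result means the two factors amplify each other.
    """
    baseline = means[(False, False)]
    screen_only = means[(True, False)] - baseline
    inactivity_only = means[(False, True)] - baseline
    joint = means[(True, True)] - baseline
    return joint - (screen_only + inactivity_only)


# Illustrative systolic blood pressure means in mmHg; NOT values from the study.
toy_means = {
    (False, False): 115.0,  # low screen time, active
    (True, False): 123.0,   # high screen time, active
    (False, True): 121.0,   # low screen time, inactive
    (True, True): 135.0,    # high screen time, inactive
}

print(interaction_contrast(toy_means))
```

With these made-up numbers, the two factors would add 14 mmHg if they acted independently, while the combined group sits 20 mmHg above baseline, so the contrast is positive.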

Based on the findings, Islam said that clinicians should incorporate screen time alongside traditional lifestyle factors to assess patients’ cardiovascular risk and develop tailored interventions that promote both physical activity and healthier screen habits.

Researchers said that, because the study was observational, it does not establish causality. In addition, screen time was self-reported, and assessments may not consistently differentiate between work-related and recreational use. Additional factors such as diet, sleep or stress may also play a role in the associations that were observed.

Because cultural, environmental and socioeconomic factors differ around the world, Islam said that it is important to continue to study behaviors like screen use in specific populations since findings from one country or region may not translate directly in other places. He said that future studies could include larger multicenter cohorts, objective digital tracking tools rather than self-reporting and longitudinal follow-up to evaluate hard cardiovascular outcomes. Interventional studies testing whether reducing screen time improves cardiometabolic markers would also be an important next step.

Islam will present the study, “Association Between Screen Time, Physical Inactivity, and Cardiovascular Risk Markers in Young Adults: A Prospective Observational Study,” on Saturday, March 28, at 2:00 p.m. CT / 19:00 UTC in Posters, Hall E.

ACC.26 will take place March 28-30, 2026, in New Orleans, bringing together cardiologists and cardiovascular specialists from around the world to share the newest discoveries in treatment and prevention. Follow @ACCinTouch, @ACCMediaCenter and #ACC26 for the latest news from the meeting.

The American College of Cardiology (ACC) is the global leader in transforming cardiovascular care and improving heart health for all. As the preeminent source of professional medical education for the entire cardiovascular care team since 1949, ACC credentials cardiovascular professionals in over 140 countries who meet stringent qualifications and leads in the formation of health policy, standards and guidelines. Through its world-renowned family of JACC Journals, NCDR registries, ACC Accreditation Services, global network of Member Sections, CardioSmart.org patient resources and more, the College is committed to ensuring a world where science, knowledge and innovation optimize patient care and outcomes. Learn more at ACC.org.


 

Automatic estimation and evaluation of multi-objective human preferences for learning from demonstration



ELSP

Image: A framework to learn and estimate user preferences through multiple interactions.

Credit: Brendan Hertel/University of Massachusetts Lowell




Researchers have explored human preferences for robot motion on a variety of household tasks. The study aimed to investigate whether preferences were similar across tasks and across users, and whether robots should behave in a human-like manner. The results suggest that robot motion should be highly individualized, which presents a challenge for integrating robots into everyday life.

Robots are becoming increasingly common in the modern world, from industrial to domestic environments. As robots enter these new spaces, they should interact with users in ways that align with human preferences. Traditionally, this has been assumed to be mimicking human-like behaviors and motions, but a team of researchers at the University of Massachusetts Lowell is challenging this notion with a new experiment investigating human preference of robot motion.

The study allowed users to physically interact with robots in a series of experiments to determine their preferred motion. Not only did these experiments allow researchers to investigate user preference, but the robots also attempted to adjust their motion to the user’s preference over the course of the interactions. This automatic adjustment allowed for individualized robot motion, which is a key step towards integrating robots into our daily lives.

In this experiment, users first provide demonstrations of a task, such as placing dishware into a dishwasher. Then, through a series of interactions, the robot learns and estimates the user preference, replicating the task according to these preferences. Once the preference is learned, a new task is demonstrated, such as closing the dishwasher. Again, preference is learned through a series of interactions but this time, the robot attempts to re-use the preference learned in a previous task. “What we are looking for is if different tasks can have similar preferences,” says lead researcher Brendan Hertel, “so that we can simplify the ways the robot has to learn to move. If preferences are more individualized, it makes learning harder.”

The results of these experiments offer some key insights into human preferences for robot motion. The first is that preferences vary from task to task, meaning the preferred way of placing a dish in a dishwasher is unlike the preferred way of closing the dishwasher. This means that robots should learn individual motions for each task demonstrated, which decreases the viability of “one-size-fits-all” approaches such as state-of-the-art motion generation methods. In particular, there were significant differences in the smoothness preferred by users for various tasks. This means that robots should adjust the smoothness of their motions according to the task at hand.

The study also investigated whether individual users had their own specific preferences, which would mean that motions should be individualized. As it turns out, different users have different preferences for different tasks. This means that motions should be adjusted not only according to the task at hand, but also according to the user observing the task. “There may be many factors pertaining to someone’s preference for motion,” commented Brendan Hertel, “for example, in the U.S. they drive on the right but in the U.K. they drive on the left. What looks correct to someone may be completely wrong for someone else.”

Finally, the study examined whether user preferences aligned with human-like motion, as had been assumed in previous works. Overall, no such preference was found. The results indicated that “users do not perform demonstrations according to their preferences, and reproductions should be adjusted accordingly.” In fact, users wanted robots to move much more smoothly than the users themselves did.

While this work presents a step forward in understanding human preferences of robot motion and integrating robots with humans, there are some limitations. Notably, this study was limited to short tasks and used only a robotic arm, not the more human-like humanoid robots.

Also, further experiments and research are always needed. Understanding human preferences, especially with respect to robots, is an underdeveloped field that will only become more necessary as the impact of robots increases in our lives.

The paper, “Automatic estimation and evaluation of multi-objective human preferences for Learning from Demonstration,” was published in Robot Learning.

Hertel B, Nguyen T, Cabrera ME, Azadeh R. Automatic estimation and evaluation of multi-objective human preferences for Learning from Demonstration. Robot Learn. 2026(1):0006, https://doi.org/10.55092/rl20260006. 

DOI: 10.55092/rl20260006