Tuesday, September 30, 2025

 

Doctors and nurses are better than AI at triaging patients



European Society for Emergency Medicine (EUSEM)
Image: Dr Renata Jukneviciene (credit: Dr Renata Jukneviciene)




Vienna, Austria: Doctors and nurses are better at triaging patients in emergency departments than artificial intelligence (AI), according to research presented at the European Emergency Medicine Congress today (Tuesday) [1].

However, Dr Renata Jukneviciene, a postdoctoral researcher at Vilnius University, Lithuania, who presented the study, said that AI could be useful in conjunction with clinical staff, but should not be used as a stand-alone triage tool.

“We conducted this study to address the growing issue of overcrowding in the emergency department and the escalating workload of nurses,” said Dr Jukneviciene. “Given the rapid development of AI tools like ChatGPT, we aimed to explore whether AI could support triage decision-making, improve efficiency and reduce the burden on staff in emergency settings.”

The researchers distributed a questionnaire, on paper and digitally, to six emergency medicine doctors and 51 nurses working in the emergency department of Vilnius University Hospital Santaros Klinikos. Participants were asked to triage clinical cases selected randomly from 110 case reports indexed in the PubMed database. Using the Manchester Triage System, they classified each patient into one of five categories of urgency, from most to least urgent. The same cases were analysed by ChatGPT (version 3.5).

A total of 44 nurses (86.3%) and six doctors (100%) completed the questionnaire.

“Overall, AI underperformed compared to both nurses and doctors across most of the metrics we measured,” said Dr Jukneviciene. “For example, AI’s overall accuracy was 50.4%, compared to 65.5% for nurses and 70.6% for doctors. Sensitivity – how well it identified true urgent cases – for AI was also lower at 58.3% compared to nurses, who scored 73.8%, and doctors, who scored 83.0%.”
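
To make these metrics concrete, here is a minimal, hypothetical Python sketch (not taken from the study; the case labels are invented) of how accuracy, sensitivity and specificity are computed when urgent cases are treated as the positive class:

```python
# Hypothetical illustration of the triage metrics quoted above.
# "urgent" is the positive class; the labels below are invented.
reference = ["urgent", "urgent", "not urgent", "urgent", "not urgent"]
predicted = ["urgent", "not urgent", "not urgent", "urgent", "urgent"]

pairs = list(zip(reference, predicted))
tp = sum(r == "urgent" and p == "urgent" for r, p in pairs)          # true positives
tn = sum(r == "not urgent" and p == "not urgent" for r, p in pairs)  # true negatives
fp = sum(r == "not urgent" and p == "urgent" for r, p in pairs)      # false positives
fn = sum(r == "urgent" and p == "not urgent" for r, p in pairs)      # false negatives

accuracy = (tp + tn) / len(pairs)     # share of all cases triaged correctly
sensitivity = tp / (tp + fn)          # share of truly urgent cases identified
specificity = tn / (tn + fp)          # share of non-urgent cases correctly cleared

print(f"accuracy={accuracy:.1%}, sensitivity={sensitivity:.1%}, specificity={specificity:.1%}")
```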

Doctors had the highest scores in all the areas and categories of urgency that the researchers analysed.

“However, AI did outperform nurses in the first triage category, which covers the most urgent cases; it showed better accuracy and specificity, meaning that it identified the truly life-threatening cases. For accuracy, AI scored 27.3% compared to 9.3% for nurses, and for specificity AI scored 27.8% versus 8.3%.”

The distribution of cases across the five categories of urgency was as follows:

              1 (most urgent)   2      3      4      5 (least urgent)
Doctors       9%                21%    29%    23%    18%
Nurses        9%                15%    35%    35%    6%
AI            29%               24%    43%    3%     1%

“These results suggest that while AI generally tends to over-triage, it may be somewhat more cautious in flagging critical cases, which can be both a strength and a drawback,” said Dr Jukneviciene.

Doctors also performed better than AI in cases that required or involved surgery, and in cases that required treatment with medication or other non-invasive therapies. For surgical cases, doctors scored 68.4% for reliability, nurses scored 63% and AI scored 39.5%. For therapeutic cases, doctors scored 65.9%, while AI, scoring 51.9%, did better than nurses, who scored 44.5%.

“While we anticipated that AI might not outperform experienced clinicians and nurses, we were surprised that in some areas AI performed quite well. In fact, in the most urgent triage category, it demonstrated higher accuracy than nurses. This indicates that AI should not replace clinical judgement, but could serve as a decision-support tool in specific clinical contexts and in overwhelmed emergency departments.

“AI may assist in prioritising the most urgent cases more consistently and in supporting new or less experienced staff. However, excessive triaging could lead to inefficiencies, so careful integration and human oversight are crucial. Hospitals should approach AI implementation with caution and focus on training staff to critically interpret AI suggestions,” concluded Dr Jukneviciene.

The researchers are planning follow-up studies using newer versions of AI and AI models that are fine-tuned for medical purposes. They want to test them in larger groups of participants, include ECG interpretation, and explore how AI can be integrated into nurse training, specifically for triage and incidents involving mass casualties.

Limitations of the study include its small sample size, its single-centre design, and the fact that the AI analysis took place outside a real-time hospital setting, so it was not possible to assess how AI would fit into the daily workflow, to interact with patients, to assess vital signs or to collect follow-up data. In addition, ChatGPT 3.5 was not trained specifically for medical use.

Strengths of the study were that it used real clinical cases assessed by a multidisciplinary group of doctors and nurses as well as by AI; that distributing the questionnaire both digitally and on paper increased its accessibility and flexibility; that it was clinically relevant to current healthcare challenges, such as overcrowding and staff shortages in emergency departments; and that it identified AI’s tendency to over-triage, assigning higher urgency to many patients, which is crucial knowledge for the safe implementation of AI in emergency departments.

Dr Barbra Backus is chair of the EUSEM abstract selection committee. She is an emergency physician in Amsterdam, The Netherlands, and was not involved in the study. She said: “AI has the potential to be a useful tool for many aspects of medical care and it is already proving its worth in areas such as interpreting X-rays. However, it has its limitations, and this study shows very clearly that it cannot replace trained medical staff for triaging patients coming into emergency departments. This does not mean it should not be used, as it could aid in speeding up decision-making. However, it needs to be applied with caution and with oversight from doctors and nurses. I expect AI will improve in the future, but it should be tested at every stage of development.”

On Monday 29 September, a colleague of Dr Jukneviciene’s, assistant professor Rakesh Jalali from the University of Warmia and Mazury (Olsztyn, Poland), gave a presentation at the congress on the use of virtual reality to train clinical staff in treating patients who have sustained multiple traumatic injuries [2].

(ends)

[1] Abstract no: OA008, “Patient triaging in the ED: can artificial intelligence become the gold standard?” by Renata Jukneviciene, AI/Innovations session, Tuesday 30 September, 16:45-18:15 hrs CEST, Schubert 5 room: https://eusem.floq.live/kiosk/eusem-2025/dailyprogramme?objectClass=timeslot&objectId=68871e9e626af251d24be41d&type=detail

[2] “Enhancing medical simulation training: multicenter MedEd polytrauma VR project”, by Rakesh Jalali, Multinational-multicentric research projects in 2025 session, Monday 29 September, 16:45-18:15 hrs CEST, Strauss 1 room: https://eusem.floq.live/kiosk/eusem-2025/dailyprogramme?objectClass=timeslot&objectId=684ac10b86142240f858937a&type=detail

Exploring a novel approach for improving generative AI models



Study reinterprets Schrödinger bridge models to reduce overfitting and training costs in generative AI models





Institute of Science Tokyo

Image: The developed model modified Schrödinger bridge-type diffusion models to add noise to real data through the encoder and reconstruct samples through the decoder. It uses two objective functions, the prior loss and drift matching, to reduce computational cost and prevent overfitting. (Credit: Institute of Science Tokyo)




Researchers at Science Tokyo have developed a new framework for generative diffusion models. The method reinterprets Schrödinger bridge models as variational autoencoders with infinitely many latent variables, reducing computational costs and preventing overfitting. By appropriately interrupting the training of the encoder, the approach enables the development of more efficient generative AI, with broad applicability beyond standard diffusion models.

Diffusion models are among the most widely used approaches in generative AI for creating images and audio. These models generate new data by gradually adding noise (noising) to real samples and then learning how to reverse that process (denoising) back into realistic data. A widely used version, the score-based model, achieves this via a diffusion process that connects the prior distribution to the data distribution over a sufficiently long time interval. This method, however, has a limitation: when the data differs strongly from the prior, the noising and denoising processes require longer time intervals, which slows down sample generation.
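
As background, and using the standard notation of the score-based diffusion literature rather than anything specific to this study, the noising and denoising processes can be written as a pair of stochastic differential equations:

```latex
% Forward (noising) SDE, run from data towards the prior,
% and the reverse-time (denoising) SDE driven by the learned score:
\begin{align}
\mathrm{d}x_t &= f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t \\
\mathrm{d}x_t &= \left[ f(x_t, t) - g(t)^2\,\nabla_{x}\log p_t(x_t) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t
\end{align}
```

Here $W_t$ and $\bar{W}_t$ are forward and reverse Brownian motions, and the score $\nabla_x \log p_t$ is approximated by a neural network. The forward process must run long enough for its endpoint to approach the prior, which is the source of the slowdown described above.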

Now, a research team from Institute of Science Tokyo (Science Tokyo), Japan, has proposed a new framework for diffusion models that is faster and computationally less demanding. They achieved this by reinterpreting Schrödinger bridge (SB) models, a type of diffusion model, as variational autoencoders (VAEs).

The study was led by graduate student Mr. Kentaro Kaba and Professor Masayuki Ohzeki from the Department of Physics at Science Tokyo, in collaboration with Mr. Reo Shimizu (then a graduate student) and Associate Professor Yuki Sugiyama from the Graduate School of Information Sciences at Tohoku University, Japan. Their findings were published in Volume 7, Issue 3 of Physical Review Research on September 3, 2025.

SB models offer greater flexibility than standard score-based models because they can connect any two probability distributions over a finite time using a stochastic differential equation (SDE). This supports more complex noising processes and higher-quality sample generation. The trade-off, however, is that SB models are mathematically complex and expensive to train.
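
In its standard form (again, general SB notation rather than the paper's own), the Schrödinger bridge problem looks for the path measure closest to a reference diffusion while matching both endpoint distributions exactly:

```latex
\min_{Q}\ \mathrm{KL}\!\left(Q \,\middle\|\, P\right)
\quad \text{subject to} \quad
Q_{t=0} = p_{\mathrm{data}}, \qquad Q_{t=1} = p_{\mathrm{prior}}
```

Because both marginals are pinned as hard constraints, any two distributions can be bridged over a finite interval, at the cost of a harder optimization problem than plain score matching.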

The proposed method addresses this by reformulating SB models as VAEs with multiple latent variables. “The key insight lies in extending the number of latent variables from one to infinity, leveraging the data-processing inequality. This perspective enables us to interpret SB-type models within the framework of VAEs,” says Kaba.

In this setup, the encoder represents the forward process that maps real data onto a noisy latent space, while the decoder reverses the process to reconstruct realistic samples, and both processes are modeled as SDEs learned by neural networks.

The model employs a training objective with two components. The first is the prior loss, which ensures that the encoder correctly maps the data distribution to the prior distribution. The second is drift matching, which trains the decoder to mimic the dynamics of the reverse encoder process. Moreover, once the prior loss stabilizes, encoder training can be stopped early. This allows learning to be completed faster, reducing the risk of overfitting while preserving the high accuracy of SB models.
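
A minimal sketch of how such a two-part objective with early encoder stopping might look in PyTorch-style code; the class methods (sample_path, kl_to_prior, drift_matching_loss) are hypothetical names for illustration, not the authors' actual API:

```python
import torch

def train(encoder, decoder, data_loader, epochs=100, patience=5, tol=1e-4):
    """Two-part SB-as-VAE training sketch (hypothetical API, illustration only)."""
    enc_opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    dec_opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)

    prev_loss, stalled, encoder_frozen = float("inf"), 0, False
    for epoch in range(epochs):
        for x in data_loader:
            # Encoder: forward (noising) SDE path from data towards the prior.
            path = encoder.sample_path(x)

            if not encoder_frozen:
                # Prior loss: the encoder's terminal marginal should match the prior.
                prior_loss = encoder.kl_to_prior(path[-1])
                enc_opt.zero_grad()
                prior_loss.backward()
                enc_opt.step()

            # Drift matching: decoder drift mimics the time-reversed encoder dynamics.
            drift_loss = decoder.drift_matching_loss(path.detach())
            dec_opt.zero_grad()
            drift_loss.backward()
            dec_opt.step()

        # Stop encoder training once the prior loss stabilizes: this is the
        # early interruption that cuts cost and mitigates overfitting.
        if not encoder_frozen:
            stalled = stalled + 1 if abs(prev_loss - prior_loss.item()) < tol else 0
            encoder_frozen = stalled >= patience
            prev_loss = prior_loss.item()
```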

“The objective function is composed of the prior loss and drift matching parts, which characterize the training of the neural networks in the encoder and the decoder, respectively. Together, they reduce the computational cost of training SB-type models. It was demonstrated that interrupting the training of the encoder mitigated the challenge of overfitting,” explains Ohzeki.

This approach is flexible and can be applied to other probabilistic rule sets, even non-Markov processes, making it a broadly applicable training scheme.

 

***

About Institute of Science Tokyo (Science Tokyo)

Institute of Science Tokyo (Science Tokyo) was established on October 1, 2024, following the merger between Tokyo Medical and Dental University (TMDU) and Tokyo Institute of Technology (Tokyo Tech), with the mission of “Advancing science and human wellbeing to create value for and with society.”

  

Automatically disadvantaged? What benefit recipients think about the use of AI in welfare decisions


Surveys in the US and the UK on attitudes toward automated decision-making processes in the allocation of social benefits




Max Planck Institute for Human Development

Image: Using AI systems to approve social benefits promises greater speed and efficiency. But are these systems accepted by everyone? (Credit: MPI for Human Development)






A few years ago, the city of Amsterdam piloted an AI program called Smart Check, designed to identify potential cases of welfare fraud. Instead of reviewing applications randomly, the system sifted through numerous data points from municipal records—such as addresses, family composition, income, assets, and prior welfare claims—to assign a “risk score.” Applications deemed “high-risk” were labeled as research-worthy and forwarded to the administrative staff for additional scrutiny. In practice, however, this process disproportionately flagged vulnerable groups, including immigrants, women, and parents, often without offering applicants a clear reason or an effective route to contest the suspicion. Mounting criticism from advocacy groups, legal scholars, and researchers led the city to suspend the program earlier this year, and a recent evaluation confirmed the system’s significant shortcomings. 

This case highlights a central dilemma in the use of AI in welfare administration: while such systems promise greater efficiency and faster decisions, they also risk reinforcing biases, eroding trust, and disproportionately burdening vulnerable groups. Against this backdrop, researchers have begun to investigate how those directly affected perceive the increasing role of AI in the distribution of social benefits. 

In a study published in Nature Communications, researchers at the Max Planck Institute for Human Development and the Toulouse School of Economics conducted three large-scale surveys with over 3,200 participants in the US and the UK to find out how people feel about the use of AI in the allocation of social benefits. The surveys focused on a realistic dilemma: Would people be willing to accept faster decisions made by a machine, even if this meant an increase in the rate of unjustified rejections? The key finding was that while many citizens are willing to accept minor losses in accuracy in favor of shorter waiting times, social benefit recipients have significantly greater reservations about AI-supported decisions. 

“There is a dangerous assumption in policy-making that the average opinion represents the reality of all stakeholders,” explains lead author Mengchen Dong, a research scientist at the Center for Humans and Machines at the Max Planck Institute for Human Development who deals with ethical issues surrounding the use of AI. In fact, the study reveals a clear divide: social welfare recipients reject AI-supported decisions significantly more often than non-recipients — even if the systems promise faster processing. 

Another problem is that non-recipients systematically overestimate how willing those affected would be to trust AI. This is true even when they are financially rewarded for realistically assessing the other group’s perspective. Vulnerable groups therefore understand the majority’s point of view better than their own is understood.

Methodology: Simulated decision dilemmas and perspective shifts 

The researchers presented the participants with realistic decision-making scenarios: They could choose between processing by human administrators with a longer waiting time (e.g., eight weeks) or a faster decision by AI — combined with a 5 to 30 percent higher risk of incorrect rejections. 

Participants were asked to decide which option they would prefer – either from their own perspective or as part of a targeted change of perspective in which they were asked to put themselves in the shoes of the other group (benefit recipients or non-recipients). 

While the US sample was representative of the population (around 20 percent of respondents were currently receiving social benefits), the British study specifically aimed for a 50/50 ratio between recipients of Universal Credit — a social benefit for low-income households — and non-recipients. This allowed differences between the groups to be systematically recorded. Demographic factors such as age, gender, education, income, and political orientation were also taken into account. 

What are the benefits of a change of perspective? And does a right to object help? 

The British sub-study also tested whether financial incentives could improve the ability to adopt a realistic perspective. Participants received bonus payments if their assessment of the other group came close to that group’s actual opinions. Despite the incentives, systematic misjudgments persisted, especially among those who did not receive benefits.

Another attempt to strengthen trust in AI also had only limited success: Some participants were informed that the system offered a hypothetical possibility to appeal AI decisions to human administrators. Although this information slightly increased trust, it did little to change the fundamental assessment of AI use. 

Consequences for trust in government and administration 

According to the study, the acceptance of AI in the social welfare system is closely linked to trust in government institutions. The more people resent the use of AI in welfare decisions, the less they trust the governments that deploy it. This applies to both recipients and non-recipients. In the UK, where the study examined the planned use of AI in the allocation of Universal Credit, many participants said that even if AI’s performance on speed and accuracy were the same as a human’s, they would prefer human case workers to AI. The mention of a possible appeal process did little to change this.

Call for participatory development of AI systems 

The researchers warn against developing AI systems for the allocation of social benefits solely according to the will of the majority or on the basis of aggregated data. “If the perspectives of vulnerable groups are not actively taken into account, there is a risk of wrong decisions with real consequences — such as unjustified benefit withdrawals or false accusations,” says co-author Jean-François Bonnefon, Director of the Social and Behavioral Sciences Department at Toulouse School of Economics. 

The team of authors therefore calls for a reorientation of the development of public AI systems: away from purely technical efficiency metrics and toward participatory processes that explicitly include the perspectives of vulnerable groups. Otherwise, there is a risk of undesirable developments that will undermine trust in administration and technology in the long term. Building on this work in the US and UK, an ongoing collaboration will leverage Statistics Denmark’s infrastructure to engage vulnerable populations in Denmark and uncover their unique perspectives on broader public administration decisions. 

In brief:

  • Large-scale surveys: Surveys with more than 3,200 participants on attitudes toward AI-supported decision-making processes in the allocation of social benefits in the US and the UK. 
  • Differences between social welfare recipients and non-recipients: Social welfare recipients are more skeptical of AI-supported decisions than non-recipients. Non-recipients systematically overestimate the trust that those affected place in AI, even when they are rewarded for assessing that group’s perspective realistically.
  • Trust-building measures: Measures such as a hypothetical right of appeal only slightly increase trust in AI, but do not change the fundamental rejection among those affected. 
  • Design of AI systems: The study calls for participatory development processes for AI systems that actively incorporate the perspectives of vulnerable groups—otherwise there is a risk of losing trust in government and administration.