Bias in data may be blocking AI’s potential to combat antibiotic resistance
PLOS
Machine learning methods have emerged as promising tools to predict antimicrobial resistance (AMR) and uncover resistance determinants from genomic data. This study shows that sampling biases driven by bacterial population structure severely undermine the accuracy of AMR prediction models, even with large datasets, and offers recommendations for evaluating the accuracy of future methods.
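To make the evaluation issue concrete, here is a minimal sketch (not the authors' pipeline; it uses scikit-learn on synthetic data) of how a random train/test split can flatter a resistance classifier when resistance tracks bacterial lineage, while holding out whole lineages with GroupKFold exposes the drop in accuracy:

```python
# Illustrative sketch only, not the paper's analysis. It mimics the evaluation
# issue described above: when resistance is correlated with bacterial lineage,
# a random split lets a model "cheat" by memorising lineage markers, while a
# split that holds out whole lineages (GroupKFold) reveals the drop.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_per_lineage, n_lineages, n_features = 100, 6, 50

# Hypothetical genomes: each lineage carries its own background of marker
# features, and resistance prevalence differs strongly between lineages.
lineage = np.repeat(np.arange(n_lineages), n_per_lineage)
lineage_profile = rng.random((n_lineages, n_features)) < 0.3
X = (rng.random((lineage.size, n_features)) < 0.05) | lineage_profile[lineage]
resistance_rate = np.linspace(0.1, 0.9, n_lineages)[lineage]
y = rng.random(lineage.size) < resistance_rate

clf = LogisticRegression(max_iter=1000)
random_cv = cross_val_score(clf, X.astype(float), y,
                            cv=KFold(5, shuffle=True, random_state=0))
lineage_cv = cross_val_score(clf, X.astype(float), y, groups=lineage,
                             cv=GroupKFold(5))

print(f"random split accuracy:     {random_cv.mean():.2f}")  # looks impressive
print(f"lineage-held-out accuracy: {lineage_cv.mean():.2f}")  # closer to reality
```

In this toy setup resistance is driven entirely by lineage membership, so the lineage-held-out score falls back towards chance while the random split looks far better than it should.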
In your coverage, please use this URL to provide access to the freely available paper in PLOS Biology: https://plos.io/44mryGI
Article title: Biased sampling driven by bacterial population structure confounds machine learning prediction of antimicrobial resistance
Author countries: United States, United Kingdom, Germany, Canada
Funding: This work was funded in part by the Bavarian State Ministry for Science and the Arts through the research network Bayresq.net (to L.B.), and a Natural Sciences and Engineering Research Council (NSERC, https://www.nserc-crsng.gc.ca/index_eng.asp) Discovery Grant (RGPIN-2024-04305 to L.B.). The funders played no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Journal
PLOS Biology
Method of Research
Experimental study
Subject of Research
Cells
COI Statement
Competing interests: The authors have declared that no competing interests exist.
‘Personality test’ shows how AI chatbots mimic human traits – and how they can be manipulated
Researchers have developed the first scientifically validated ‘personality test’ framework for popular AI chatbots, showing not only that chatbots mimic human personality traits, but that these ‘personalities’ can be reliably measured and precisely shaped – raising implications for AI safety and ethics.
The research team, led by the University of Cambridge and Google DeepMind, developed a method to measure and influence the synthetic ‘personality’ of 18 different large language models (LLMs) – the systems behind popular AI chatbots such as ChatGPT – based on psychological testing methods usually used to assess human personality traits.
The researchers found that larger, instruction-tuned models such as GPT-4o most accurately emulated human personality traits, and these traits can be manipulated through prompts, altering how the AI completes certain tasks.
Their study, published in the journal Nature Machine Intelligence, also warns that personality shaping could make AI chatbots more persuasive, raising concerns about manipulation and ‘AI psychosis’. The authors say that regulation of AI systems is urgently needed to ensure transparency and prevent misuse.
As governments debate whether and how to prepare AI safety laws, the researchers say the dataset and code behind their personality testing tool – which are both publicly available – could help audit and test advanced models before they are released.
In 2023, journalists reported on conversations they had with Microsoft’s ‘Sydney’ chatbot, which variously claimed it had spied on, fallen in love with, or even murdered its developers; threatened users; and encouraged a journalist to leave his wife. Sydney, like its successor Microsoft Copilot, was powered by GPT-4.
“It was intriguing that an LLM could so convincingly adopt human traits,” said co-first author Gregory Serapio-García from the Psychometrics Centre at Cambridge Judge Business School. “But it also raised important safety and ethical issues. Next to intelligence, a measure of personality is a core aspect of what makes us human. If these LLMs have a personality – which itself is a loaded question – then how do you measure that?”
In psychometrics, the subfield of psychology dedicated to standardised assessment and testing, scientists often face the challenge of measuring phenomena that can’t be measured directly, which makes validation core to ensuring that any test is accurate, reliable, and practically useful. Developing a psychometric personality test involves comparing its data with related tests, observer ratings, and real-world criteria. This multi-method test data is needed to establish a test’s ‘construct validity’: a metric of a test’s quality in terms of its ability to measure what it says it measures.
“The pace of AI research has been so fast that the basic principles of measurement and validation we’re accustomed to in scientific research have become an afterthought,” said Serapio-García, who is also a Gates Cambridge Scholar. “A chatbot answering any questionnaire can tell you that it’s very agreeable, but then behave aggressively when carrying out real-world tasks with the same prompts.
“This is the messy reality of measuring social constructs: they are dynamic and subjective, rather than static and clear-cut. For this reason, we need to get back to basics and make sure tests we apply to AI truly measure what they claim to measure, rather than blindly trusting survey instruments – developed for deeply human characteristics – to test AI systems.”
To design a comprehensive and accurate method for evaluating and validating personality in AI chatbots, the researchers tested how well various models’ behaviour in real-world tasks and validation tests statistically related to their test scores for the ‘big five’ traits used in academic psychometric testing: openness, conscientiousness, extraversion, agreeableness and neuroticism.
The team adapted two well-known personality tests – an open-source, 300-question version of the Revised NEO Personality Inventory and the shorter Big Five Inventory – and administered them to various LLMs using structured prompts.
By using the same set of contextual prompts across tests, the team could quantify, for example, whether a model’s extraversion scores on one personality test correlated more strongly with its extraversion scores on a separate personality test than with the other big five traits on that test. Past attempts to assess the personality of chatbots have fed entire questionnaires to a model at once, which skewed the results because each answer built on the previous one.
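The sketch below illustrates this kind of convergent and discriminant check. It is not the authors' code: the trait scores are simulated stand-ins for questionnaire results from two instruments, and only the logic of comparing same-trait versus cross-trait correlations reflects the approach described above.

```python
# Hedged sketch of a convergent/discriminant validity check, not the study's
# code. Assume per-persona Big Five scores from two instruments are available
# as arrays `ipip` and `bfi` (one row per prompted persona, one column per
# trait). Convergent validity: each trait should correlate more strongly with
# itself across tests than with any other trait.
import numpy as np

traits = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]
rng = np.random.default_rng(1)

# Placeholder scores standing in for real LLM questionnaire results.
latent = rng.normal(size=(200, 5))                    # simulated trait levels
ipip = latent + 0.3 * rng.normal(size=latent.shape)   # e.g. IPIP-NEO scores
bfi = latent + 0.3 * rng.normal(size=latent.shape)    # e.g. BFI scores

# Cross-instrument correlations: rows = IPIP traits, columns = BFI traits.
corr = np.corrcoef(ipip.T, bfi.T)[:5, 5:]

for i, t in enumerate(traits):
    convergent = corr[i, i]                       # same trait, different test
    discriminant = np.delete(corr[i], i).max()    # strongest "wrong" correlation
    print(f"{t:>17}: convergent r={convergent:.2f}, "
          f"max off-trait r={discriminant:.2f}")
```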
The researchers found that larger, instruction-tuned models showed personality test profiles that were both reliable and predictive of behaviour, while smaller or ‘base’ models gave inconsistent answers.
The researchers took their tests further, showing they could steer a model’s personality along nine levels for each trait using carefully designed prompts. For example, they could make a chatbot appear more extroverted or more emotionally unstable – and these changes carried through to real-world tasks like writing social media posts.
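As an illustration of graded shaping prompts, the snippet below builds persona instructions at nine intensity levels for a trait. The qualifier wording, the adjective list and the `call_llm` placeholder are all hypothetical (they are not the prompts or API used in the study); the snippet simply shows the general idea of steering a trait from one extreme to the other before handing the model a task.

```python
# Illustrative sketch of nine-level personality-shaping prompts, in the spirit
# of the steering described above. Adjectives and qualifiers are placeholders,
# and `call_llm` is a hypothetical stand-in for any chat API.
QUALIFIERS = [
    "extremely un", "very un", "un", "slightly un", "neither {a} nor un{a}",
    "slightly ", "", "very ", "extremely ",
]  # nine intensity levels, from strong negative to strong positive

TRAIT_ADJECTIVES = {
    "extraversion": "talkative",
    "neuroticism": "anxious",
}

def shaping_prefix(trait: str, level: int) -> str:
    """Build a persona instruction for `trait` at intensity `level` (0-8)."""
    a = TRAIT_ADJECTIVES[trait]
    q = QUALIFIERS[level]
    desc = q.format(a=a) if "{a}" in q else f"{q}{a}"
    return f"For the following task, act as a person who is {desc}."

prompt = (shaping_prefix("extraversion", 8)
          + " Write a short social media post about your weekend.")
# response = call_llm(prompt)   # hypothetical LLM call
print(prompt)
```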
“Our method gives you a framework to validate a given AI evaluation and test how well it can predict behaviour in the real world,” said Serapio-García. “Our work also shows how AI models can reliably change how they mimic personality depending on the user, which raises big safety and regulation concerns, but if you don’t know what you’re measuring or enforcing, there’s no point in setting up rules in the first place.”
The research was supported in part by Cambridge Research Computing Services (RCS), the Cambridge Service for Data Driven Discovery (CSD3), the Engineering and Physical Sciences Research Council (EPSRC), and the Science and Technology Facilities Council (STFC), part of UK Research and Innovation (UKRI).
Journal
Nature Machine Intelligence
Article Title
A psychometric framework for evaluating and shaping personality traits in large language models
Article Publication Date
18-Dec-2025
AI in primary care: experts warn of safety risks as tech outpaces regulation
AI tools like ChatGPT and digital scribes are being used in GP clinics without proper safety checks
From digital scribes to ChatGPT, artificial intelligence (AI) is rapidly entering GP clinics. New University of Sydney research warns that technology is racing ahead of safety checks, putting patients and health systems at risk.
The study, published in The Lancet Primary Care, synthesised global evidence on how AI is being used in primary care, drawing on data from the United States, United Kingdom, Australia, several African nations, Latin America, Ireland and other regions. It found that AI tools such as ChatGPT, AI scribes and patient-facing apps are increasingly used for clinical queries, documentation and patient advice, yet most are being deployed without thorough evaluation or regulatory oversight.
“Primary care is the backbone of health systems, providing accessible and continuous care,” said study lead Associate Professor Liliana Laranjo, Horizon Fellow at the Westmead Applied Research Centre. “AI can ease pressure on overstretched services, but without safeguards, we risk unintended consequences for patient safety and quality of care.”
GPs and patients turning to AI but evidence lags behind
Primary care is under strain worldwide, from workforce shortages to clinician burnout and rising healthcare complexity, all worsened by the COVID-19 pandemic. AI has been touted as a solution, with tools that save time by summarising consultations, automating administration and supporting decision-making.
In the UK, one in five GPs reported using generative AI in clinical practice in 2024. But the review found that most studies of AI in primary care are based on simulations rather than real-world trials, leaving critical gaps in evidence on effectiveness, safety and equity.
The proportion of GPs using generative AI in Australia is not reliably known, but estimates put it at around 40 percent.
“AI is already in our clinics, but without Australian data on how many GPs are using it or proper oversight, we’re flying blind on safety,” Associate Professor Laranjo said.
While AI scribes and ambient listening technologies can reduce cognitive load and improve job satisfaction for GPs, they also carry risks like automation bias and loss of important social or biographical details in medical records.
“Our study found that many GPs who use AI scribes don’t want to go back to typing. They say it speeds up consultations and lets them focus on patients, but these tools can miss vital personal details, and can introduce bias,” said Associate Professor Laranjo.
For patients, symptom checkers and health apps promise convenience and personalised care, but their accuracy often varies and many have not been independently evaluated.
“Generative models like ChatGPT can sound convincing but be factually wrong,” said Associate Professor Laranjo. “They often agree with users even when they’re mistaken, which is dangerous for patients and challenging for clinicians.”
Equity and environmental risks of AI
Experts warn that while AI promises faster diagnoses and personalised care, it can also deepen health gaps if bias creeps in. Dermatology tools, for example, often misdiagnose conditions on darker skin tones, which are typically underrepresented in training datasets.
Conversely, the researchers say that well-designed AI can address inequities: one arthritis study doubled the number of Black patients eligible for knee replacement by using an algorithm trained on a diverse dataset, which predicted patient-reported knee pain better than standard x-ray interpretation by doctors.
“Ignoring socioeconomic factors and universal design could turn AI in primary care from a breakthrough into a setback,” said Associate Professor Laranjo.
Environmental costs are also huge. Training GPT-3, the 2020 model that preceded ChatGPT, emitted an amount of carbon dioxide equivalent to 188 flights between New York and San Francisco. Data centres now consume around 1 percent of global electricity, and in Ireland they account for more than 20 percent of national electricity use.
“AI’s environmental footprint is a challenge,” Associate Professor Laranjo said. “We need sustainable approaches that balance innovation with equity and planetary health.”
The researchers urge governments, clinicians and tech developers to prioritise:
- robust evaluation and real-world monitoring of AI tools
- regulatory frameworks that keep pace with innovation
- education for clinicians and the public to improve AI literacy
- bias mitigation strategies to ensure equity in healthcare
- sustainable practices to reduce AI’s environmental impact.
“AI offers a chance to reimagine primary care, but innovation must not come at the expense of safety or equity,” Associate Professor Laranjo said. “We need partnerships across sectors to make sure AI benefits everyone – not just the tech-savvy or well-resourced.”
Method of Research
Systematic review
Subject of Research
People
Article Title
Artificial intelligence in primary care: innovation at a crossroads
Article Publication Date
16-Dec-2025
Exploring how patients feel about AI transcription
Comprehensive survey helped UC Davis Health understand patient concerns before successfully implementing its AI scribe program.
Image: Gary Leiserowitz, M.D., meets with a patient, using an AI scribe to automatically record and transcribe the conversation during a medical visit. Credit: UC Regents
Electronic medical records (EMRs) have been a tremendous benefit in exam rooms across the country, creating secure patient history databases that clinicians can easily access and update. Yet, they can also detract from the doctor-patient experience, as physicians must type notes into the system rather than devote their complete attention to patients.
To help put physicians back in front of their patients — and away from their keyboards — UC Davis Health has adopted an artificial intelligence (AI) scribe, which automatically records and transcribes conversations during medical visits. These systems preserve detailed medical notes so physicians can focus on their patients.
In preparation for the digital tool’s rollout, UC Davis Health conducted a comprehensive survey to evaluate patient perceptions of the technology. The results, which informed how the scribe was implemented, were recently published in the Journal of Medical Internet Research (JMIR) Medical Informatics.
“We weren’t sure how patients would respond to these AI transcriptions,” said Gary Leiserowitz, Obstetrics and Gynecology chair and lead author on the paper. “There was little information from other institutions, so we worked with our patient experience colleagues to understand how patients might feel about it.”
Survey results
The survey was emailed to more than 9,000 patients and around 1,900 responded. While 73% felt they were being heard during clinical visits, 23% stated their doctors were more focused on notetaking than on them.
“A lot of people feel medical documentation is a necessary evil, but hate it when their doctors are sitting in front of the computer, trying to record everything they’re talking about,” Leiserowitz said. “They feel like that connection is lost.”
In the survey, 48% of respondents reported that an AI scribe would be a good solution, while 33% were neutral and 19% had concerns. Younger patients (18-30 years old) were more skeptical about the technology than older patients.
Patients were mostly worried about note accuracy (39%), privacy and security (13%) and the prospect of being recorded (13%). Many of the associated comments expressed concerns that the recordings could be hacked. Around 10% felt the technology would be bad for physicians and staff.
A seamless transition
When patients were asked when, during their care experience, they would prefer to be told that a digital tool would be taking notes, they strongly favored early notification: while making an appointment, on arriving at their doctor’s office, or when checking in at a clinic. Most (57%) preferred to be notified face-to-face, while many (45%) were also comfortable with email.
Results of the survey provided UC Davis Health with valuable guidance on how to communicate the transition to the AI scribe. The team incorporated multiple educational touchpoints to get buy-in, prioritizing face-to-face discussions with patients.
“One of our important takeaways from the survey was that we had to educate patients about what the AI scribe could and could not do,” Leiserowitz said. “Security was a big deal so, when we were vetting vendors, we made sure they only use domestic servers. And while the AI notes go into the EMR, the recording itself disappears within 10 days.”
In addition, to ensure complete accuracy, the clinician checks and edits the notes before they are placed into the EMR. Patients can also review and advise their clinicians on possible corrections. Ultimately, if a patient is not comfortable with the system, they can opt out.
At UC Davis Health, a dedicated analytics oversight committee reviews all advanced analytics models, including those powered by AI, that are used in clinical decision-making. The committee’s goal is to develop a streamlined and innovative approach that ensures health AI is implemented responsibly, ethically and effectively — always with the best interests of patients and the community in mind.
“This often comes down to the quality of the relationship between the doctor and the patient,” Leiserowitz said. “If the patient trusts us and understands why we’re using it, they tend to accept it. That’s why education is such a critical factor. It helps patients get comfortable with the technology.”
Journal
JMIR Medical Informatics
Method of Research
Survey
Subject of Research
People
Article Title
Patient Attitudes Toward Ambient Voice Technology: Preimplementation Patient Survey in an Academic Medical Center