Five minutes of training could help you spot fake AI faces
University of Reading
Image: Participants were asked to distinguish between real and fake faces. The top two rows contain AI-generated faces; the bottom two rows contain real faces. Credit: Dr Katie Gray
Five minutes of training can significantly improve people's ability to identify fake faces created by artificial intelligence, new research shows.
Scientists from the universities of Reading, Greenwich, Leeds and Lincoln tested 664 participants' ability to distinguish between real human faces and faces generated by computer software called StyleGAN3. Without any training, super-recognisers (individuals who score significantly higher than average on face recognition tests) correctly identified fake faces 41% of the time, while participants with typical abilities scored just 31%. Guessing at random would yield an accuracy of around 50% (chance level).
A new set of participants received a brief training procedure highlighting common computer rendering mistakes, such as unusual hair patterns or an incorrect number of teeth, and scored higher: super-recognisers achieved 64% accuracy in detecting fake faces, while typical participants scored 51%.
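As a rough illustration of the chance-level comparison, the following minimal Python sketch tests an observed accuracy against the 50% guessing baseline. The trial count (100 faces per observer) is a hypothetical assumption, since the per-participant number of trials is not given here.

# Minimal sketch: testing whether an observed detection accuracy
# differs from the 50% chance level mentioned above. The trial count
# (100) is a hypothetical assumption, not the study's actual design.
from scipy.stats import binomtest

def p_vs_chance(accuracy: float, n_trials: int = 100, chance: float = 0.5) -> float:
    """Two-sided binomial test of an observed accuracy against guessing."""
    correct = round(accuracy * n_trials)
    return binomtest(correct, n_trials, p=chance).pvalue

# Fake-face detection accuracies reported in the press release
for label, acc in [("untrained super-recognisers", 0.41),
                   ("untrained typical observers", 0.31),
                   ("trained super-recognisers", 0.64),
                   ("trained typical observers", 0.51)]:
    print(f"{label}: {acc:.0%} correct, p vs. chance = {p_vs_chance(acc):.3f}")

Under this assumed trial count, an individual score close to 50% would be statistically indistinguishable from guessing, which is why the comparisons above are made across groups of participants.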
Dr Katie Gray, lead researcher at the University of Reading, said: "Computer-generated faces pose genuine security risks. They have been used to create fake social media profiles, bypass identity verification systems and create false documents. The faces produced by the latest generation of artificial intelligence software are extremely realistic. People often judge AI-generated faces as more realistic than actual human faces.
“Our training procedure is brief and easy to implement. The results suggest that combining this training with the natural abilities of super-recognisers could help tackle real-world problems, such as verifying identities online."
Advancing software poses a tough challenge
The training improved both groups by a similar margin, suggesting that super-recognisers' advantage comes from using different visual cues than typical observers when identifying synthetic faces, rather than simply from being better at spotting rendering errors.
The research, published today (Wednesday, 12 November) in Royal Society Open Science, tested faces created by StyleGAN3, the most advanced system available when the study was conducted. These faces pose a tougher challenge than the older software used in earlier research: participants in this study performed worse than those in previous studies. Future research will examine whether the training effects last over time and how super-recognisers' skills might complement artificial intelligence detection tools.
Journal
Royal Society Open Science
Article Title
Training human super-recognisers’ detection and discrimination of AI-generated faces
Article Publication Date
12-Nov-2025
People mirror AI systems’ hiring biases, study finds
University of Washington
An organization drafts a job listing with artificial intelligence. Droves of applicants conjure resumes and cover letters with chatbots. Another AI system sifts through those applications, passing recommendations to hiring managers. Perhaps AI avatars conduct screening interviews. This is increasingly the state of hiring, as people seek to streamline the stressful, tedious process with AI.
Yet research is finding that hiring bias — against people with disabilities, or certain races and genders — permeates large language models, or LLMs, such as ChatGPT and Gemini. We know less, though, about how biased LLM recommendations influence the people making hiring decisions.
In a new University of Washington study, 528 people worked with simulated LLMs to pick candidates for 16 different jobs, from computer systems analyst to nurse practitioner to housekeeper. The researchers simulated different levels of racial biases in LLM recommendations for resumes from equally qualified white, Black, Hispanic and Asian men.
When picking candidates without AI or with neutral AI, participants picked white and non-white applicants at equal rates. But when they worked with a moderately biased AI, if the AI preferred non-white candidates, participants did too. If it preferred white candidates, participants did too. In cases of severe bias, people made only slightly less biased decisions than the recommendations.
The team presented its findings Oct. 22 at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in Madrid.
“In one survey, 80% of organizations using AI hiring tools said they don’t reject applicants without human review,” said lead author Kyra Wilson, a UW doctoral student in the Information School. “So this human-AI interaction is the dominant model right now. Our goal was to take a critical look at this model and see how human reviewers’ decisions are being affected. Our findings were stark: Unless bias is obvious, people were perfectly willing to accept the AI’s biases.”
The team recruited 528 online participants from the U.S. through surveying platform Prolific, who were then asked to screen job applicants. They were given a job description and the names and resumes of five candidates: two white men and two men who were either Asian, Black or Hispanic. These four were equally qualified. To obscure the purpose of the study, the final candidate was of a race not being compared and lacked qualifications for the job. Candidates’ names implied their races — for example, Gary O’Brien for a white candidate. Affinity groups, such as Asian Student Union Treasurer, also signaled race.
In four trials, the participants picked three of the five candidates to interview. In the first trial, the AI provided no recommendation. In the next trials, the AI recommendations were neutral (one candidate of each race), severely biased (candidates from only one race), or moderately biased, meaning candidates were recommended at rates similar to rates of bias in real AI models. The team derived rates of moderate bias using the same methods as in their 2024 study that looked at bias in three common AI systems.
Rather than having participants interact directly with the AI system, the team simulated the AI interactions so they could hew to the bias rates from their large-scale study. The researchers also used AI-generated resumes, which they validated, rather than real resumes. This allowed greater control, and AI-written resumes are increasingly common in hiring.
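To make the simulated-recommendation setup concrete, here is a minimal Python sketch of the three recommendation conditions. The number of recommended candidates and the moderate-bias probabilities are illustrative assumptions, not the parameters derived from the team's 2024 audit.

# Minimal sketch of the simulated AI recommendation conditions.
# The moderate-bias probabilities (0.75 / 0.25) and the number of
# recommendations are illustrative assumptions, not the study's values.
import random

WHITE = ["white_1", "white_2"]
NONWHITE = ["nonwhite_1", "nonwhite_2"]  # Asian, Black or Hispanic in the study

def simulated_recommendation(condition: str, favored: list[str], other: list[str]) -> list[str]:
    """Return the candidates a simulated AI flags for interview."""
    if condition == "none":       # first trial: no recommendation shown
        return []
    if condition == "neutral":    # one candidate of each compared race
        return [random.choice(favored), random.choice(other)]
    if condition == "severe":     # candidates from only one race
        return list(favored)
    if condition == "moderate":   # favored race recommended at a higher (assumed) rate
        return [c for c in favored + other
                if random.random() < (0.75 if c in favored else 0.25)]
    raise ValueError(condition)

# Example: a moderately biased AI that favors the white candidates
print(simulated_recommendation("moderate", favored=WHITE, other=NONWHITE))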
“Getting access to real-world hiring data is almost impossible, given the sensitivity and privacy concerns,” said senior author Aylin Caliskan, a UW associate professor in the Information School. “But this lab experiment allowed us to carefully control the study and learn new things about bias in human-AI interaction.”
Without suggestions, participants’ choices exhibited little bias. But when provided with recommendations, participants mirrored the AI. In the case of severe bias, choices followed the AI picks around 90% of the time, rather than nearly all the time, indicating that even if people are able to recognize AI bias, that awareness isn’t strong enough to negate it.
“There is a bright side here,” Wilson said. “If we can tune these models appropriately, then it's more likely that people are going to make unbiased decisions themselves. Our work highlights a few possible paths forward.”
In the study, bias dropped 13% when participants began with an implicit association test, intended to detect subconscious bias. So companies including such tests in hiring trainings may mitigate biases. Educating people about AI can also improve awareness of its limitations.
“People have agency, and that has huge impact and consequences, and we shouldn't lose our critical thinking abilities when interacting with AI,” Caliskan said. “But I don’t want to place all the responsibility on people using AI. The scientists building these systems know the risks and need to work to reduce systems’ biases. And we need policy, obviously, so that models can be aligned with societal and organizational values.”
Anna-Maria Gueorguieva, a UW doctoral student in the Information School, and Mattea Sim, a postdoctoral scholar at Indiana University, are also co-authors on this paper. This research was funded by the U.S. National Institute of Standards and Technology.
For more information, contact Wilson at kywi@uw.edu and Caliskan at aylin@uw.edu.
AI language models show bias against regional German dialects
New study examines how artificial intelligence responds to dialect speech
Johannes Gutenberg Universitaet Mainz
Large language models such as GPT-5 and Llama systematically rate speakers of German dialects less favorably than those using Standard German. This is shown by a recent collaborative study between Johannes Gutenberg University Mainz (JGU) and the universities of Hamburg and Washington, in which Professor Katharina von der Wense and Minh Duc Bui played a leading role. The results, presented at this year's Conference on Empirical Methods in Natural Language Processing (EMNLP) – one of the world's leading conferences in computational linguistics – show that all tested AI systems reproduce social stereotypes.
"Dialects are an essential part of cultural identity," emphasized Minh Duc Bui, a doctoral researcher in von der Wense's Natural Language Processing (NLP) group at JGU's Institute of Computer Science. "Our analyses suggest that language models associate dialects with negative traits – thereby perpetuating problematic social biases."
Using linguistic databases containing orthographic and phonetic variants of German dialects, the team first translated seven regional varieties into Standard German. This parallel dataset allowed them to systematically compare how language models evaluated identical content – once written in Standard German, once in dialect form.
Bias grows when dialects are explicitly mentioned
The researchers tested ten large language models, ranging from open-source systems such as Gemma and Qwen to the commercial model GPT-5. Each model was presented with written texts either in Standard German or in one of seven dialects: Low German, Bavarian, North Frisian, Saterfrisian, Ripuarian (which includes Kölsch), Alemannic, and Rhine-Franconian dialects, including Palatine and Hessian.
The systems were first asked to assign personal attributes to fictional speakers – for instance, "educated" or "uneducated." They then had to choose between two fictional individuals – for example, in a hiring decision, a workshop invitation, or the choice of a place to live.
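As a sketch of how these two tasks might be posed to a model for one dialect/Standard German pair, the Python below builds the prompts. query_model is a hypothetical stand-in for whichever chat API was used, and the placeholder texts and trait pairs are illustrative, not the study's actual materials.

# Minimal sketch of the two evaluation tasks described above.
# The trait list, placeholder texts and the (commented-out) query_model
# call are illustrative assumptions, not the study's actual materials.

TRAIT_PAIRS = [("educated", "uneducated"),
               ("professional", "unprofessional"),
               ("trustworthy", "untrustworthy"),
               ("friendly", "unfriendly")]

def association_prompt(text: str, trait_a: str, trait_b: str) -> str:
    """Ask the model to assign one of two traits to the writer of a text."""
    return (f'A person wrote the following message:\n\n"{text}"\n\n'
            f"Is this person more likely to be {trait_a} or {trait_b}? "
            "Answer with one word.")

def decision_prompt(text_a: str, text_b: str) -> str:
    """Forced choice between two fictional writers, e.g. a hiring decision."""
    return (f'Person A wrote: "{text_a}"\nPerson B wrote: "{text_b}"\n'
            "Which person would you invite to a job interview? Answer A or B.")

standard = "<same content, written in Standard German>"
dialect = "<same content, written in one of the seven dialects>"

prompts = [association_prompt(text, a, b)
           for text in (standard, dialect)
           for a, b in TRAIT_PAIRS]
prompts.append(decision_prompt(standard, dialect))
# responses = [query_model(p) for p in prompts]   # query_model is hypothetical
print(f"{len(prompts)} prompts prepared")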
The results: in nearly all tests, the models attached stereotypes to dialect speakers. While Standard German speakers were more often described as "educated," "professional," or "trustworthy," dialect speakers were labeled "rural," "traditional," or "uneducated." Even the seemingly positive trait "friendly," which sociolinguistic research has traditionally linked to dialect speakers, was more often attributed by AI systems to users of Standard German.
Larger models, stronger bias
Decision-based tests showed similar trends: dialect texts were systematically disadvantaged, being linked to farm work, anger-management workshops, or rural places to live. "These associations reflect societal assumptions embedded in the training data of many language models," explained Professor von der Wense, who conducts research in computational linguistics at JGU. "That is troubling, because AI systems are increasingly used in education or hiring contexts, where language often serves as a proxy for competence or credibility."
The bias became especially pronounced when models were explicitly told that a text was written in dialect. Surprisingly, larger models within the same family displayed even stronger biases. "So bigger doesn't necessarily mean fairer," said Bui. "In fact, larger models appear to learn social stereotypes with even greater precision."
Similar patterns in English
Even when compared with artificially "noisy" Standard German texts, the bias against dialect versions persisted, showing that the discrimination cannot simply be explained by unusual spelling or grammar.
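The exact noising procedure is not described here, but a character-level control of that kind might look like the following sketch; the 10% perturbation rate and the specific swap/drop/double operations are assumptions.

# Minimal sketch of an artificially "noisy" Standard German control:
# random character-level perturbations that garble spelling without
# adding dialect features. The 10% rate and the chosen operations are
# assumptions, not the study's actual procedure.
import random
import string

def add_noise(text: str, rate: float = 0.10, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(["swap", "drop", "double"])
            if op == "swap":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "double":
                out.append(ch + ch)
            # "drop": append nothing
        else:
            out.append(ch)
    return "".join(out)

print(add_noise("Das ist ein ganz normaler Satz auf Hochdeutsch."))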
German dialects thus serve as a case study for a broader, global issue. "Our results reveal how language models handle regional and social variation across languages," said Bui. "Comparable biases have been documented for other languages as well – for example, for African American English."
Future research will explore how AI systems differ in their treatment of various dialects and how language models can be designed and trained to represent linguistic diversity more fairly. "Dialects are a vital part of social identity," emphasized von der Wense. "Ensuring that machines not only recognize but also respect this diversity is a question of technical fairness – and of social responsibility."
The research team in Mainz is currently working on a follow-up study examining how large language models respond to dialects specific to the Mainz region.
Article Title
Large Language Models Discriminate Against Speakers of German Dialects
Article Publication Date
14-Nov-2025