How AI can rig polls
Study shows how AI can mimic humans and complete surveys.
Dartmouth College
Public opinion polls and other surveys provide the data researchers use to understand human behavior.
New research from Dartmouth reveals that artificial intelligence can now corrupt public opinion surveys at scale—passing every quality check, mimicking real humans, and manipulating results without leaving a trace.
The findings, published in the Proceedings of the National Academy of Sciences, show just how vulnerable polling has become. In the seven major national polls before the 2024 election, adding as few as 10 to 52 fake AI responses—at five cents each—would have flipped the predicted outcome.
Foreign adversaries could easily exploit this weakness: the bots work even when programmed in Russian, Mandarin, or Korean, yet produce flawless English answers.
"We can no longer trust that survey responses are coming from real people," says study author Sean Westwood, associate professor of government at Dartmouth and director of the Polarization Research Lab, who conducted the research.
To examine the vulnerability of online surveys to large language models, Westwood created a simple AI tool ("an autonomous synthetic respondent") that operates from a 500-word prompt. In 43,000 tests, the AI tool passed 99.8% of attention checks designed to detect automated responses, made zero errors on logic puzzles, and successfully concealed its nonhuman nature. It tailored its responses to randomly assigned demographics, for example giving simpler answers when assigned a persona with less formal education.
"These aren't crude bots," said Westwood. "They think through each question and act like real, careful people making the data look completely legitimate."
When the bots were programmed to favor Democrats or Republicans, presidential approval ratings swung from 34% to either 98% or 0%, and Republican support on the generic ballot moved from 38% to either 97% or 1%.
The implications reach far beyond election polling. Surveys are fundamental to scientific research across disciplines—in psychology to understand mental health, economics to track consumer spending, and public health to identify disease risk factors. Thousands of peer-reviewed studies published each year rely on survey data to inform research and shape policy.
"With survey data tainted by bots, AI can poison the entire knowledge ecosystem," said Westwood.
The financial incentives to use AI to complete surveys are stark. Human respondents typically earn $1.50 for completing a survey, while an AI bot can do the same work for approximately five cents, or essentially nothing at all. The problem is already materializing: a 2024 study found that 34% of respondents had used AI to answer an open-ended survey question.
Westwood tested every AI detection method currently in use and all failed to identify the AI tool. His study argues for transparency from companies that conduct surveys, requiring them to prove their participants are real people.
"We need new approaches to measuring public opinion that are designed for an AI world," says Westwood. "The technology exists to verify real human participation; we just need the will to implement it. If we act now, we can preserve both the integrity of polling and the democratic accountability it provides."
Westwood is available for comment at: Sean.J.Westwood@dartmouth.edu.
###
Journal
Proceedings of the National Academy of Sciences
Subject of Research
Not applicable
Article Title
The Potential Existential Threat of Large Language Models to Online Survey Research
New study reveals high rates of fabricated and inaccurate citations in LLM-generated mental health research
(Toronto, November 17, 2025) A new study published in the peer-reviewed journal JMIR Mental Health by JMIR Publications highlights a critical risk in the growing use of Large Language Models (LLMs) like GPT-4o by researchers: the frequent fabrication and inaccuracy of bibliographic citations. The findings underscore an urgent need for rigorous human verification and institutional safeguards to protect research integrity, particularly in specialized and less publicly known fields within mental health.
Nearly 1 in 5 Citations Fabricated by GPT-4o in Literature Reviews
The article, titled "Influence of Topic Familiarity and Prompt Specificity on Citation Fabrication in Mental Health Research Using Large Language Models: Experimental Study," found that 19.9% of all citations generated by GPT-4o across six simulated literature reviews were entirely fabricated, meaning they could not be traced to any real publication. Furthermore, among the seemingly real citations, 45.4% contained bibliographic errors, most commonly incorrect or invalid Digital Object Identifiers (DOIs).
This timely research is highly relevant as academic journals have encountered instances of seemingly AI-hallucinated references in recent submissions. These bibliographic hallucinations and errors are not just formatting issues; they break the chain of verifiability, mislead readers, and fundamentally compromise the integrity and trustworthiness of scientific results and the cumulative knowledge base. This makes the need for careful scrutiny and verification paramount to safeguard academic rigor.
Reliability Varies by Topic Familiarity and Specificity
The research, conducted by a team including Jake Linardon, PhD, from Deakin University and his colleagues, systematically tested the reliability of GPT-4o's output across mental health topics with varying levels of public awareness and scientific maturity: major depressive disorder (high familiarity), binge eating disorder (moderate), and body dysmorphic disorder (low). They also tested general versus specialized review prompts (e.g., focusing on digital interventions).
Fabrication Risk is Highest for Less Familiar Topics: Fabrication rates were significantly higher for topics with lower public familiarity and research coverage, such as binge eating disorder (28%) and body dysmorphic disorder (29%), compared to major depressive disorder (6%).
Specialized Topics Pose a Higher Risk: While not universally true, stratified analysis showed that fabrication rates were significantly higher for specialized reviews (e.g., evidence for digital interventions) compared to general overviews for certain disorders, such as binge eating disorder.
Overall Inaccuracy is Pervasive: In total, nearly two-thirds of all citations generated by GPT-4o were either fabricated or contained errors, indicating a major reliability issue.
Urgent Call for Human Oversight and New Safeguards
The study’s conclusions issue a strong warning to the academic community: Citation fabrication and errors remain common in GPT-4o outputs. The authors stress that the reliability of LLM-generated citations is not fixed but is contingent on the topic and the way the prompt is designed.
Key Implications Highlighted in the Study:
Rigorous Verification is Mandatory: Researchers and students must subject all LLM-generated references to careful human verification to validate their accuracy and authenticity.
Journal and Institutional Role: Journal editors and publishers must implement stronger safeguards, potentially using detection software that flags citations that do not match existing sources, signaling a potential hallucination (a minimal example of such a check is sketched after this list).
Policy and Training: Academic institutions must develop clear policies and training to equip users with the skills to critically assess LLM outputs and to design strategic prompts, especially when exploring less visible or highly specialized research topics.
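To illustrate the kind of automated safeguard the authors call for, the sketch below checks whether a citation's DOI resolves and whether the registered title matches the title claimed in a manuscript. It is a minimal illustration, not a tool from the study: it assumes the public Crossref REST API (api.crossref.org) and the Python requests library, and the function name and similarity threshold are placeholders chosen for the example.

```python
# Minimal sketch of an automated citation check of the kind the authors call for.
# Assumes the public Crossref REST API and the `requests` library; the function
# name and the 0.8 similarity threshold are illustrative, not from the study.
import requests
from difflib import SequenceMatcher

CROSSREF_WORKS = "https://api.crossref.org/works/"

def check_citation(doi: str, claimed_title: str) -> dict:
    """Flag a citation whose DOI does not resolve, or whose registered
    title does not match the title given in the manuscript."""
    resp = requests.get(CROSSREF_WORKS + doi, timeout=10)
    if resp.status_code != 200:
        # DOI is invalid or unregistered: a strong signal of fabrication.
        return {"doi": doi, "status": "unresolved", "flag": True}

    registered_title = (resp.json()["message"].get("title") or [""])[0]
    similarity = SequenceMatcher(
        None, claimed_title.lower(), registered_title.lower()
    ).ratio()
    return {
        "doi": doi,
        "status": "resolved",
        "registered_title": registered_title,
        "title_similarity": round(similarity, 2),
        # A low similarity score suggests a real DOI attached to the wrong paper.
        "flag": similarity < 0.8,
    }

if __name__ == "__main__":
    # Example using the DOI of the study discussed in this release.
    print(check_citation(
        "10.2196/80371",
        "Influence of Topic Familiarity and Prompt Specificity on Citation "
        "Fabrication in Mental Health Research Using Large Language Models: "
        "Experimental Study",
    ))
```

A check like this can only flag suspicious entries; the study's central recommendation, careful human verification of every LLM-generated reference, still applies.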
Original article:
Linardon J, Jarman H, McClure Z, Anderson C, Liu C, Messer M. Influence of Topic Familiarity and Prompt Specificity on Citation Fabrication in Mental Health Research Using Large Language Models: Experimental Study. JMIR Ment Health 2025;12:e80371
URL: https://mental.jmir.org/2025/1/e80371
DOI: 10.2196/80371
About JMIR Publications
JMIR Publications is a leading open access publisher of digital health research and a champion of open science. With a focus on author advocacy and research amplification, JMIR Publications partners with researchers to advance their careers and maximize the impact of their work. As a technology organization with publishing at its core, we provide innovative tools and resources that go beyond traditional publishing, supporting researchers at every step of the dissemination process. Our portfolio features a range of peer-reviewed journals, including the renowned Journal of Medical Internet Research.
To learn more about JMIR Publications, please visit jmirpublications.com or connect with us via X, LinkedIn, YouTube, Facebook, and Instagram.
Head office: 130 Queens Quay East, Unit 1100, Toronto, ON, M5A 0P6 Canada
Media Contact:
Dennis O’Brien, Vice President, Communications & Partnerships
JMIR Publications
communications@jmir.org
The content of this communication is licensed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, published by JMIR Publications, is properly cited.
Journal
JMIR Mental Health
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Influence of Topic Familiarity and Prompt Specificity on Citation Fabrication in Mental Health Research Using Large Language Models: Experimental Study
Article Publication Date
12-Nov-2025
Researchers unveil first-ever defense against cryptanalytic attacks on AI
North Carolina State University
Security researchers have developed the first functional defense mechanism capable of protecting against “cryptanalytic” attacks used to “steal” the model parameters that define how an AI system works.
“AI systems are valuable intellectual property, and cryptanalytic parameter extraction attacks are the most efficient, effective, and accurate way to ‘steal’ that intellectual property,” says Ashley Kurian, first author of a paper on the work and a Ph.D. student at North Carolina State University. “Until now, there has been no way to defend against those attacks. Our technique effectively protects against these attacks.”
“Cryptanalytic attacks are already happening, and they’re becoming more frequent and more efficient,” says Aydin Aysu, corresponding author of the paper and an associate professor of electrical and computer engineering at NC State. “We need to implement defense mechanisms now, because implementing them after an AI model’s parameters have been extracted is too late.”
At issue are cryptanalytic parameter extraction attacks. Parameters are the essential information that defines an AI model; in practice, they determine how an AI system performs its tasks. Cryptanalytic parameter extraction attacks are a purely mathematical way of determining what a given AI model’s parameters are, allowing a third party to recreate the AI system.
“In a cryptanalytic attack, someone submits inputs and looks at outputs,” Aysu says. “They then use a mathematical function to determine what the parameters are. So far, these attacks have only worked against a type of AI model called a neural network. However, many – if not most – commercial AI systems are neural networks, including large language models such as ChatGPT.”
So, how do you defend against a mathematical attack?
The new defense mechanism rests on a key insight the researchers gained while analyzing cryptanalytic parameter extraction attacks: every such attack relies on the same core principle. To understand what they learned, it helps to know the basic architecture of a neural network.
The fundamental building block of a neural network model is called a “neuron.” Neurons are arranged in layers and are used in sequence to assess and respond to input data. Once the data has been processed by the neurons in the first layer, the outputs of that layer are passed to a second layer. This process continues until the data has been processed by the entire system, at which point the system determines how to respond to the input data.
“What we observed is that cryptanalytic attacks focus on differences between neurons,” says Kurian. “And the more different the neurons are, the more effective the attack is. Our defense mechanism relies on training a neural network model in a way that makes neurons in the same layer of the model similar to each other. You can do this only in the first layer, or on multiple layers. And you could do it with all of the neurons in a layer, or only on a subset of neurons.”
“This approach creates a ‘barrier of similarity’ that makes it difficult for attacks to proceed,” says Aysu. “The attack essentially doesn’t have a path forward. However, the model still functions normally in terms of its ability to perform its assigned tasks.”
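The release does not describe how this “barrier of similarity” is implemented, so the sketch below is only one plausible way to encode the idea during training: a PyTorch penalty term that pulls the weight vectors of neurons in a chosen layer toward their layer average. It is an assumption-laden illustration, not the method from the NeurIPS paper; the layer choice, penalty form, and weighting coefficient lam are invented for the example.

```python
# Sketch (not the paper's implementation) of adding a "neuron similarity"
# penalty to a training loss, so that the weight vectors of neurons in a
# chosen layer are pulled toward one another during training.
# The layer choice, penalty form, and coefficient `lam` are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),  # first hidden layer: the one we make "similar"
    nn.ReLU(),
    nn.Linear(128, 10),
)

def neuron_similarity_penalty(layer: nn.Linear) -> torch.Tensor:
    """Penalize how far each neuron's weight vector is from the layer mean.
    Smaller differences between neurons give an extraction attack that
    exploits neuron-to-neuron differences less signal to work with."""
    w = layer.weight                    # shape: (num_neurons, num_inputs)
    mean = w.mean(dim=0, keepdim=True)  # the "average" neuron
    return ((w - mean) ** 2).mean()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.1  # assumed trade-off between task accuracy and neuron similarity

def training_step(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = criterion(model(x), y) + lam * neuron_similarity_penalty(model[0])
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random data standing in for a real dataset:
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))
print(training_step(x, y))
```

As the researchers note, such a constraint could be applied to only the first layer or to several, and to all neurons in a layer or only a subset, trading off how much the attack surface shrinks against any effect on accuracy.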
In proof-of-concept testing, the researchers found that AI models that incorporated the defense mechanism had an accuracy change of less than 1%.
“Sometimes a model that was retrained to incorporate the defense mechanism was slightly more accurate, sometimes slightly less accurate – but the overall change was minimal,” Kurian says.
“We also tested how well the defense mechanism worked,” says Kurian. “We focused on models that had their parameters extracted in less than four hours using cryptanalytic techniques. After retraining to incorporate the defense mechanism, we were unable to extract the parameters with cryptanalytic attacks that lasted for days.”
As part of this work, the researchers also developed a theoretical framework that can be used to quantify the success probability of cryptanalytic attacks.
“This framework is useful, because it allows us to estimate how robust a given AI model is against these attacks without running such attacks for days,” says Aysu. “There is value in knowing how secure your system is – or isn’t.”
“We know this mechanism works, and we’re optimistic that people will use it to protect AI systems from these attacks,” says Kurian. “And we are open to working with industry partners who are interested in implementing the mechanism.”
“We also know that people trying to circumvent security measures will eventually find a way around them – hacking and security are engaged in a constant back and forth,” says Aysu. “We’re hopeful that there will be sources of funding moving forward that allow those of us working on new security efforts to keep pace.”
The paper, “Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks,” will be presented at the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS) being held Dec. 2-7 in San Diego, California.
This work was done with support from the National Science Foundation under grant 1943245.
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks
Article Publication Date
2-Dec-2025