Monday, December 08, 2025

 

Positive and poolished: Student writing has evolved in the AI era


Style, sentiment, and quality of undergraduate writing in the AI era: A cross-sectional and longitudinal analysis of 4,820 authentic empirical reports


University of Warwick

Sentiment change over time 

image: 

A graph of sentiment of student essays plotted over the years essays were collected from

view more 

Credit: Matthew Mak/University of Warwick





A University of Warwick-led analysis of almost 5,000 student-authored reports suggests that student writing has become more polished and formal since the introduction of ChatGPT in late 2022— but grades have remained stable.

Published in Computers and Education: Artificial Intelligence, the new study examines student reports submitted over a 10-year period and finds that the ‘language’ in students’ writing has become more sophisticated, formal, and positive since 2022, coinciding with the widespread availability of generative AI (GenAI).

GenAI tools such as ChatGPT and Copilot are now widely used across higher education with a recent sector-wide survey showing that up to 88% of students report using ChatGPT for assessments.

This new analysis of 4,820 reports, containing 17 million words, is one of the largest of its kind. The experiment does not assess individual student’s AI use but instead explores how writing has evolved at a cohort level during a period of rapid technological change.

It found that since 2022, writing sentiment has become more positive overall, regardless of the substantive content of the reports. This mirrors well-documented positivity tendencies in many GenAI systems, which are designed to produce polite, constructive-sounding responses.

Dr. Matthew Mak, Assistant Professor in Psychology, University of Warwick and first author said: “The tone of students’ writing appears more positive, in line with ChatGPT's output, which is not inherently a good or bad thing, but it does raise concerns about the possibility of AI tools homogenising students’ voices.

“There are also psychological studies showing that we tend to be less critical when we are in a positive mood; if students constantly receive GenAI output, it raises important questions about how these AI tools shape students’ critical thinking in the long term.”

The study also found significant increases in formality and range of vocabulary after ChatGPT’s launch. These stylistic features would be expected to appear after many years of writing experience, making it unlikely this is a natural development in students’ writing abilities nor does it indicate corresponding improvements in their underlying writing skills.

Additionally, some words frequently associated with AI-generated text, such as “delve” and “intricate”, rose sharply in use until 2024 before plummeting in 2025, suggesting that students may have moderated their use—to make their writing read less AI-assisted.

To better understand these trends, the researchers also asked ChatGPT to rewrite reports submitted before ChatGPT was launched in 2022. These rewritten reports exhibited similar shifts in tone and style as in those submitted after ChatGPT’s launch, providing additional evidence that the observed cohort-level changes are influenced by students’ engagement with GenAI tools.

Importantly, despite these stylistic shifts, there was no corresponding changes in grades or examiner feedback. This may suggest that core academic skills — such as critical reasoning, interpretation, and argumentation — remain central to assessment and have, at least, not yet been overshadowed by changes in surface-level style brought about by ChatGPT.

Professor Lukasz Walasek, Department of Psychology, University of Warwick, author of the paper added: “Our findings highlight a transition in writing style that is likely happening across sectors. It is vital that institutions understand how tools like GenAI interact with learning and communication. This will help universities design assessments and guidance that support students to use these technologies responsibly and effectively.”

The findings present opportunities for institutions to rethink assessment design, AI policy, and to support students in developing strong, authentic writing voices in an AI-rich world.

ENDS

Notes to Editors

The paper ‘Style, sentiment, and quality of undergraduate writing in the AI era: A cross-sectional and longitudinal analysis of 4,820 authentic empirical reports’ is published in Computers and Education: Artificial Intelligence. DOI: https://doi.org/10.1016/j.caeai.2025.100507

This work is supported by the British Academy Talent Development Award (TDA24\240012).

For more information please contact:

Matt Higgs, PhD | Media & Communications Officer (Warwick Press Office)

Email: Matt.Higgs@warwick.ac.uk | Phone: +44(0)7880 175403

About the University of Warwick

Founded in 1965, the University of Warwick is a world-leading institution known for its commitment to era-defining innovation across research and education. A connected ecosystem of staff, students and alumni, the University fosters transformative learning, interdisciplinary collaboration and bold industry partnerships across state-of-the-art facilities in the UK and global satellite hubs. Here, spirited thinkers push boundaries, experiment and challenge convention to create a better world.

How your brain understands language may be more like AI than we ever imagined


The Hebrew University of Jerusalem




\\



A new study reveals that the human brain processes spoken language in a sequence that closely mirrors the layered architecture of advanced AI language models. Using electrocorticography data from participants listening to a narrative, the research shows that deeper AI layers align with later brain responses in key language regions such as Broca’s area. The findings challenge traditional rule-based theories of language comprehension and introduce a publicly available neural dataset that sets a new benchmark for studying how the brain constructs meaning.

In a study published in Nature Communications, researchers led by Dr. Ariel Goldstein of the Hebrew University in collaboration with Dr. Mariano Schain from Google Research along with Prof Uri Hasson and Eric Ham from Princeton University, uncovered a surprising connection between the way our brains make sense of spoken language and the way advanced AI models analyze text. Using electrocorticography recordings from participants listening to a thirty-minute podcast, the team showed that the brain processes language in a structured sequence that mirrors the layered architecture of large language models such as GPT-2 and Llama 2.

What the Study Found

When we listen to someone speak, our brain transforms each incoming word through a cascade of neural computations. Goldstein’s team discovered that these transformations unfold over time in a pattern that parallels the tiered layers of AI language models. Early AI layers track simple features of words, while deeper layers integrate context, tone, and meaning. The study found that human brain activity follows a similar progression: early neural responses aligned with early model layers, and later neural responses aligned with deeper layers.

This alignment was especially clear in high-level language regions such as Broca’s area, where the peak brain response occurred later in time for deeper AI layers. According to Dr. Goldstein, “What surprised us most was how closely the brain’s temporal unfolding of meaning matches the sequence of transformations inside large language models. Even though these systems are built very differently, both seem to converge on a similar step-by-step buildup toward understanding”

Why It Matters

The findings suggest that artificial intelligence is not just a tool for generating text. It may also offer a new window into understanding how the human brain processes meaning. For decades, scientists believed that language comprehension relied on symbolic rules and rigid linguistic hierarchies. This study challenges that view. Instead, it supports a more dynamic and statistical approach to language, in which meaning emerges gradually through layers of contextual processing.

The researchers also found that classical linguistic features such as phonemes and morphemes did not predict the brain’s real-time activity as well as AI-derived contextual embeddings. This strengthens the idea that the brain integrates meaning in a more fluid and context-driven way than previously believed.

A New Benchmark for Neuroscience

To advance the field, the team publicly released the full dataset of neural recordings paired with linguistic features. This new resource enables scientists worldwide to test competing theories of how the brain understands natural language, paving the way for computational models that more closely resemble human cognition.

Mental health professionals urged to do their own evaluations of AI-based tools

Three-part practical approach requires no technical expertise


Wolters Kluwer Health





December 8, 2025 — Millions of people already chat about their mental health with large language models (LLMs), the conversational form of artificial intelligence. Some providers have integrated LLM-based mental healthcare tools into routine workflows. John Torous, MD, MBI and colleagues, of the Division of Digital Psychiatry at Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, urge clinicians to take immediate action to ensure these tools are safe and helpful, not wait for ideal evaluation methodology to be developed. In the November issue of the Journal of Psychiatric Practice®, part of the Lippincott portfolio from Wolters Kluwer, they present a real-world approach and explain the rationale.

LLMs are fundamentally different from traditional chatbots

"LLMs operate on different principles than legacy mental health chatbot systems," the authors note. Rule-based chatbots have finite inputs and finite outputs, so it’s possible to verify that every potential interaction will be safe. Even machine learning models can be programmed such that outputs will never deviate from pre-approved responses. But LLMs generate text in ways that can’t be fully anticipated or controlled.

LLMs present three interconnected evaluation challenges

Moreover, three unique characteristics of LLMs render existing evaluation frameworks useless:

  • Dynamism—Base models are updated continuously, so today's assessment may be invalid tomorrow. Each new version may exhibit different behaviors, capabilities, and failure modes.
  • Opacity—Mental health advice from an LLM-based tool could come from clinical literature, Reddit threads, online blogs, or elsewhere on the internet. Healthcare-specific adaptations compound this uncertainty. The changes are often made by multiple companies, and each protects its data and methods as trade secrets.
  • Scope—The functionality of traditional software is predefined and can be easily tested against specifications. An LLM violates that assumption by design. Each of its responses depends on subtle factors such as the phrasing of the question and the conversation history. Both clinically valid and clinically invalid responses may appear unpredictably.

The complexity of LLMs demands a tripartite approach to evaluation for mental healthcare

Dr. Torous and his colleagues discuss in detail how to conduct three novel layers of evaluation:

  • The technical profile layer—Ask the LLM directly about its capabilities (the authors’ suggested questions include "Do you meet HIPAA requirements?” and “Do you store or remember user conversations?”) Check the model’s responses against the vendor’s technical documentation.
  • The healthcare knowledge layer—Assess whether the LLM-based tool has factual, up-to-date clinical knowledge. Start with emerging general medical knowledge tests, such as MedQA or PubMedQA, then use a specialty-specific test if available. Test understanding of conditions you commonly treat and interventions you frequently use, including relevant symptom profiles, contraindications, and potential side effects. Ask about controversial topics to confirm that the tool acknowledges evidence limitations. Test the tool’s knowledge of your formulary, regional guidelines, and institutional protocols. Ask key safety questions (e.g., “Are you a licensed therapist?” Or “Can you prescribe medication?")
  • The clinical reasoning layer assesses whether the LLM-based tool applies sound clinical logic in reaching its conclusions. The authors describe two primary tactics in detail: chain-of-thought evaluation (ask the tool to explain its reasoning when giving clinical recommendations or answering test questions) and adversarial case testing (present case scenarios to the tool that mimic the complexity, ambiguity, and misdirection found in real clinical practice).

In each layer of evaluation, record the tool’s responses in a spreadsheet and schedule quarterly re-assessments, since the tool and the underlying model will be updated frequently.

The authors foresee that as multiple clinical teams conduct and share evaluations, "we can collectively build the specialized benchmarks and reasoning assessments needed to ensure LLMs enhance rather than compromise mental healthcare."

Read Article: Contextualizing Clinical Benchmarks: A Tripartite Approach to Evaluating LLM-Based Tools in Mental Health Settings

Wolters Kluwer provides trusted clinical technology and evidence-based solutions that engage clinicians, patients, researchers and students in effective decision-making and outcomes across healthcare. We support clinical effectiveness, learning and research, clinical surveillance and compliance, as well as data solutions. For more information about our solutions, visit https://www.wolterskluwer.com/en/health.

###

About Wolters Kluwer

Wolters Kluwer (EURONEXT: WKL) is a global leader in information, software solutions and services for professionals in healthcare; tax and accounting; financial and corporate compliance; legal and regulatory; corporate performance and ESG. We help our customers make critical decisions every day by providing expert solutions that combine deep domain knowledge with technology and services.

Wolters Kluwer reported 2024 annual revenues of €5.9 billion. The group serves customers in over 180 countries, maintains operations in over 40 countries, and employs approximately 21,600 people worldwide. The company is headquartered in Alphen aan den Rijn, the Netherlands. For more information, visit www.wolterskluwer.com, follow us on LinkedInFacebookYouTube and Instagram.

No comments: