Artificial intelligence could soon allow powerful companies to charge each customer a different price for the same product, based on what they think each individual is willing to pay.
That is the warning from new research co-authored by competition law academic Dr Miroslava Marinova at the University of East London, which argues that the real risk is not simply higher prices, but hidden, personalised pricing that consumers cannot see or understand.
Traditionally, firms set prices in response to market conditions, such as demand, costs, or competition, meaning that all consumers are offered broadly the same price at a given moment.
A different model is now emerging. Algorithmic personalised pricing refers to the use of data-driven systems to adjust prices at the level of the individual consumer. The objective is not simply to respond to market demand, but to predict how likely a particular consumer is to accept a higher price rather than search elsewhere.
From standard pricing to personal pricing
AI systems can analyse data such as browsing activity, location and past purchases to predict each consumer's willingness to pay. That means the same product could be offered at different prices to different people at the same time.
This is not entirely new, but AI makes it far more precise and scalable, pushing markets closer to a world where everyone is quoted their own individual price.
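To make that concrete, here is a deliberately simplified sketch of the kind of system the research describes. Everything in it is invented for illustration: the feature names, the weights and the markup cap are assumptions, and a real system would learn such parameters from large-scale behavioural data rather than hard-code them.

```python
import math

# Hypothetical weights such a system might learn from behavioural data;
# every feature name and number here is invented for illustration.
WEIGHTS = {
    "recent_price_comparisons": -0.8,  # comparison shoppers look price-sensitive
    "past_premium_purchases": 0.6,     # premium history suggests higher tolerance
    "high_income_postcode": 0.4,       # a coarse location-based proxy
}

def willingness_score(features: dict[str, float]) -> float:
    """Logistic score in (0, 1): the predicted chance this customer accepts
    a marked-up price rather than searching elsewhere."""
    z = sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def personalised_price(base_price: float, features: dict[str, float],
                       max_markup: float = 0.25) -> float:
    """Scale the base price by up to max_markup according to the score."""
    return round(base_price * (1.0 + max_markup * willingness_score(features)), 2)

# The same product, at the same moment, yields two different quoted prices.
bargain_hunter = {"recent_price_comparisons": 1, "past_premium_purchases": 0,
                  "high_income_postcode": 0}
premium_buyer = {"recent_price_comparisons": 0, "past_premium_purchases": 1,
                 "high_income_postcode": 1}
print(personalised_price(100.0, bargain_hunter))  # 107.75
print(personalised_price(100.0, premium_buyer))   # 118.28
```

The point of the sketch is the opacity the researchers highlight: neither customer sees the score, the features, or the other customer's price.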
The real issue is fairness
The study, co-authored with Dr Christian Bergqvist of the University of Copenhagen, finds that even if overall prices do not rise, consumers react strongly when they discover they are paying more than others without a clear reason. That sense of unfairness can reduce trust and affect behaviour.
Law Lecturer Dr Marinova said, “The concern is not just higher prices, but that people may be treated differently without knowing it. When pricing becomes invisible and personalised, fairness becomes a central issue.”
In competitive markets, consumers can switch to cheaper alternatives. But where a dominant firm is involved, the paper argues this kind of pricing could amount to an abuse of a dominant position under EU and UK competition law, because it lacks transparency and justification.
A gap between technology and regulation
The paper argues that the law has the tools to respond but has not yet fully caught up with AI-driven pricing. As these systems become more widespread, regulators will face increasing pressure to decide whether personalised pricing crosses the line.
Dr Marinova, from the Royal Docks School of Business and Law, added: “The next step is for regulators to move from theory to action. As AI pricing becomes more sophisticated, the question is no longer whether this can happen, but how far we are willing to allow it to shape everyday markets before clear rules are put in place.”
Impact in the UK
Although the research focuses on EU law, the implications extend to the UK, where the legal framework on abuse of dominance is closely aligned. The UK Government has recently signalled that it is considering whether the Competition and Markets Authority should receive stronger powers to investigate algorithms across both its competition and consumer protection functions.
The research is published as Marinova, M. and Bergqvist, C. (2026), AI-enabled price discrimination as an exploitative abuse of dominance under EU competition law, in the Journal of Competition Law & Economics.
Journal
Journal of Competition Law & Economics
Method of Research
Content analysis
Article Title
AI-enabled price discrimination as an exploitative abuse of dominance under EU competition law
- Mass General Brigham research shows that publicly available AI chatbots are improving in diagnostic accuracy when presented with comprehensive clinical information, but still underperform at generating differential diagnoses when information is limited
- Researchers developed a new measure, called PrIME-LLM, for benchmarking the clinical competence of different AI models
- Study reinforces necessity of “human in the loop” physician involvement for medical decision-making
Despite increasing use of artificial intelligence (AI) in health care, a new study led by Mass General Brigham researchers from the MESH Incubator shows that generative AI models continue to fall short in their clinical reasoning capabilities.
By asking 21 different large language models (LLMs) to play doctor in a series of clinical scenarios, the researchers showed that LLMs often fail at navigating diagnostic workups and coming up with a testable list of potential or “differential” diagnoses. Though all tested LLMs arrived at a correct final diagnosis more than 90% of the time when provided with all pertinent information in a patient case, they consistently performed poorly at the earlier, reasoning-driven steps of the diagnostic process, according to the results published in JAMA Network Open.
“Despite continued improvements, off-the-shelf large language models are not ready for unsupervised clinical-grade deployment,” said corresponding author Marc Succi, MD, executive director of the MESH Incubator at Mass General Brigham. “Differential diagnoses are central to clinical reasoning and underlie the ‘art of medicine’ that AI cannot currently replicate. The promise of AI in clinical medicine continues to lie in its potential to augment, not replace, physician reasoning, provided all the relevant data is available – not always the case.”
This new research is a follow-up to previous work led by Succi’s MESH group, in which researchers evaluated ChatGPT 3.5’s ability to accurately diagnose a series of clinical vignettes.
In the new study, the researchers developed a novel and more holistic measure of LLM performance that looks beyond accuracy, called PrIME-LLM, which evaluates a model’s competency across different stages of clinical reasoning: coming up with potential diagnoses, conducting appropriate tests, arriving at a final diagnosis, and managing treatment. When models perform well in one area but poorly in another, this imbalance is reflected in the PrIME-LLM score, as opposed to averaging competency across tasks, which may mask areas of weakness, according to the researchers.
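The release does not spell out how PrIME-LLM combines per-stage scores, so the snippet below is only an illustration of the general idea of an imbalance-sensitive score: a geometric mean, unlike a plain average, drops sharply when any single stage is weak. The stage names follow the article, but the numbers and the choice of geometric mean are assumptions, not the published formula.

```python
import math

# Hypothetical per-stage competency scores for one model on a 0-1 scale.
# The stage names follow the article; every number is invented, loosely
# echoing the finding that models were weakest at the differential step.
stages = {
    "differential_diagnosis": 0.20,
    "diagnostic_testing": 0.80,
    "final_diagnosis": 0.95,
    "treatment_management": 0.85,
}

# A plain average masks the weak stage...
arithmetic = sum(stages.values()) / len(stages)

# ...while a geometric mean drops sharply when any stage is weak.
geometric = math.prod(stages.values()) ** (1 / len(stages))

print(f"arithmetic mean: {arithmetic:.2f}")  # 0.70
print(f"geometric mean:  {geometric:.2f}")   # 0.60
```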
The study compared 21 general-purpose LLMs, including the latest models of ChatGPT, DeepSeek, Claude, Gemini, and Grok at the time of submission. The researchers tested the models’ ability to work through 29 published clinical cases. To simulate the way that clinical cases unfold, the researchers gradually fed the models information, beginning with basics like a patient’s age, gender and symptoms before adding physical examination findings and laboratory results. The LLMs’ performance at each stage was assessed by medical student evaluators, and these evaluations were used to calculate the models’ overall PrIME-LLM scores.
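The staged-disclosure protocol can be sketched in a few lines. The `query_model` function is a hypothetical stand-in for any LLM API call, and the stage texts and prompt wording are placeholders, not the study's actual harness or the content of the 29 published cases.

```python
# A minimal sketch of a staged-disclosure evaluation loop.
CASE_STAGES = [
    ("history", "Patient age, gender, and presenting symptoms ..."),
    ("exam", "Physical examination findings ..."),
    ("labs", "Laboratory and imaging results ..."),
]

def query_model(prompt: str) -> str:
    """Stand-in: a real harness would send the prompt to a model API here."""
    return "model response"

def run_case(stage_list: list[tuple[str, str]]) -> dict[str, str]:
    """Reveal the case one stage at a time, querying the model after each,
    the way the study gradually fed models information."""
    context = ""
    responses = {}
    for name, text in stage_list:
        context += text + "\n"
        prompt = (context + "\nGiven the information so far, list your "
                  "differential diagnoses and the next step in the workup.")
        responses[name] = query_model(prompt)
    return responses

for stage, answer in run_case(CASE_STAGES).items():
    print(stage, "->", answer)
```

In the study, each staged response was then graded by medical student evaluators rather than scored automatically.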
In line with their previous study, the researchers found that the LLMs were good at producing accurate final diagnoses. However, all of the models failed to produce an appropriate differential diagnosis more than 80% of the time. In real-world practice the differential diagnosis is a critical step; in this study, the models were nevertheless given the additional case information so that they could proceed to the next stage of the clinical workup even if they failed at the differential diagnosis step.
“By evaluating LLMs in a stepwise fashion, we move past treating them like test-takers and put them in the position of a doctor,” said Arya Rao, lead author, MESH researcher, and MD-PhD student at Harvard Medical School. “These models are great at naming a final diagnosis once the data is complete, but they struggle at the open-ended start of a case, when there isn't much information.”
Most of the LLMs showed improved accuracy when provided with laboratory results and imaging in addition to text. More recently released models generally outperformed older models, showing that LLMs are improving incrementally. The models’ PrIME-LLM scores ranged from 64% for Gemini 1.5 Flash to 78% for Grok 4 and GPT-5.
According to Succi, PrIME-LLM represents a standardized way to evaluate AI’s clinical competency that could be used by AI developers and hospital leaders to benchmark new technologies as they are released.
“We want to help separate the hype from the reality of these tools as they apply to health care,” he said. “Our results reinforce that large language models in healthcare continue to require a ‘human in the loop’ and very close oversight.”
Authorship: In addition to Succi, Mass General Brigham authors include Arya S. Rao, Kaiz P. Esmail, Richard S. Lee, Sharon Jiang, Bianca Arraiza Carlo, Jasleen Gill, Praneet Khanna, Ezra Kalmowitz, Basile Montagnese, Kimia Heydari, Qiao Jiao, Ethan Bott, Dan Nguyen, Grace Wang, Michael Hood, Adam B. Landman.
Disclosures: Landman is a consultant on the Abbott Medical Device Cybersecurity Council (unrelated to the current work).
Funding: Rao is supported in part by award Number T32GM144273 from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. The funding organization was not involved in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Paper cited: Rao A, et al. “Large Language Model Performance and Clinical Reasoning Tasks.” JAMA Network Open. DOI: 10.1001/jamanetworkopen.2026.4003
###
About Mass General Brigham
Mass General Brigham is an integrated academic health care system, uniting great minds to solve the hardest problems in medicine for our communities and the world. Mass General Brigham connects a full continuum of care across a system of academic medical centers, community and specialty hospitals, a health insurance plan, physician networks, community health centers, home care, and long-term care services. Mass General Brigham is a nonprofit organization committed to patient care, research, teaching, and service to the community. In addition, Mass General Brigham is one of the nation’s leading biomedical research organizations with several Harvard Medical School teaching hospitals. For more information, please visit massgeneralbrigham.org.
Method of Research
Computational simulation/modeling
Subject of Research
People
Article Title
Large Language Model Performance and Clinical Reasoning Tasks
Article Publication Date
13-Apr-2026
COI Statement
Landman is a consultant on the Abbott Medical Device Cybersecurity Council (unrelated to the current work).