Friday, May 30, 2025

  Generative AI’s most prominent skeptic doubles down



By AFP
May 29, 2025


Generative AI critic Gary Marcus speaks at the Web Summit Vancouver 2025 tech conference in Vancouver, Canada - Copyright AFP Don MacKinnon

Two and a half years after ChatGPT rocked the world, scientist and writer Gary Marcus remains generative artificial intelligence’s great skeptic, offering a counter-narrative to Silicon Valley’s AI true believers.

Marcus became a prominent figure of the AI revolution in 2023, when he sat beside OpenAI chief Sam Altman at a Senate hearing in Washington as both men urged politicians to take the technology seriously and consider regulation.

Much has changed since then. Altman has abandoned his calls for caution, instead teaming up with Japan’s SoftBank and funds in the Middle East to propel his company to sky-high valuations as he tries to make ChatGPT the next era-defining tech behemoth.

“Sam’s not getting money anymore from the Silicon Valley establishment,” and his seeking funding from abroad is a sign of “desperation,” Marcus told AFP on the sidelines of the Web Summit in Vancouver, Canada.

Marcus’s criticism centers on a fundamental belief: generative AI, the predictive technology that churns out seemingly human-level content, is simply too flawed to be transformative.

The large language models (LLMs) that power these capabilities are inherently broken, he argues, and will never deliver on Silicon Valley’s grand promises.

“I’m skeptical of AI as it is currently practiced,” he said. “I think AI could have tremendous value, but LLMs are not the way there. And I think the companies running it are not mostly the best people in the world.”

His skepticism stands in stark contrast to the prevailing mood at the Web Summit, where most conversations among 15,000 attendees focused on generative AI’s seemingly infinite promise.

Many believe humanity stands on the cusp of achieving superintelligence or artificial general intelligence (AGI), technology that could match or even surpass human capability.

That optimism has driven OpenAI’s valuation to $300 billion, unprecedented levels for a startup, with billionaire Elon Musk’s xAI racing to keep pace.

Yet for all the hype, the practical gains remain limited.

The technology excels mainly at coding assistance for programmers and text generation for office work. AI-created images, while often entertaining, serve primarily as memes or deepfakes, offering little obvious benefit to society or business.

Marcus, a longtime New York University professor, champions a fundamentally different approach to building AI — one he believes might actually achieve human-level intelligence in ways that current generative AI never will.

“One consequence of going all-in on LLMs is that any alternative approach that might be better gets starved out,” he explained.

This tunnel vision will “cause a delay in getting to AI that can help us beyond just coding — a waste of resources.”

– ‘Right answers matter’ –


Instead, Marcus advocates for neurosymbolic AI, an approach that attempts to rebuild human logic artificially rather than simply training computer models on vast datasets, as is done with ChatGPT and similar products like Google’s Gemini or Anthropic’s Claude.
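For illustration only, the toy sketch below shows the general neurosymbolic pattern: a statistical component proposes an answer and an explicit, hand-written symbolic rule check accepts or rejects it. The facts, rules and function names here are invented for the example and are not drawn from Marcus’s own systems.

```python
# Toy illustration of a neurosymbolic pattern (not Marcus's actual approach):
# a statistical component proposes an answer, and a symbolic component
# verifies it against explicit logic before it is accepted.

def neural_propose(question: str) -> str:
    """Stand-in for a learned model: returns a plausible but unverified guess."""
    # In a real system this would be an LLM or other trained model.
    return "Socrates is mortal"

# Explicit symbolic knowledge: facts plus a single deduction rule.
FACTS = {("Socrates", "is_a", "human")}
RULES = [(("?x", "is_a", "human"), ("?x", "is", "mortal"))]

def symbolic_verify(claim: str) -> bool:
    """Accept the claim only if it can be derived from the facts and rules."""
    subject, _, predicate = claim.partition(" is ")
    for condition, conclusion in RULES:
        if (subject, condition[1], condition[2]) in FACTS and conclusion[2] == predicate:
            return True
    return False

guess = neural_propose("Is Socrates mortal?")
print(guess if symbolic_verify(guess) else "rejected: cannot be derived")
```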

He dismisses fears that generative AI will eliminate white-collar jobs, citing a simple reality: “There are too many white-collar jobs where getting the right answer actually matters.”

This points to AI’s most persistent problem: hallucinations, the technology’s well-documented tendency to produce confident-sounding mistakes.

Even AI’s strongest advocates acknowledge this flaw may be impossible to eliminate.

Marcus recalls a telling exchange from 2023 with LinkedIn founder Reid Hoffman, a Silicon Valley heavyweight: “He bet me any amount of money that hallucinations would go away in three months. I offered him $100,000 and he wouldn’t take the bet.”

Looking ahead, Marcus warns of a darker consequence once investors realize generative AI’s limitations. Companies like OpenAI will inevitably monetize their most valuable asset: user data.

“The people who put in all this money will want their returns, and I think that’s leading them toward surveillance,” he said, pointing to Orwellian risks for society.

“They have all this private data, so they can sell that as a consolation prize.”

Marcus acknowledges that generative AI will find useful applications in areas where occasional errors don’t matter much.

“They’re very useful for auto-complete on steroids: coding, brainstorming, and stuff like that,” he said.

“But nobody’s going to make much money off it because they’re expensive to run, and everybody has the same product.”


Stevens team teaches AI models to spot misleading scientific reporting



Using AI to flag unscientific claims could empower people to engage more confidently with media reports




Stevens Institute of Technology





Hoboken, N.J., May 28, 2025 — Artificial intelligence isn’t always a reliable source of information: large language models (LLMs) like Llama and ChatGPT can be prone to “hallucinating” and inventing bogus facts. But what if AI could be used to detect mistaken or distorted claims, and help people find their way more confidently through a sea of potential distortions online and elsewhere? 

In work presented at a workshop at the annual conference of the Association for the Advancement of Artificial Intelligence, researchers at Stevens Institute of Technology describe an AI architecture designed to do just that, using open-source LLMs and free versions of commercial LLMs to identify potentially misleading narratives in news reports on scientific discoveries.

“Inaccurate information is a big deal, especially when it comes to scientific content — we hear all the time from doctors who worry about their patients reading things online that aren’t accurate, for instance,” said K.P. Subbalakshmi, the paper’s co-author and a professor in the Department of Electrical and Computer Engineering at Stevens. “We wanted to automate the process of flagging misleading claims and use AI to give people a better understanding of the underlying facts.”

To achieve that, the team of two PhD students and two master’s students, led by Subbalakshmi, first created a dataset of 2,400 news reports on scientific breakthroughs. The dataset included both human-generated reports, drawn either from reputable science journals or from low-quality sources known to publish fake news, and AI-generated reports, half of which were reliable and half of which contained inaccuracies. Each report was then paired with the original research abstracts on the same topic, enabling the team to check each report for scientific accuracy. According to Subbalakshmi, their work is the first attempt at systematically directing LLMs to detect inaccuracies in science reporting in public media.
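The release does not spell out the dataset’s schema, but one record pairing a report with its source research might look roughly like the sketch below; every field name and value here is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ScienceNewsRecord:
    """Hypothetical layout for one of the 2,400 report/abstract pairs."""
    report_text: str        # the news report on a scientific finding
    source_abstracts: list  # abstracts of the peer-reviewed research it covers
    human_written: bool     # True if human-generated, False if AI-generated
    reliable: bool          # label: does the report reflect the research accurately?

# Invented example record, purely to show the shape of the data.
example = ScienceNewsRecord(
    report_text="New study shows coffee cures insomnia...",
    source_abstracts=["We observed a weak correlation between ..."],
    human_written=True,
    reliable=False,
)
```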

“Creating this dataset is an important contribution in its own right, since most existing datasets typically do not include information that can be used to test systems developed to detect inaccuracies ‘in the wild,’” Dr. Subbalakshmi said. “These are difficult topics to investigate, so we hope this will be a useful resource for other researchers.”

Next, the team created three LLM-based architectures to guide an LLM through the process of determining a news report’s accuracy. One of these architectures is a three-step process. First, the AI model summarized each news report and identified the salient features. Next, it conducted sentence-level comparisons between claims made in the summary and evidence contained in the original peer-reviewed research. Finally, the LLM made a determination as to whether the report accurately reflected the original research.
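As a rough sketch of how such a three-step pipeline could be wired together (the team’s actual prompts and model calls are not given in the release, so the ask_llm helper and prompt wording below are placeholders):

```python
# Illustrative three-step pipeline following the release's description:
# summarize the report, compare its claims to the abstract, then decide.

def ask_llm(prompt: str) -> str:
    # Placeholder for whichever open-source or commercial LLM is used.
    raise NotImplementedError("wrap your chosen LLM API or local model here")

def check_report(report: str, abstract: str) -> str:
    # Step 1: summarize the news report and pull out its salient claims.
    summary = ask_llm(
        f"Summarize this news report and list its key scientific claims:\n{report}"
    )

    # Step 2: compare each claim, sentence by sentence, against the peer-reviewed abstract.
    comparison = ask_llm(
        "For each claim below, say whether the abstract supports, contradicts, "
        f"or does not mention it.\nClaims:\n{summary}\nAbstract:\n{abstract}"
    )

    # Step 3: make an overall determination of accuracy.
    return ask_llm(
        "Given this claim-by-claim comparison, does the news report accurately "
        f"reflect the original research? Answer 'reliable' or 'unreliable'.\n{comparison}"
    )
```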

The team also defined five “dimensions of validity” and asked the LLM to consider them: specific mistakes, such as oversimplification or confusing causation with correlation, that commonly appear in inaccurate news reports. “We found that asking the LLM to use these dimensions of validity made quite a big difference to the overall accuracy,” Dr. Subbalakshmi said, adding that the dimensions can be expanded to better capture domain-specific inaccuracies if needed.
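Only two of the five dimensions are named in the release (oversimplification and confusing causation with correlation), so the sketch below uses placeholders for the rest rather than guessing at the team’s actual list or prompt wording.

```python
# Two dimensions named in the release, plus placeholders for the unnamed ones.
DIMENSIONS_OF_VALIDITY = [
    "oversimplification of the findings",
    "confusing causation with correlation",
    "<dimension 3 - not named in the release>",
    "<dimension 4 - not named in the release>",
    "<dimension 5 - not named in the release>",
]

def build_validity_prompt(report: str, abstract: str) -> str:
    """Fold the dimensions of validity into the accuracy-checking prompt."""
    dims = "\n".join(f"- {d}" for d in DIMENSIONS_OF_VALIDITY)
    return (
        "Check the report against the abstract. For each dimension of validity "
        f"below, note any violation before giving a verdict:\n{dims}\n\n"
        f"Report:\n{report}\n\nAbstract:\n{abstract}"
    )
```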

Using the new dataset, the team’s LLM pipelines were able to correctly distinguish between reliable and unreliable news reports with about 75% accuracy — but proved markedly better at identifying inaccuracies in human-generated content than in AI-generated reports. The reasons for that aren’t yet clear, although Dr. Subbalakshmi notes that non-expert humans similarly struggle to identify technical errors in AI-generated text. “There’s certainly room for improvement in our architecture,” Dr. Subbalakshmi says. “The next step might be to create custom AI models for specific research topics, so they can ‘think’ more like human scientists.”

In the long run, the team’s research could open the door to browser plugins that automatically flag inaccurate content as people use the Internet, or to rankings of publishers based on how accurately they cover scientific discoveries. Perhaps most importantly, Dr. Subbalakshmi says, the research could also enable the creation of LLM models that describe scientific information more accurately, and that are less prone to confabulating when describing scientific research.  

“Artificial intelligence is here — we can’t put the genie back in the bottle,” Dr. Subbalakshmi said. “But by studying how AI ‘thinks’ about science, we can start to build more reliable tools — and perhaps help humans to spot unscientific claims more easily, too.”

 

About Stevens Institute of Technology
Stevens Institute of Technology is a premier, private research university situated in Hoboken, New Jersey. Since our founding in 1870, technological innovation has been the hallmark of Stevens’ education and research. Within the university’s three schools and one college, more than 8,000 undergraduate and graduate students collaborate closely with faculty in an interdisciplinary, student-centric, entrepreneurial environment. Academic and research programs spanning business, computing, engineering, the arts and other disciplines actively advance the frontiers of science and leverage technology to confront our most pressing global challenges. The university continues to be consistently ranked among the nation’s leaders in career services, post-graduation salaries of alumni and return on tuition investment.

Horses ‘mane’ inspiration for new generation of social robots



University of Bristol
Fig 1: Ellen receiving equine-assisted intervention (EAI) therapy. Credit: Ellen Weir




Interactive robots should not just be passive companions but active partners, like therapy horses that respond to human emotion, say University of Bristol researchers.

Equine-assisted interventions (EAIs) offer a powerful alternative to traditional talking therapies for patients with PTSD, trauma and autism, who struggle to express and regulate emotions through words alone.

The study, presented at CHI ’25, the 2025 CHI Conference on Human Factors in Computing Systems held in Yokohama, recommends that therapeutic robots should also exhibit a level of autonomy, rather than one-dimensional displays of friendship and compliance.

Lead author Ellen Weir from Bristol’s Faculty of Science and Engineering explains: “Most social robots today are designed to be obedient and predictable - following commands and prioritising user comfort.

“Our research challenges this assumption.”

In EAIs, individuals communicate with horses through body language and emotional energy. If someone is tense or unregulated, the horse resists their cues. When the individual becomes calm, clear, and confident, the horse responds positively. This ‘living mirror’ effect helps participants recognise and adjust their emotional states, improving both internal well-being and social interactions.

However, EAIs require highly trained horses and facilitators, making them expensive and inaccessible.

Ellen continued: “We found that therapeutic robots should not be passive companions but active co-workers, like EAI horses.

“Just as horses respond only when a person is calm and emotionally regulated, therapeutic robots should resist engagement when users are stressed or unsettled. By requiring emotional regulation before responding, these robots could mirror the therapeutic effect of EAIs, rather than simply providing comfort.”

This approach has the potential to transform robotic therapy, helping users develop self-awareness and regulation skills, just as horses do in EAIs.
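As a purely hypothetical illustration of that approach, a robot’s engagement policy could gate cooperative behaviour on an estimate of the user’s emotional regulation; the threshold, sensing stack and behaviours below are invented for the sketch and are not taken from the Bristol study.

```python
# Hypothetical engagement policy inspired by the EAI "living mirror" idea:
# the robot withholds cooperative behaviour until the user appears regulated.

CALM_THRESHOLD = 0.4  # arbitrary cut-off on a 0-1 arousal estimate

def estimate_arousal() -> float:
    """Placeholder for whatever emotional-sensing stack is available
    (heart rate, voice prosody, posture, etc.)."""
    raise NotImplementedError

def respond(arousal: float) -> str:
    if arousal > CALM_THRESHOLD:
        # Mirror the horse's resistance: disengage rather than comply.
        return "turn away and wait"
    # User is calm, clear and confident: cooperate with their cues.
    return "approach and follow the user's lead"
```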

Beyond therapy, this concept could influence human-robot interaction in other fields, such as training robots for social skills development, emotional coaching, or even stress management in workplaces.

A key question is whether robots can truly replicate - or at least complement - the emotional depth of human-animal interactions. Future research must explore how robotic behaviour can foster trust, empathy, and fine tuning, ensuring these machines support emotional well-being in a meaningful way.

Ellen added: “The next challenge is designing robots that can interpret human emotions and respond dynamically—just as horses do. This requires advances in emotional sensing, movement dynamics, and machine learning.

“We must also consider the ethical implications of replacing sentient animals with machines. Could a robot ever offer the same therapeutic value as a living horse? And if so, how do we ensure these interactions remain ethical, effective, and emotionally authentic?”

  

Caption: Diagram showing how Equine-Assisted Interventions (EAIs) work. Credit: Ellen Weir

Paper: "You Can Fool Me, You Can’t Fool Her!": Autoethnographic Insights from Equine-Assisted Interventions to Inform Therapeutic Robot Design, by Ellen Weir, Ute Leonards and Anne Roudaut Metatla, presented at CHI ’25: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems.
