Sunday, December 29, 2024


'Godfather of AI' demands strict regulations to stop technology from wiping out humanity

Image by Metamorworks, Shutterstock

December 29, 2024

Warning that the pace of development of artificial intelligence is "much faster" than he anticipated and is taking place in the absence of far-reaching regulations, the computer scientist often called the "Godfather of AI" said on Friday that he believes the chances are growing that AI could wipe out humanity.

Speaking to BBC Radio 4's "Today" program, Geoffrey Hinton said there is a "10% to 20%" chance AI could lead to human extinction in the next three decades.

Previously Hinton had said he saw a 10% chance of that happening.

"We've never had to deal with things more intelligent than ourselves before," Hinton explained. "And how many examples do you know of a more intelligent thing being controlled by a less intelligent thing? There are very few examples. There's a mother and baby. Evolution put a lot of work into allowing the baby to control the mother, but that's about the only example I know of."

Hinton, who was awarded the Nobel Prize in physics this year for his research into machine learning and AI, left his job at Google last year, saying he wanted to be able to speak out more about the dangers of unregulated AI.

He has warned that AI chatbots could be used by authoritarian leaders to manipulate the public, and said last year that "the kind of intelligence we're developing is very different from the intelligence we have."

On Friday, Hinton said he is particularly worried that "the invisible hand" of the market will not keep humans safe from a technology that surpasses their intelligence, and called for strict regulation of AI.

"Just leaving it to the profit motive of large companies is not going to be sufficient to make sure they develop it safely," said Hinton.

More than 120 bills have been proposed in the U.S. Congress to regulate AI robocalls, the technology's role in national security, and other issues, while the Biden administration has taken some action to rein in AI development.

An executive order calling for "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence" said that "harnessing AI for good and realizing its myriad benefits requires mitigating its substantial risks." President-elect Donald Trump is expected to rescind the order.

The White House Blueprint for an AI Bill of Rights calls for safe and effective systems, algorithmic discrimination protections, data privacy, notice and explanation when AI is used, and the ability to opt out of automated systems.

But the European Union's Artificial Intelligence Act was deemed a "failure" by rights advocates this year, after industry lobbying helped ensure the law included numerous loopholes and exemptions for law enforcement and migration authorities.

"The only thing that can force those big companies to do more research on safety," said Hinton on Friday, "is government regulation."

AI has a stupid secret


Photo by Sandy Millar on Unsplash


December 27, 2024


Two of San Francisco’s leading players in artificial intelligence have challenged the public to come up with questions capable of testing the capabilities of large language models (LLMs) like Google Gemini and OpenAI’s o1. Scale AI, which specialises in preparing the vast tracts of data on which the LLMs are trained, teamed up with the Center for AI Safety (CAIS) to launch the initiative, Humanity’s Last Exam.

Featuring prizes of US$5,000 (£3,800) for those who come up with the top 50 questions selected for the test, Scale and CAIS say the goal is to test how close we are to achieving “expert-level AI systems” using the “largest, broadest coalition of experts in history”.

Why do this? The leading LLMs are already acing many established tests in intelligence, mathematics and law, but it’s hard to be sure how meaningful this is. In many cases, they may have pre-learned the answers due to the gargantuan quantities of data on which they are trained, including a significant percentage of everything on the internet.

Data is fundamental to this whole area. It is behind the paradigm shift from conventional computing to AI, from “telling” to “showing” these machines what to do. This requires good training datasets, but also good tests. Developers typically test their models on data that hasn’t already been used for training, known in the jargon as “test datasets”.
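To make the idea of a test dataset concrete, here is a minimal Python sketch. It assumes a toy list of exam-style questions. It holds part of the list out from training and flags any held-out item that has leaked into the training corpus, which is the "pre-learned answers" problem described above. The function names are hypothetical, and real contamination checks are usually fuzzier than the exact-match test used here.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def contaminated(test_items, training_corpus):
    """Return test items that also appear verbatim in the training corpus."""
    seen = set(training_corpus)
    return [item for item in test_items if item in seen]

questions = [f"question {i}" for i in range(10)]  # hypothetical toy data
train, test = train_test_split(questions)
print(len(train), len(test))        # 8 2
print(contaminated(test, train))    # [] : held-out items were never trained on

# If a test question later leaks onto the web and into the training data,
# the check flags it and the test no longer measures anything new.
leaked_corpus = train + [test[0]]
print(contaminated(test, leaked_corpus) == [test[0]])  # True
```

The point of the split is that a score on `test` only means something while `contaminated` stays empty. Once a model may have read everything, that guarantee is hard to keep.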

If LLMs are not already able to pre-learn the answers to established tests like bar exams, they probably will be soon. The AI analytics site Epoch estimates that 2028 will mark the point at which AIs will effectively have read everything ever written by humans. An equally important challenge is how to keep assessing AIs once that Rubicon has been crossed.

Of course, the internet is expanding all the time, with millions of new items being added daily. Could that take care of these problems?

Perhaps, but this bleeds into another insidious difficulty, referred to as “model collapse”. As the internet becomes increasingly flooded by AI-generated material which recirculates into future AI training sets, this may cause AIs to perform increasingly poorly. To overcome this problem, many developers are already collecting data from their AIs’ human interactions, adding fresh data for training and testing.

Some specialists argue that AIs also need to become “embodied”: moving around in the real world and acquiring their own experiences, as humans do. This might sound far-fetched until you realise that Tesla has been doing it for years with its cars. Another opportunity is human wearables, such as the popular Ray-Ban Meta smart glasses. These are equipped with cameras and microphones, and can be used to collect vast quantities of human-centric video and audio data.

Narrow tests

Yet even if such products guarantee enough training data in future, there is still the conundrum of how to define and measure intelligence – particularly artificial general intelligence (AGI), meaning an AI that equals or surpasses human intelligence.

Traditional human IQ tests have long been controversial for failing to capture the multifaceted nature of intelligence, encompassing everything from language to mathematics to empathy to sense of direction.

There’s an analogous problem with the tests used on AIs. There are many well-established tests covering such tasks as summarising text, understanding it, drawing correct inferences from information, recognising human poses and gestures, and machine vision.

Some tests are being retired, usually because the AIs are doing so well at them, but they are all so task-specific as to be very narrow measures of intelligence. For instance, the chess-playing AI Stockfish is way ahead of Magnus Carlsen, the highest-scoring human player of all time, on the Elo rating system. Yet Stockfish is incapable of doing other tasks such as understanding language. Clearly it would be wrong to conflate its chess capabilities with broader intelligence.

But with AIs now demonstrating broader intelligent behaviour, the challenge is to devise new benchmarks for comparing and measuring their progress. One notable approach has come from French Google engineer François Chollet. He argues that true intelligence lies in the ability to adapt and generalise learning to new, unseen situations. In 2019, he came up with the “abstraction and reasoning corpus” (ARC), a collection of puzzles in the form of simple visual grids designed to test an AI’s ability to infer and apply abstract rules.

Unlike previous benchmarks that test visual object recognition by training an AI on millions of images, each with information about the objects contained, ARC gives it minimal examples in advance. The AI has to figure out the puzzle logic and can’t just learn all the possible answers.
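The contrast with training on millions of labelled images can be sketched in a few lines of Python. The grid format is loosely modelled on ARC's input/output pairs of small colour grids. However, the tiny rule library and the task below are hypothetical assumptions, and real ARC puzzles demand far richer reasoning than picking from four fixed transformations. The solver sees only two demonstrations and must find a rule consistent with both before applying it to a new input.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # each cell is a colour code, as in ARC's grids

# A tiny, hypothetical library of candidate rules (an assumption for illustration).
CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def infer_rule(examples: List[Dict[str, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every demonstration pair."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(ex["input"]) == ex["output"] for ex in examples):
            return name
    return None  # no known rule explains the examples

# Only two demonstrations are given: "minimal examples in advance".
train_pairs = [
    {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
]
rule_name = infer_rule(train_pairs)
test_input = [[5, 0, 0], [0, 6, 0]]
if rule_name is not None:
    print(rule_name, CANDIDATE_RULES[rule_name](test_input))
# flip_horizontal [[0, 0, 5], [0, 6, 0]]
```

Because the test grid was never seen, the program cannot simply recall an answer. It has to pick a rule that explains the demonstrations and generalise it, which is the ability Chollet's benchmark is meant to probe.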

Though the ARC tests aren’t particularly difficult for humans to solve, there’s a prize of US$600,000 for the first AI system to reach a score of 85%. At the time of writing, we’re a long way from that point. Two recent leading LLMs, OpenAI’s o1-preview and Anthropic’s Claude 3.5 Sonnet, both score 21% on the ARC public leaderboard (known as ARC-AGI-Pub).

Another recent attempt, using OpenAI’s GPT-4o, scored 50%, though somewhat controversially, because the approach generated thousands of possible solutions before choosing the one that gave the best answer for the test. Even then, this was still reassuringly far from triggering the prize, or from matching human performance of over 90%.
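Why generating thousands of candidates is contentious becomes clearer with a toy generate-and-select loop. In this Python sketch, a random sampler stands in for the language model, and each candidate "solution" is a short sequence of hypothetical grid operations. Candidates are scored against the demonstration pairs, which is an assumption about how "best" is judged. This is an illustration of the general technique only and does not reproduce the actual GPT-4o approach.

```python
import random
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)

# Hypothetical primitive operations; a candidate is a short sequence of them.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def sample_candidate(rng: random.Random, max_len: int = 3) -> List[str]:
    """Stand-in for one model sample: a random sequence of 1..max_len primitives."""
    names = list(PRIMITIVES)
    return [rng.choice(names) for _ in range(rng.randint(1, max_len))]

def run(program: List[str], grid: Grid) -> Grid:
    """Apply each primitive in order."""
    for op in program:
        grid = PRIMITIVES[op](grid)
    return grid

def score(program: List[str], examples: List[Example]) -> int:
    """Number of demonstration pairs the candidate reproduces exactly."""
    return sum(run(program, x) == y for x, y in examples)

def generate_and_select(examples: List[Example], n_candidates: int = 1000, seed: int = 0) -> List[str]:
    """Generate many candidates and keep the first one that fits the most demonstrations."""
    rng = random.Random(seed)
    candidates = [sample_candidate(rng) for _ in range(n_candidates)]
    return max(candidates, key=lambda p: score(p, examples))

# Demonstrations of a 90-degree clockwise rotation (transpose, then flip_h).
examples = [([[1, 2], [3, 4]], [[3, 1], [4, 2]]), ([[1, 2, 3]], [[1], [2], [3]])]
best = generate_and_select(examples)
print(run(best, [[5, 6], [7, 8]]))  # [[7, 5], [8, 6]]
```

No single sample needs to be "intelligent". Sheer volume makes it likely that some candidate fits, which is why critics see the resulting score as partly a measure of compute rather than reasoning.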

While ARC remains one of the most credible attempts to test for genuine intelligence in AI today, the Scale/CAIS initiative shows that the search continues for compelling alternatives. (Fascinatingly, we may never see some of the prize-winning questions. They won’t be published on the internet, to ensure the AIs don’t get a peek at the exam papers.)

We need to know when machines are getting close to human-level reasoning, with all the safety, ethical and moral questions this raises. At that point, we’ll presumably be left with an even harder exam question: how to test for a superintelligence. That’s an even more mind-bending task that we need to figure out.

Andrew Rogoyski, Innovation Director - Surrey Institute of People-Centred AI, University of Surrey

This article is republished from The Conversation under a Creative Commons license. Read the original article.
