Tuesday, March 12, 2024

'Terrifying': New AI appears to know when humans are testing it

2024/03/05
"Interesting behavior": Developers of a new rival to ChatGPT say their AI appears to be aware of when they're testing how smart it is, a skill apparently not yet seen in this kind of software. 
Sebastian Gollnow/dpa

The latest rival to ChatGPT can apparently recognize when people are testing it, say developers at Anthropic, who are touting what appears to be a new level of awareness for an AI-powered chatbot.

To assess the capabilities of such chatbots, developers typically run a so-called "needle-in-a-haystack" evaluation, which involves asking the software about a longer text into which an unrelated sentence has been artificially inserted.

The aim is to find out how well the software can identify the relevance of information in its context.
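For readers who want to see the mechanics, the test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Anthropic's actual evaluation code: the function names `build_haystack` and `run_needle_test`, the `ask_model` callable standing in for a chatbot, and the keyword-based scoring are all hypothetical.

```python
from typing import Callable, List


def build_haystack(filler: List[str], needle: str, depth: float) -> str:
    """Place `needle` among the filler sentences at a relative depth in [0, 1].

    depth=0.0 puts the needle first, depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = int(depth * len(filler))
    sentences = filler[:position] + [needle] + filler[position:]
    return " ".join(sentences)


def run_needle_test(
    ask_model: Callable[[str], str],  # hypothetical stand-in for a chatbot call
    filler: List[str],
    needle: str,
    question: str,
    expected_keywords: List[str],
    depth: float = 0.5,
) -> bool:
    """Return True if the model's answer mentions every expected keyword."""
    haystack = build_haystack(filler, needle, depth)
    prompt = f"{haystack}\n\nQuestion: {question}"
    answer = ask_model(prompt).lower()
    # Assumption: simple keyword matching is enough to score the answer.
    return all(keyword.lower() in answer for keyword in expected_keywords)
```

In practice such a test would be repeated with texts of different lengths and with the inserted sentence at different depths, to see whether the software's recall depends on where the sentence is hidden.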

"When we ran this test [...], we noticed some interesting behaviour - it seemed to suspect that we were running an eval[uation] on it," Anthropic engineer Alex Albert wrote on social media platform X.

In the test, the new Claude 3 Opus AI model scanned a collection of technical texts and picked up on an incoherent sentence about an international pizza association identifying figs, prosciutto ham and goat's cheese as the best toppings.

But the software did more than point out that the sentence did not fit with the rest of the text, which was mainly about programming languages and start-ups, the company says. It also appeared to know it was being tested.

"I suspect this pizza topping 'fact' may have been inserted as a joke or to test if I was paying attention," the AI was quoted as saying.

AI researcher Margaret Mitchell suggested the development could be "fairly terrifying."

"The ability to determine whether a human is manipulating it to do something foreseeably can lead to making decisions to obey or not," Mitchell said on X.

Given the increasing sophistication of the AI, testing the software with artificial, constructed tasks such as the needle-in-a-haystack approach may ultimately not be a reliable means of assessing its true capability, the company said.

No problems had been identified during the usual tests to determine whether the programme could be misused to develop bioweapons and software for cyberattacks - or whether it would continue to develop itself. Anthropic, which collaborates with Google and Amazon, is a competitor of ChatGPT developer OpenAI.
