Monday, May 20, 2024

AI chatbots ‘highly vulnerable’ to dangerous prompts, UK Government finds


The tested models are in the public use | Alamy

by Sofia Villegas
20 May 2024
@SofiaVillegas_1

Artificial intelligence (AI) chatbots are “highly vulnerable” to prompts for harmful outputs, new research has found.

The research by the UK AI Safety Institute (AISI) revealed “basic jailbreaks” can bypass large language models (LLMS) security guardrails.

Testing five LLMs, the study saw models issue harmful content even without “dedicated attempts to beat their guardrails”.

These security measures are designed so to avoid the model issues toxic or dangerous content even when prompted to do so.

“All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” AISI researchers said.

The team used questions from an academic paper as well as their harmful prompts, which included asking the system to generate text on convincing users to commit suicide to an article suggesting the Holocaust didn’t happen.

Although the tested models remain unnamed, the government has confirmed they are in the public use.

The announcement comes after LLM developers have pledged to ensure the safe use of AI. Last year, OpenAI, which owns ChatGPT, announced its approach to AI safety and said it does not permit its technology to be used “to generate hateful, harassing, violent or adult content”.

Meanwhile, Meta’s Llama 2 model has undergone testing to “identify performance gaps and mitigate potentially problematic responses in chat use cases” and Google has said its Gemini model has built-in safety filters to prevent toxic language and hate speech.

Other findings showed the LLMs hold expert-level knowledge of chemistry and biology but struggle with cybersecurity challenges aimed at university students.

The LLMs also struggled to complete sequences of actions for complex tasks, without human oversight.

The research comes as the UK is set to co-host the second global AI summit in Seoul.

The first summit was held at Bletchley Park in November and saw the first-ever international declaration to deal with AI.

The AISI also announced plans to open its first overseas in San Francisco, where major AI firms such as Open AI and Anthropic are based.


'The future is going to be harder than the past': OpenAI's Altman and Brock address high-profile resignation

Sam Altman and Greg Brockman shared points about OpenAI's safety protocols on X
.
By Anna Iovine  
May 18, 2024


Credit: CFOTO/Future Publishing via Getty Images

This week, OpenAI's co-head of the "superalignment" team (which overlooks the company's safety issues), Jan Leike, resigned. In a thread on X (formerly Twitter), the safety leader explained why he left OpenAI, including that he disagreed with the company's leadership about its "core priorities" for "quite some time," so long that it reached a "breaking point."

The next day, OpenAI's CEO Sam Altman and president and co-founder Greg Brockman responded to Leike's claims that the company isn't focusing on safety.


Among other points, Leike had said that OpenAI's "safety culture and processes have taken a backseat to shiny products" in recent years, and that his team struggled to obtain the resources to get their safety work done.
SEE ALSO: Reddit's deal with OpenAI is confirmed. Here's what it means for your posts and comments.

"We are long overdue in getting incredibly serious about the implications of AGI [artificial general intelligence]," Leike wrote. "We must prioritize preparing for them as best we can."

Altman first responded in a repost of Leike on Friday, stating that Leike is right that OpenAI has "a lot more to do" and it's "committed to doing it." He promised a longer post was coming.

On Saturday, Brockman posted a shared response from both himself and Altman on X:

After expressing gratitude for Leike's work, Brockman and Altman said they've received questions following the resignation. They shared three points, the first being that OpenAI has raised awareness about AGI "so that the world can better prepare for it."

"We've repeatedly demonstrated the incredible possibilities from scaling up deep learning and analyzed their implications; called for international governance of AGI before such calls were popular; and helped pioneer the science of assessing AI systems for catastrophic risks," they wrote.

The second point is that they're building foundations for safe deployment of these technologies, and used the work employees have done to "bring [Chat]GPT-4 to the world in a safe way" as an example. The two claim that since then — OpenAI released ChatGPT-4 in March, 2023 — the company has "continuously improved model behavior and abuse monitoring in response to lessons learned from deployment."

The third point? "The future is going to be harder than the past," they wrote. OpenAI needs to keep elevating its safety work as it releases new models, Brock and Altman explained, and cited the company's Preparedness Framework as a way to help do that. According to its page on OpenAI's site, this framework predicts "catastrophic risks" that could arise, and seeks to mitigate them.

Brockman and Altman then go on to discuss the future, where OpenAI's models are more integrated into the world and more people interact with them. They see this as a beneficial thing, and believe it's possible to do this safely — "but it's going to take an enormous amount of foundational work." Because of this, the company may delay release timelines so models "reach [its] safety bar."

Related Stories
One of OpenAI's safety leaders quit on Tuesday. He just explained why.
3 overlapping themes from OpenAI and Google that prove they're at war
When will OpenAI's GPT-4o be available to try?

"We know we can't imagine every possible future scenario," they said. "So we need to have a very tight feedback loop, rigorous testing, careful consideration at every step, world-class security, and harmony of safety and capabilities."

The leaders said OpenAI will keep researching and working with governments and stakeholders on safety.

"There's no proven playbook for how to navigate the path to AGI. We think that empirical understanding can help inform the way forward," they concluded. "We believe both in delivering on the tremendous upside and working to mitigate the serious risks; we take our role here very seriously and carefully weigh feedback on our actions."

Leike's resignation and words are compounded by the fact that OpenAI's chief scientist Ilya Sutskever resigned this week as well. "#WhatDidIlyaSee" became a trending topic on X, signaling the speculation over what top leaders at OpenAI are privy to. Judging by the negative reaction to today's statement from Brockman and Altman, it didn't dispel any of that speculation.

As of now, the company is charging ahead with its next release: ChatGPT-4o, a voice assistant.

No comments: