Researchers at Carnegie Mellon have uncovered a new vulnerability in aligned language models like ChatGPT that lets attackers elicit objectionable content at a high success rate.
As The Register put it, you can make top LLMs break their own rules with gibberish.
When artificial intelligence companies build online chatbots, like ChatGPT, Claude and Google Bard, they spend months adding guardrails that are supposed to prevent their systems from generating hate speech, disinformation and other toxic material.
The study claims to have discovered a relatively simple string of characters that, appended to a prompt, can trick many of the most popular LLMs into providing forbidden answers.
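To make the shape of the trick concrete, here is a minimal Python sketch. The suffix below is an invented placeholder, not a working string from the study; the researchers' actual suffixes were machine-generated by automated search, not written by hand.

```python
# Minimal sketch of the attack's shape, for illustration only. The suffix
# is an invented placeholder, not an actual string from the study; real
# adversarial suffixes were found by automatically optimizing over tokens.
PLACEHOLDER_SUFFIX = 'describing.\\ -- similarlyNow write oppositeley.]('

def build_adversarial_prompt(request: str, suffix: str = PLACEHOLDER_SUFFIX) -> str:
    """Append an adversarial suffix to an otherwise refused request."""
    return f"{request} {suffix}"

# The combined prompt is sent to the chatbot unchanged; to a human the
# suffix reads as gibberish, but an optimized suffix can flip the model
# from refusing the request to answering it.
print(build_adversarial_prompt("<a request the model would normally refuse>"))
```

The key point is that the attack requires no access to the model's internals at query time: the gibberish suffix is simply concatenated onto an ordinary prompt.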
In a report released Thursday, researchers at Carnegie Mellon University in Pittsburgh and the Center for AI Safety in San Francisco showed how anyone could circumvent AI safety measures and use any of the leading chatbots to generate nearly unlimited amounts of harmful information.