That is what a lot of people were thinking yesterday when researchers from Carnegie Mellon University and the Center for A.I. Safety announced that they had found a way to successfully overcome the guardrails (the limits that A.I. developers put on their language models to prevent them from providing bomb-making recipes or anti-Semitic jokes, for instance) of pretty much every large language model out there.

The discovery could spell big trouble for anyone hoping to deploy an LLM in a public-facing application. It means that attackers could get the model to engage in racist or sexist dialogue, write malware, and do pretty much anything that the models' creators have tried to train the model not to do. It also has frightening implications for those hoping to turn LLMs into powerful digital assistants that can perform actions and complete tasks across the internet. It turns out that there may be no way to prevent such agents from being easily hijacked for malicious purposes.

The attack method the researchers found worked, to some extent, on every chatbot, including OpenAI's ChatGPT (both the GPT-3.5 and GPT-4 versions), Google's Bard, Microsoft's Bing Chat, and Anthropic's Claude 2. But the news was particularly troubling for those hoping to build public-facing applications based on open-source LLMs, such as Meta's LLaMA models. That's because the attack the researchers developed works best when an attacker has access to the entire A.I. model, including its weights. (Weights are the mathematical coefficients that determine how much influence each node in a neural network has on the other nodes to which it's connected.)

Knowing this information, the researchers were able to use a computer program to automatically search for suffixes that could be appended to a prompt that would be guaranteed to override the system's guardrails. These suffixes look to human eyes, for the most part, like a long string of random characters and nonsense words. Some of the strings seem to incorporate language people already discovered can sometimes jailbreak guardrails. But the researchers determined, thanks to the alien way in which LLMs build statistical connections, that such a string will fool the LLM into providing the response the attacker desires.
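To make the "automatically search for suffixes" idea concrete, here is a minimal sketch of one way a gradient-guided suffix search can work when the attacker has the model's weights. This is not the researchers' actual method (their published approach evaluates many candidate token swaps against the real model's loss on a full target response); it is a toy illustration, and every name and number in it (the stand-in model, `VOCAB_SIZE`, `SUFFIX_LEN`, the target token) is an assumption made for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB_SIZE = 100   # toy vocabulary size (assumption)
EMBED_DIM = 32     # toy embedding width (assumption)
SUFFIX_LEN = 8     # number of adversarial suffix tokens to optimize
TARGET = torch.tensor([3])  # token id the attacker wants the model to emit

# Toy stand-in for an open-weight model: a random embedding table plus a
# linear head. A real attack would use the actual LLM's forward pass.
embedding = torch.nn.Embedding(VOCAB_SIZE, EMBED_DIM)
head = torch.nn.Linear(EMBED_DIM, VOCAB_SIZE)

def target_loss(suffix_onehot):
    """Cross-entropy between the model's next-token scores and the target."""
    emb = suffix_onehot @ embedding.weight          # (SUFFIX_LEN, EMBED_DIM)
    logits = head(emb.mean(dim=0, keepdim=True))    # (1, VOCAB_SIZE)
    return F.cross_entropy(logits, TARGET)

suffix = torch.randint(0, VOCAB_SIZE, (SUFFIX_LEN,))  # random starting suffix
for step in range(50):
    onehot = F.one_hot(suffix, VOCAB_SIZE).float()
    onehot.requires_grad_(True)
    target_loss(onehot).backward()
    # The gradient w.r.t. the one-hot inputs gives a first-order estimate of
    # which token substitution at each position would most reduce the loss.
    pos = step % SUFFIX_LEN                     # sweep positions in turn
    suffix[pos] = int(onehot.grad[pos].argmin())

print("adversarial suffix token ids:", suffix.tolist())
```

The resulting token sequence is optimized for the model's internal statistics, not for human readability, which is why such suffixes tend to look like random characters and nonsense words.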