A newly discovered LegalPwn attack leverages Gemini, ChatGPT, and various other AI tools to execute harmful code by manipulating disclaimers.
A new attack method known as “LegalPwn” has been identified, exploiting the tendency of AI models to comply with legal-sounding text. This sophisticated prompt injection technique, revealed by Pangea AI Security, manipulates large language models (LLMs) into executing malicious code by embedding harmful instructions within seemingly legitimate legal disclaimers, copyright notices, and terms of service. The attack has successfully bypassed safety measures in popular development tools, including GitHub Copilot, Google’s Gemini CLI, and ChatGPT. By disguising malicious payloads within familiar legal language, attackers can exploit the models’ inherent strengths in interpreting and contextualising information, which can inadvertently become a vulnerability.
During testing, the LegalPwn technique demonstrated alarming effectiveness. Researchers presented malicious code containing a reverse shell, concealed within legal disclaimers, and many AI systems failed to recognise the security threat. Instead, these systems classified the dangerous code as safe, with some even recommending its execution. For instance, GitHub Copilot overlooked a reverse shell payload disguised as a simple calculator program, while Google’s Gemini CLI not only failed to detect the threat but actively encouraged users to execute the malicious command. Testing across 12 major AI models revealed that approximately two-thirds were vulnerable to LegalPwn attacks, with notable susceptibility observed in ChatGPT 4o, Gemini 2.5, and various Grok models. However, some models, such as Anthropic’s Claude and Microsoft’s Phi, exhibited greater resilience against this sophisticated attack method.