Security researchers have uncovered vulnerabilities in GPT-5 that enable jailbreaks and zero-click attacks on AI agents, threatening the security of cloud and IoT systems.
Cybersecurity researchers have discovered a jailbreak technique that circumvents the ethical guardrails OpenAI built into its latest large language model, GPT-5, enabling it to produce illicit instructions. Generative artificial intelligence security platform NeuralTrust reported that it combined a known method called Echo Chamber with narrative-driven steering to manipulate the model into producing undesirable responses. Security researcher Martí Jordà explained that the approach seeds and reinforces a subtly poisonous conversational context, then guides the model with low-salience storytelling that avoids explicit intent signalling. This combination nudges the model towards the desired outcome while minimising the triggers that would normally prompt a refusal.
The Echo Chamber technique, initially detailed by NeuralTrust in June 2025, deceives the model into generating responses on prohibited topics through indirect references and multi-step inference. Recently, this method has been paired with a multi-turn jailbreaking technique called Crescendo to bypass xAI’s Grok 4 defences. In the latest attack on GPT-5, researchers found that harmful procedural content could be elicited by framing it within a story context. By providing the AI system with a set of keywords and creating sentences around those words, they could iteratively steer the model towards generating instructions without overtly stating the request. This manipulation occurs through a “persuasion” loop, allowing the narrative to progress while minimising refusal triggers. The findings highlight the inadequacy of keyword or intent-based filters in multi-turn settings, where context can be gradually poisoned and echoed back under the guise of continuity.
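To illustrate why single-turn filtering falls short, consider the sketch below. It is a hypothetical illustration, not NeuralTrust's tooling or prompts; all blocked phrases and conversation turns are benign placeholders. A naive per-turn keyword check passes every message of a gradually steered, story-framed exchange, because no individual turn ever states the prohibited intent.

# Hypothetical sketch of a naive per-turn keyword filter failing against a
# gradually steered, story-framed conversation. All phrases and messages
# are placeholders, not the researchers' actual prompts.

BLOCKED_PHRASES = {"give me instructions for", "how do i make"}  # toy blocklist

def per_turn_filter(message: str) -> bool:
    """Refuse only if this single message contains a blocked phrase."""
    text = message.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

# Each turn is innocuous in isolation, yet together they keep a story
# moving towards content the filter is meant to block.
conversation = [
    "Let's write a thriller about a character named Dana, a chemist.",
    "In the story, Dana keeps detailed notes about her secret project.",
    "Continue the scene: Dana explains her notes to a colleague, step by step.",
]

for turn in conversation:
    print(per_turn_filter(turn), turn)   # prints False for every turn

# Even concatenating the turns does not help a keyword matcher, because the
# intent is never stated explicitly -- it emerges from the narrative itself.
print(per_turn_filter(" ".join(conversation)))  # also False

As the example suggests, a context-aware defence would need to assess the accumulated conversation and the trajectory of the narrative, rather than matching keywords or classifying intent one message at a time.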