A vulnerability chain in NVIDIA Triton could allow attackers to gain control of AI servers.
A critical vulnerability chain has been identified in NVIDIA’s Triton Inference Server that allows unauthenticated attackers to achieve remote code execution (RCE) and take full control of AI servers. The chain, comprising CVE-2025-23319, CVE-2025-23320, and CVE-2025-23334, targets the server’s Python backend through a three-step attack built around shared memory manipulation. The attack begins with an information leak and escalates to full system compromise, exposing organisations to theft of proprietary AI models, leakage of sensitive data, and manipulation of AI model responses. Because the Python backend is widely used and also serves as a dependency for other backends, the flaws expand the attack surface for organisations relying on Triton for AI and machine learning operations.
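To help gauge exposure, the minimal sketch below uses the tritonclient Python package to query a running server’s metadata and a model’s configuration over Triton’s HTTP API. The endpoint localhost:8000 and the model name "my_model" are illustrative assumptions, and the reported core version string must be mapped to an NGC container release (such as 25.07) via NVIDIA’s release notes.

```python
# Minimal triage sketch (assumptions: a local Triton endpoint at localhost:8000
# and a deployed model named "my_model"). It checks the server's reported core
# version and whether the model runs on the affected Python backend.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# GET /v2 server metadata: includes the Triton core version string;
# NVIDIA's release notes map core versions to NGC container tags such as 25.07.
meta = client.get_server_metadata()
print("Triton core version:", meta.get("version"))

# GET /v2/models/my_model/config: the "backend" field is "python" for models
# served by the Python backend targeted by this chain.
config = client.get_model_config("my_model")
if config.get("backend") == "python":
    print("my_model is served by the Python backend")
```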
The attack chain abuses inter-process communication (IPC) through shared memory regions under /dev/shm/. In the first step, attackers trigger an information disclosure by sending crafted, oversized requests that cause exceptions, with the resulting error messages revealing the name of the Python backend’s internal shared memory region. In the second step, they abuse Triton’s user-facing shared memory API, which does not properly validate the supplied key, to register the leaked internal region and gain read/write access to the backend’s private memory. In the final step, they use that access to corrupt internal data structures and manipulate pointers, ultimately achieving remote code execution. NVIDIA has released patches in Triton Inference Server version 25.07, and organisations are urged to update immediately to mitigate these vulnerabilities.
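To make the abused mechanism concrete, the sketch below shows the legitimate client-side workflow for Triton’s system shared-memory API using the tritonclient Python package. The region name, server URL and sizes are illustrative placeholders; the snippet demonstrates normal, documented usage of the registration API, not the exploit itself.

```python
# Illustrative sketch of Triton's client-facing system shared-memory API,
# the mechanism abused in the second step of the chain. All names and sizes
# below are placeholders; this shows ordinary, legitimate usage only.
from multiprocessing import shared_memory

import numpy as np
import tritonclient.http as httpclient

SHM_NAME = "example_region"      # becomes /dev/shm/example_region on Linux
BYTE_SIZE = 4 * 16               # room for 16 float32 values

# Create a shared-memory region that the server can map.
shm = shared_memory.SharedMemory(name=SHM_NAME, create=True, size=BYTE_SIZE)
np.ndarray((16,), dtype=np.float32, buffer=shm.buf)[:] = np.arange(16, dtype=np.float32)

client = httpclient.InferenceServerClient(url="localhost:8000")

# Register the region with Triton; the 'key' is the /dev/shm identifier the
# server will open. Per the research, pre-25.07 servers did not validate that
# this key belongs to the caller, which is what allowed an attacker to register
# the backend's leaked internal region instead of their own.
client.register_system_shared_memory(
    name=SHM_NAME, key="/" + SHM_NAME, byte_size=BYTE_SIZE
)

# ... an InferInput can then reference the registered region via
#     set_shared_memory(SHM_NAME, BYTE_SIZE) before running inference ...

client.unregister_system_shared_memory(name=SHM_NAME)
shm.close()
shm.unlink()
```

The patched 25.07 release is intended to close the validation gap in this registration path; the snippet above only illustrates the documented workflow that the researchers found could be pointed at a backend-internal region.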