Vulnerabilities in NVIDIA Triton allow unauthenticated attackers to run code and take control of AI servers.
A newly disclosed set of security flaws in NVIDIA’s Triton Inference Server for Windows and Linux, an open-source platform for running artificial intelligence (AI) models at scale, poses significant risks to unpatched servers. According to Wiz researchers Ronen Shustin and Nir Ohfeld, when these vulnerabilities are chained together, they could allow a remote, unauthenticated attacker to achieve remote code execution (RCE) and take complete control of the server.

The vulnerabilities are CVE-2025-23319 (CVSS score: 8.1), an out-of-bounds write in the Python backend; CVE-2025-23320 (CVSS score: 7.5), which allows an attacker to exceed the shared-memory limit by sending a very large request; and CVE-2025-23334 (CVSS score: 5.9), an out-of-bounds read. Successful exploitation could lead to information disclosure, remote code execution, denial of service, and data tampering, particularly in the case of CVE-2025-23319.
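For administrators triaging exposure, a quick first step is to read a deployment’s reported version from Triton’s standard KServe v2 metadata endpoint and compare it against the fixed releases listed in NVIDIA’s August bulletin. The sketch below is illustrative only: it assumes the tritonclient Python package is installed and that the server’s HTTP endpoint is reachable, and "localhost:8000" is a placeholder URL to be replaced for your own environment.

```python
# Illustrative sketch: read a Triton server's reported metadata so its version
# can be compared against the fixed releases in NVIDIA's August bulletin.
# Assumes the tritonclient package is installed and the HTTP endpoint is
# reachable; "localhost:8000" is a placeholder URL.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

metadata = client.get_server_metadata()  # standard KServe v2 metadata call
print("Server:", metadata.get("name"))
print("Version:", metadata.get("version"))
print("Extensions:", metadata.get("extensions"))
```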
The issues are rooted in the Python backend, which handles inference requests for major AI frameworks such as PyTorch and TensorFlow. Wiz researchers highlighted that a threat actor could exploit CVE-2025-23320 to leak the unique name of the backend’s internal IPC shared memory region, a key that is supposed to remain private, and then leverage it alongside the other two flaws to gain full control of the inference server.

The researchers warned that this poses a critical risk to organisations using Triton for AI and machine learning, as successful attacks could result in the theft of valuable AI models, exposure of sensitive data, and manipulation of AI model responses. NVIDIA’s August bulletin also noted fixes for three additional critical bugs, and while there is no evidence of any of these vulnerabilities being exploited in the wild, users are advised to apply the latest updates for optimal protection.
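Beyond patching, because the described chain hinges on an attacker binding to a shared-memory region name that was never meant to leave the backend, one small operational check is to review which regions a server currently has registered. The following sketch is not part of the Wiz or NVIDIA guidance; it again assumes the tritonclient package and a reachable HTTP endpoint, and simply prints the server’s reported system and CUDA shared-memory status so that unexpected registrations can be investigated.

```python
# Illustrative sketch: list the shared-memory regions a Triton server reports
# as registered. Assumes the tritonclient package is installed and the HTTP
# endpoint is reachable; "localhost:8000" is a placeholder URL.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton's shared-memory extensions expose status endpoints for both system
# (CPU) and CUDA regions; entries you cannot account for are worth a closer look.
print("System shared-memory regions:", client.get_system_shared_memory_status())
print("CUDA shared-memory regions:", client.get_cuda_shared_memory_status())
```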