A research paper by University of Cambridge researchers Nicholas Boucher and Ross Anderson, titled “Trojan Source: Invisible Vulnerabilities,” details a new class of vulnerabilities that can be exploited to inject malicious code into source code without being detected by human reviewers.
According to the research, the technique can alter the logic of the source code as the compiler sees it, exposing projects to a range of first-party and supply-chain risks. The issue lies in Unicode, the text-encoding standard that lets computers exchange text consistently, no matter which language it is written in.
Currently, Unicode defines over 143,000 characters across 154 scripts, along with many non-script character sets such as emoji.
About Trojan Source Attacks
This technique exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed. As a result, the compiler or interpreter processes one sequence of tokens while a human reader sees another, which creates vulnerabilities that code reviewers cannot perceive directly.
Affected languages include Go, C#, C, C++, Rust, Java, and Python.
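Because the reordering described above depends on Unicode bidirectional control characters embedded in the source text, one simple defensive measure is to scan files for them before review. The following is a minimal sketch in Python, not a tool from the paper: the function name `find_bidi_controls` and the sample string are hypothetical, and the character set is the group of bidirectional embedding, override, and isolate controls defined by Unicode.

```python
import unicodedata

# Unicode bidirectional control characters that can change the displayed
# order of text. Written as escapes so this file stays free of them.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control found.

    Hypothetical helper for illustration; line and column are 1-based.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, unicodedata.name(ch)))
    return findings


if __name__ == "__main__":
    # Hypothetical sample: a string literal hiding an override character.
    sample = 'level = "user\u202e"\nprint(level)\n'
    for line_no, col, name in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {name}")
```

A check like this could run as a pre-commit hook or a CI step. It flags every occurrence, including legitimate ones in right-to-left text, so in practice the findings would need human review rather than automatic rejection.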