Journey to the Center of the PDF Stream
Malware authors use numerous unconventional techniques in their attempts to create malicious code that antivirus software does not detect. As malicious code analysts, it is our job to analyze their creations, so we have to stay constantly vigilant for the latest tricks that malware authors employ.
While I was looking at some PDFs yesterday, something suspicious caught my eye. The PDF file format supports compression and encoding of embedded data, and it also allows multiple cascading filters to be specified, so that the data can be compressed and encoded at several levels. The PDF stream filters usually look something like this:
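To make the idea of cascading filters concrete, here is a minimal Python sketch of how a filter chain could be undone. It is an illustration only, not the sample from this analysis and not how any Symantec product works. It assumes a hypothetical stream dictionary entry such as `/Filter [/ASCII85Decode /FlateDecode /ASCIIHexDecode]`, handles just three of the standard PDF filters, and ignores decode parameters. The function names (`decode_stream`, `ascii_hex_decode`, `ascii85_decode`) are made up for this example.

```python
import base64
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode /ASCIIHexDecode data: hex digits, whitespace ignored, '>' ends the data."""
    body = data.split(b">")[0]
    digits = bytes(b for b in body if not chr(b).isspace())
    if len(digits) % 2:
        digits += b"0"  # an odd trailing digit is treated as if followed by 0
    return binascii.unhexlify(digits)


def ascii85_decode(data: bytes) -> bytes:
    """Decode /ASCII85Decode data; PDF streams end with '~>' and usually lack '<~'."""
    data = data.strip()
    if not data.endswith(b"~>"):
        data += b"~>"
    return base64.a85decode(data, adobe=True)


# Only three filters are supported in this sketch; a real scanner needs them all.
DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
}


def decode_stream(raw: bytes, filters: list[str]) -> bytes:
    """Apply each filter in the order it is listed in the /Filter array."""
    data = raw
    for name in filters:
        decoder = DECODERS.get(name)
        if decoder is None:
            # A scanner that stops here never sees the plaintext.
            raise ValueError(f"unsupported filter: {name}")
        data = decoder(data)
    return data
```

The `ValueError` branch is the interesting part: a tool that lacks even one decoder in the chain cannot reach the data underneath it.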
Apparently, malware authors figured they could use this multi-level compression and encoding to evade detection. Antivirus software that does not support the complete set of PDF compression and encoding types will not be able to decode the data and scan it for malicious code. In fact, the authors may have been somewhat successful: VirusTotal results indicate that many vendors did not detect the threat.
Since Symantec products support all of the compression and encoding schemes specified by Adobe, we were able to decode the data and scan the plaintext for malicious code. Interestingly, the use of so many filters is in itself anomalous, since a non-malicious PDF file would not need that many.
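The observation that a long filter chain is itself anomalous can be turned into a simple check. The Python sketch below is a rough illustration under stated assumptions, not Symantec's detection logic: it scans raw PDF bytes with a regular expression rather than a real parser, it does not handle names obfuscated with `#xx` escapes or filters hidden inside object streams, and the `max_filters=2` threshold is an arbitrary value picked for the example. The function names are hypothetical.

```python
import re

# Matches "/Filter /Name" or "/Filter [/Name1 /Name2 ...]" in raw PDF bytes.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[^\s/<>\[\]()]+)")
# Extracts each "/Name" token from the matched value.
NAME_RE = re.compile(rb"/([^\s/<>\[\]()]+)")


def filter_chains(pdf_bytes: bytes) -> list[list[str]]:
    """Return the list of filter names for every /Filter entry found."""
    chains = []
    for match in FILTER_RE.finditer(pdf_bytes):
        names = NAME_RE.findall(match.group(1))
        chains.append([name.decode("latin-1") for name in names])
    return chains


def suspicious_chains(pdf_bytes: bytes, max_filters: int = 2) -> list[list[str]]:
    """Return chains longer than max_filters (threshold is an assumption)."""
    return [chain for chain in filter_chains(pdf_bytes) if len(chain) > max_filters]
```

Calling `suspicious_chains(open(path, "rb").read())` on a file would return any filter arrays longer than the threshold. A hit is only a reason to look more closely, not proof of malicious intent.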
After multiple decompression and decoding operations, attempts to exploit two known vulnerabilities become visible:
Symantec products detect threats such as these heuristically as Bloodhound.PDF!gen.