I Want More Obfuscation!
Did I just say that? Usually security researchers hate obfuscation. But I say, let them obfuscate more!
Obfuscation is a loosely defined term, but it basically refers to a method of concealing your exploit code to avoid detection. Attackers employ various techniques and methodologies to achieve obfuscation. Some techniques are very clever and take even the most seasoned security researcher by surprise. In most cases, attackers try to obfuscate their exploit by stretching the limits of the language or protocol they are using. Some take advantage of the detection engine limitations as well.
Today many detection engines parse files and network streams to detect vulnerabilities and odd behavior by using pattern-matching algorithms. However, in many cases the detection logic used has some limitations and assumptions built in. Some limitations stem from the architecture of the detection engine, and some stem from the risk of a false positive. In this cat and mouse game, attackers often attempt to take advantage of these assumptions and limitations in order to beat the detection engine.
For example, a call such as myfunction("hi"); could just as well be written like this:
myfunction('h' + 'i');
Now, if the detection was looking only for the double-quoted “hi”, and not for the single-quoted ‘hi’ or for the letters ‘h’ and ‘i’ concatenated, this exploit would evade detection. Security researchers have a tough job: they need to think of all the possible ways an exploit could be coded. They also need to pay attention to false positives. In other words, is the pattern they are basing the detection on used by anyone legitimately?
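To make the evasion concrete, here is a minimal sketch of a literal-string signature next to a version that normalizes the input first. The names NAIVE_SIGNATURE, naive_match, normalize, and normalized_match are made up for illustration, and the normalization handles only simple quoted-literal concatenation. It is not how any real detection engine is implemented.

```python
import re

# Hypothetical signature: the exact text a naive engine might look for.
NAIVE_SIGNATURE = 'myfunction("hi")'

def naive_match(source: str) -> bool:
    """Return True only if the exact double-quoted form appears."""
    return NAIVE_SIGNATURE in source

# Two adjacent quoted string literals joined by '+', e.g. 'h' + 'i'.
_CONCAT = re.compile(r"""(['"])([^'"]*)\1\s*\+\s*(['"])([^'"]*)\3""")

def normalize(source: str) -> str:
    """Fold literal concatenations and unify quote style before matching."""
    previous = None
    while previous != source:  # repeat until no concatenation is left
        previous = source
        source = _CONCAT.sub(
            lambda m: '"' + m.group(2) + m.group(4) + '"', source
        )
    # Rewrite any remaining single-quoted literals as double-quoted ones.
    return re.sub(r"'([^'\"]*)'", r'"\1"', source)

def normalized_match(source: str) -> bool:
    """Apply the same signature, but to the normalized source."""
    return NAIVE_SIGNATURE in normalize(source)
```

Running naive_match on myfunction('h' + 'i'); finds nothing, while normalized_match catches it. The catch is that every new trick needs its own normalization step, which is exactly the cat and mouse game described above.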
So why do I say, give me more obfuscation? Wouldn’t that make things tougher? Well, as it turns out, too much of something is not always a bad thing.
Attackers often go overboard with obfuscation, to the point where the code they use for it (which may well be within the limits of the framework, format, or protocol) is never used by anyone legitimately. That makes the obfuscation itself a very strong heuristic: the attacker’s code stands out and is therefore easy to detect.
Here are a few examples. A PDF file starts with the header “%PDF-[version number]”. Unfortunately (or not, in this argument) a few attackers realized that they could add extra “%” symbols before the file signature and still have Adobe Reader recognize and render the PDF file. So, the attackers started to use “%%%%%PDF-[version number]” in their malicious PDFs. Many detection engines failed to detect these files because they were looking for “%PDF” at the start of the file and not “%%%%%PDF”. However, how many legitimate PDFs start with “%%%%%PDF-[version number]”? By my estimate, nil, nada, zero. This makes it a very strong heuristic. If a PDF contains this obfuscated header, there is a very strong possibility that the file in question is malicious.
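A check for this header trick could be sketched as follows. The function name pdf_header_verdict, the three verdict strings, and the choice to flag any run of two or more leading “%” characters are assumptions made for this example, not a description of a shipping signature.

```python
def pdf_header_verdict(data: bytes) -> str:
    """Classify the start of a file by its PDF signature.

    Returns "clean-header" for a normal "%PDF-" start, "suspicious" when
    extra "%" characters are stacked in front of the signature, and
    "not-pdf" otherwise. A clean header says nothing about the rest of
    the file; this only looks at the first few bytes.
    """
    if data.startswith(b"%PDF-"):
        return "clean-header"
    stripped = data.lstrip(b"%")
    extra = len(data) - len(stripped)  # number of leading '%' bytes
    if extra >= 2 and stripped.startswith(b"PDF-"):
        return "suspicious"
    return "not-pdf"
```

For instance, pdf_header_verdict(b"%%%%%PDF-1.4") returns "suspicious", while a normal b"%PDF-1.4" start returns "clean-header". Because legitimate files essentially never carry the padded header, the check can afford to be this blunt.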
Another example is an attempted MSRPC exploit evasion. This exploit requests RPC binding for several interfaces in a single MSRPC BIND request packet. All of the requested interfaces are fake, except one. The server rejects the binding request for each of the fake interfaces and grants the binding for the real one. The attackers’ logic is that if the detection looks only at the very first context ID (the first interface), the exploit will not get caught.
Although this is a completely legitimate request, and well within the specifications of the protocol, we have never seen any legitimate program requesting multiple bindings in one BIND request. Even if a small percentage of programs do in fact do this legitimately, requesting 15 bindings in one request and having 14 of them rejected by the server still qualifies as a strong heuristic.
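As a rough sketch of that heuristic, the code below reads the number of presentation context items from a raw connection-oriented DCE/RPC bind PDU and flags requests that ask for unusually many. The offsets follow the standard bind layout: a 16-byte common header, then max_xmit_frag, max_recv_frag, and assoc_group_id, then the context count. The MAX_EXPECTED_CONTEXTS threshold and the function names are hypothetical and would need tuning against real traffic. A fuller check would also look at how many contexts the server rejects in its reply.

```python
RPC_VERSION = 5            # major version of connection-oriented DCE/RPC
BIND_PTYPE = 11            # PTYPE value that marks a bind PDU
# 16-byte common header + max_xmit_frag (2) + max_recv_frag (2)
# + assoc_group_id (4) puts the context count at byte 24.
CTX_COUNT_OFFSET = 24
MAX_EXPECTED_CONTEXTS = 4  # hypothetical threshold, not a measured value

def bind_context_count(pdu: bytes):
    """Return the number of context items in a bind PDU, or None."""
    if len(pdu) <= CTX_COUNT_OFFSET:
        return None  # too short to hold the context list header
    if pdu[0] != RPC_VERSION or pdu[2] != BIND_PTYPE:
        return None  # not a connection-oriented bind request
    return pdu[CTX_COUNT_OFFSET]

def is_suspicious_bind(pdu: bytes, threshold: int = MAX_EXPECTED_CONTEXTS) -> bool:
    """Flag bind requests that ask for more contexts than the threshold."""
    count = bind_context_count(pdu)
    return count is not None and count > threshold
```

The point of counting rather than inspecting only the first context item is that the padding itself becomes the signal: a request with 15 contexts trips the check no matter where the real interface sits in the list.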
If I had to chart a graph of obfuscation versus detection, the curve would start high, dip in the middle, and climb back up.
When there is no attempt at obfuscation or evasion, the detection rate is very high. As the obfuscation increases, the detection rate for the content-matching engine dips a little. But if the attackers keep obfuscating more, the detection rate goes back up. And when I say the detection rate of the content-matching engine dips, does that mean we don’t detect the exploit? No, not really.
At Symantec we adopt a defense-in-depth strategy. This means that we have secondary engines that are immune to evasion and obfuscation techniques. These engines come into play and attempt to close the gap.
Having said that, a well-obfuscated exploit is one where the obfuscation is hard to detect because of engine limitations, or where the same obfuscation pattern is also used by a small percentage of clean sites or programs, which makes the detection logic prone to false positives.
So, now you can see why I want more obfuscation!