The "Pasta Theory" of Programming and Malicious Code
In a letter to the editor of CrossTalk magazine, “Rubey” of SofTech Inc. exhorted developers to “go beyond the condemnation of spaghetti code to the active encouragement of ravioli code.” It was 1992 and the "pasta theory of programming" was officially born. Since we first talked about the “spaghetti code” used by Trojan.LinkOptimizer, at least one blog reader has asked for more details about it, so I decided to post a brief explanation and a visual demonstration of what exactly spaghetti code is.
Programmers talk about spaghetti code when a program has a complex and tangled control structure that uses many jumps (GOTOs) or other unstructured branching constructs. Now, take a second to solve the following visual quiz. Look at the images below, which show three different graphs generated by IDA Professional (a well-known disassembler program). Each graph is the result of the analysis of the function flow of an executable file. Which one seems to be spaghetti code? Which one is Trojan.LinkOptimizer? I'm sure that it will be easy to spot.
The first graph (figure 1) was generated by analyzing Windows CALC.EXE. Each block represents a function and the connections between functions are jumps, loops, and nested branches. The flow is quite linear and has a top-down structure that begins from a common root. Normal programs have a graph very similar to this one (with a few exceptions that depend on the compiler and the programming language used).
In the second graph (figure 2) the flow is not linear at all. The structure does not begin with a common root (there are many blocks at the top) and the blocks are connected to each other in a crazy way, due to the presence of many jumps. Moreover, each block is small, because it holds only a little piece of the code rather than a complete function. This picture is a view of a LinkOptimizer sample.
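The differences between the first two graphs can be expressed as a few rough numbers. The following is only a sketch under assumptions: the block names, sizes, and edges are made up for illustration and are not taken from CALC.EXE or from any LinkOptimizer sample, and the function graph_metrics is a hypothetical helper, not something IDA Professional provides. It counts the blocks with no incoming connection (the "roots" at the top of the graph) and the average block size.

```python
from statistics import mean


def graph_metrics(blocks, edges):
    """Summarize a flow graph.

    blocks: dict mapping block name -> number of instructions in the block
    edges:  list of (source_block, destination_block) pairs
    """
    targets = {dst for _, dst in edges}
    # A root is a block that no other block jumps or falls into.
    roots = [name for name in blocks if name not in targets]
    return {
        "roots": len(roots),
        "avg_block_size": mean(blocks.values()),
    }


# Hypothetical structured program: one entry point, larger blocks.
structured_blocks = {"entry": 12, "check": 8, "then": 10, "else": 9, "exit": 5}
structured_edges = [
    ("entry", "check"),
    ("check", "then"),
    ("check", "else"),
    ("then", "exit"),
    ("else", "exit"),
]

# Hypothetical scrambled program: tiny fragments, several apparent roots.
scrambled_blocks = {"a": 2, "b": 1, "c": 2, "d": 1, "e": 2, "f": 1}
scrambled_edges = [("a", "d"), ("d", "b"), ("c", "e"), ("e", "d"), ("f", "b")]

structured = graph_metrics(structured_blocks, structured_edges)
scrambled = graph_metrics(scrambled_blocks, scrambled_edges)
print("structured:", structured)
print("scrambled: ", scrambled)
```

In this toy data the structured graph has a single root and much larger blocks, while the scrambled graph has several roots and blocks of only one or two instructions. Real samples would need thresholds chosen from actual measurements, which this sketch does not attempt.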
What about the third graph? Figure 3 is a representation of Backdoor.Rustock.B code. Rustock.B is another piece of malicious code that uses a spaghetti code approach to hide and scramble its code. However, in this case, the jump instructions are replaced by jump-equivalent instruction sequences (such as PUSH/RET), so IDA Professional has trouble building a proper graph representation and shows only two large blocks. That’s very unusual for a legitimate program!
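To see why a PUSH/RET pair can hide a jump from a graphing tool, consider the following toy model. It is a sketch under assumptions, not a description of how IDA Professional works internally: the instruction list, its addresses, and the function jump_edges are all hypothetical. A PUSH of an address followed immediately by RET transfers control to that address, just as a JMP would, but a tool that only treats explicit JMP instructions as connections will not draw the edge.

```python
def jump_edges(instructions, resolve_push_ret=False):
    """Return (from_address, to_address) pairs for the jumps found.

    instructions: list of (address, mnemonic, operand) tuples in program order.
    resolve_push_ret: when True, also treat PUSH <addr> directly followed by
    RET as a jump to <addr>.
    """
    edges = []
    for i, (addr, op, operand) in enumerate(instructions):
        if op == "JMP":
            edges.append((addr, operand))
        elif op == "RET" and resolve_push_ret and i > 0:
            _, prev_op, prev_operand = instructions[i - 1]
            # RET pops the address that PUSH just placed on the stack.
            if prev_op == "PUSH" and isinstance(prev_operand, int):
                edges.append((addr, prev_operand))
    return edges


# Hypothetical scrambled fragment that uses PUSH/RET instead of JMP.
toy = [
    (0x1000, "MOV", None),
    (0x1002, "PUSH", 0x1010),
    (0x1007, "RET", None),
    (0x1010, "ADD", None),
    (0x1012, "PUSH", 0x1000),
    (0x1017, "RET", None),
]

naive = jump_edges(toy)
aware = jump_edges(toy, resolve_push_ret=True)
print("naive edges:", naive)
print("aware edges:", [(hex(src), hex(dst)) for src, dst in aware])
```

With the naive setting no connections are found at all, so the fragment would be drawn as one undivided block; with the PUSH/RET pattern resolved, both hidden jumps reappear.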
Spaghetti code makes an analyst’s life hard; however, the major drawback of this obfuscation technique is that it actually makes suspicious files easier to spot, because their flow graphs look so different from those of normal programs. In fact, Symantec products can detect all recent Trojan.LinkOptimizer and Backdoor.Rustock variants, even if the malicious samples are scrambled with spaghetti code.