Posted on behalf of Martin Lee, Senior Software Engineer, Symantec Hosted Services
Targeted Trojans are bespoke pieces of malware written by someone who is trying to access information from an identified individual. This particular Trojan demonstrates some of the tricks used by targeted Trojan writers.
The intended victim of this attack is a senior individual in the energy and mining sector. The malicious email is plausible: during difficult economic times an urgent round of downsizing may well be underway, and the matter may well be kept secret. The attacker is expecting the victim to be curious and to attempt to open the attachment.
Although the attachment appears to be an ordinary spreadsheet file, it is actually a malicious file that exploits a known vulnerability in Microsoft Excel to run an executable hidden inside it.
Malware writers are aware that many corporate systems block executable file types such as .exe files, and that well-informed users may be reluctant to click on them. Spreadsheet files, by contrast, are commonly distributed by email, security policy is unlikely to block them, and users are likely to be at ease with opening this kind of attachment. From the attacker’s point of view, therefore, these types of files are very useful to subvert.
With this malware, the attacker has chosen to exploit a relatively old vulnerability in Microsoft Excel dating from March 2008. The vulnerability makes it possible to create a specially crafted Excel file that tricks the program into executing computer code hidden within the file. Occasionally, targeted Trojan writers use older vulnerabilities in their malicious software. Presumably this is an attempt to evade signature-based anti-virus detection by using vulnerabilities that common malware no longer abuses, and that signature-based systems may therefore no longer scan for.
By glancing at the email it is impossible to tell that the attached file is malicious. But it is possible to identify that this is an unusual Excel file by looking at the degree of randomness contained within the file. Randomness is an interesting property of digital data: it can relay information about the characteristics of the data without having to understand the data in detail. Areas of low randomness within a file are indicative of empty space, areas of medium randomness are indicative of written text, and areas of high randomness are associated with either encrypted data or computer code instructions.
These graphs compare the amount of randomness within the malicious Excel file in blue and a normal Excel file in orange.
The degree of randomness of the normal file rarely strays over 0.6, whereas the malicious file has a large region in the middle of the file where the randomness stays above 0.8. This suggests that there may be hidden machine code within the file.
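The kind of measurement behind these graphs can be sketched in a few lines of Python. This is only an illustration: it assumes that "randomness" means the Shannon entropy of each fixed-size window of bytes, scaled so that 0.0 is a single repeated value and 1.0 is a perfectly even spread of byte values. The function names, the 256-byte window and the use of 0.8 as a cut-off are choices made for this sketch, not details of the tool that produced the graphs.

```python
import math
from collections import Counter


def normalized_entropy(block: bytes) -> float:
    """Shannon entropy of a block of bytes, scaled to the range 0.0 to 1.0."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    bits = sum((n / total) * math.log2(total / n) for n in counts.values())
    return bits / 8.0  # 8 bits per byte is the maximum possible entropy


def entropy_profile(data: bytes, window: int = 256) -> list:
    """Entropy of each consecutive, non-overlapping window in the data."""
    return [
        normalized_entropy(data[i:i + window])
        for i in range(0, len(data), window)
    ]


def high_entropy_windows(data: bytes, window: int = 256,
                         threshold: float = 0.8) -> list:
    """Indexes of the windows whose entropy exceeds the threshold."""
    profile = entropy_profile(data, window)
    return [i for i, value in enumerate(profile) if value > threshold]


if __name__ == "__main__":
    # Synthetic data: empty space, then evenly spread bytes, then one letter.
    sample = bytes(256) + bytes(range(256)) + b"a" * 256
    print(entropy_profile(sample))
    print(high_entropy_windows(sample))
```

Run against a real file, a long unbroken run of flagged windows in the middle of a spreadsheet would be the signal worth a closer look; a single flagged window on its own proves little.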
Inspecting the contents of the file shows a lot of gibberish, but also an interesting section in plain text.
This section looks like it could be error codes used in debugging the malware. In any case, this is unusual text to find inside a spreadsheet and would appear to be completely unrelated to the expected contents of a spreadsheet about downsizing. Occasionally malware writers neglect to remove debugging sections such as these from their programs, which gives some insight into their development techniques.
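Finding readable text of this kind in a binary file does not require anything sophisticated. The sketch below pulls out runs of printable ASCII characters, much as the Unix `strings` utility does. The function name, the minimum length of six characters and the sample bytes are all made up for illustration; they are not taken from the file analysed here.

```python
import re


def printable_strings(data: bytes, min_length: int = 6) -> list:
    """Return (offset, text) pairs for each run of printable ASCII bytes."""
    # 0x20 to 0x7e covers space through tilde, the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [
        (match.start(), match.group().decode("ascii"))
        for match in pattern.finditer(data)
    ]


if __name__ == "__main__":
    # Hypothetical bytes: binary noise around one readable message.
    sample = b"\x00\x01" + b"error code 5" + b"\xff\xfe" + b"abc" + b"\x00"
    for offset, text in printable_strings(sample):
        print(hex(offset), text)
```

Short runs such as `abc` are skipped because random binary data produces plenty of accidental three or four character "words".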
Automated analysis of the rest of the file doesn’t identify any further areas of interest. Despite the large area of high randomness and the apparent debugging messages, there don’t appear to be any actual computer instructions in the file, apart from one small section which looks like it contains a few lines of computer instructions.
This small section explains why one is unable to find the large number of instructions that would be expected. XOR (highlighted in yellow) is a computer instruction often used to encrypt data. We are unable to identify the actual program in the data because the program is encrypted, and this piece of computer code is part of the instructions needed to decrypt the hidden program so that it can be executed.
Looking through the encrypted data, we see large sections of data with the hexadecimal value ‘4a’.
It is possible that these values should be ‘00’, but that they have been XOR encrypted with ‘4a’ as the key: XORing zero with any key value yields the key itself. Attempting to decrypt the data using ‘4a’ as the key reveals an area of text that looks like it may be a scrambled version of ‘This program cannot be run in DOS mode’.
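The reasoning here can be reproduced in a short Python sketch. It assumes a single-byte XOR key applied to every byte, and it guesses the key by taking the most common byte value, on the basis that executables contain long runs of zero padding. The function names and the sample data are invented for this illustration.

```python
from collections import Counter


def guess_single_byte_key(data: bytes) -> int:
    """Guess a single-byte XOR key as the most frequent byte in the data.

    Assumes the plaintext is dominated by 0x00, so that the most common
    encrypted value is 0x00 XOR key, which is the key itself.
    """
    return Counter(data).most_common(1)[0][0]


def xor_decode(data: bytes, key: int) -> bytes:
    """XOR every byte with the key; the same call encrypts and decrypts."""
    return bytes(value ^ key for value in data)


if __name__ == "__main__":
    # Synthetic plaintext with plenty of zero padding, encrypted with 0x4a.
    plaintext = b"MZ" + bytes(20) + b"PE" + bytes(20)
    encrypted = xor_decode(plaintext, 0x4A)
    key = guess_single_byte_key(encrypted)
    print(hex(key))
    print(xor_decode(encrypted, key) == plaintext)
```

The guess fails if the hidden data has little padding or if the key is longer than one byte, so it is a starting point rather than a guarantee.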
Applying a character swapping algorithm reveals the characteristic start of a Windows executable beginning with ‘MZ’ and containing ‘PE’ and the DOS mode warning.
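The exact swapping scheme is not spelled out above, so the following Python sketch makes an assumption: that each adjacent pair of bytes has simply been exchanged, so that ‘This’ would be stored as ‘hTsi’. It undoes that swap and then applies a loose check for the markers mentioned above. Both function names are hypothetical, and the check is a quick sanity test rather than a proper parser for the Windows executable format.

```python
def swap_adjacent_bytes(data: bytes) -> bytes:
    """Exchange bytes 0 and 1, 2 and 3, and so on.

    A trailing odd byte is left in place. Applying the function twice
    returns the original data, so it both scrambles and unscrambles.
    """
    out = bytearray(data)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)


def looks_like_windows_executable(data: bytes) -> bool:
    """Loose check for 'MZ' at the start, a 'PE' signature and the DOS text."""
    return (
        data.startswith(b"MZ")
        and b"PE\x00\x00" in data
        and b"This program cannot be run in DOS mode" in data
    )


if __name__ == "__main__":
    # Synthetic stand-in for the start of an executable, not real malware.
    original = (b"MZ" + bytes(62)
                + b"This program cannot be run in DOS mode\r\n"
                + b"PE\x00\x00")
    scrambled = swap_adjacent_bytes(original)
    print(scrambled[:2], looks_like_windows_executable(scrambled))
    restored = swap_adjacent_bytes(scrambled)
    print(restored[:2], looks_like_windows_executable(restored))
```

If the real scheme swaps larger groups or uses a different pattern, only `swap_adjacent_bytes` would need to change; the check on the result stays the same.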
This is the beginning of the malicious program. Whoever wrote this executable file wanted to keep it hidden and went to a lot of trouble to do so. The program can only be activated by opening the spreadsheet file, and then the data must be decrypted and rearranged before it can be executed.
These techniques for hiding the malicious code from analysis mean that such files are very difficult to identify using anti-virus signature techniques, since the part of the file that should be detected is both encrypted and rearranged. Nevertheless, by knowing what a spreadsheet file should look like in terms of the amount of randomness contained within the file, and knowing how malware writers hide their code, we’re able to detect that this file is ‘wrong’ and that it contains hidden code. Often it is far easier and more effective to detect malware by using heuristic techniques such as these than by looking for short snippets of known malicious code using signature-based anti-virus techniques.