
VBA Emulation: A Viable Method of Macro Virus Detection? Part One

Created: 17 Apr 2002 • Updated: 03 Nov 2010

by Gabor Szappanos

last updated April 18, 2002

According to data collected by the International Computer Security Association (ICSA), Loveletter was the fastest-spreading virus ever. In the early era of computing, a successful virus such as Form could become the world’s most prevalent virus about 3 years after its first appearance. Five years later, in 1995, Concept needed 4 months to become No. 1. WM97.Melissa reached the top of the virus list in 4 days. All this is nothing compared to Loveletter, which became the most widespread virus only 4 hours after it was first encountered, having already infected over a million PCs by then.

Current reactive anti-virus techniques, with a typical response time of 2-3 hours, are not fast enough to prevent similar attacks in the future. The solution should be to use more efficient generic, proactive virus detection methods.

There are at least two commonly used solutions for detecting unknown viruses. The first method is heuristic scanning, which inspects the instructions in the sample, looking for instructions that are characteristic of viruses. If such instructions exist, the sample is suspected to contain a virus.

The second method is to use a code emulator. This creates a virtual PC environment and examines the code of the sample file instruction by instruction, applying the changes caused by each instruction to the emulated environment. If the environment begins to look as though it has been infected by a virus, the sample is suspected to contain a virus.

The basic procedure of virus emulation is as follows:

  1. Collect the executable code
  2. Find the entry point
  3. Get the next instruction and execute
  4. Examine the changes: does the sample behave like a virus?
  5. If the sample is not determined to be a virus, go back to step 3.
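The steps above can be sketched as a small loop. This is only an illustrative toy, not the design of any real emulator: the `Environment` class, the `looks_infected` rule (treating a write to `Normal.dot` as viral behaviour), the `write_file` helper and the step limit are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Environment:
    """Toy emulated environment: remembers which files the sample wrote to."""
    written_files: List[str] = field(default_factory=list)


# In this sketch an "instruction" is just a callable that changes the environment.
Instruction = Callable[[Environment], None]


def looks_infected(env: Environment) -> bool:
    # Hypothetical rule: writing to the global template counts as viral behaviour.
    return any(name.lower() == "normal.dot" for name in env.written_files)


def emulate(instructions: List[Instruction], entry_point: int = 0,
            max_steps: int = 10_000) -> bool:
    """Return True if the sample is suspected to contain a virus."""
    env = Environment()          # steps 1-2: code collected, entry point known
    pc = entry_point
    steps = 0
    while pc < len(instructions) and steps < max_steps:
        instructions[pc](env)    # step 3: get the next instruction and execute it
        if looks_infected(env):  # step 4: does the environment look infected?
            return True
        pc += 1                  # step 5: not decided yet, continue with the next one
        steps += 1
    return False


def write_file(name: str) -> Instruction:
    """Hypothetical instruction factory: 'write to the file called name'."""
    def op(env: Environment) -> None:
        env.written_files.append(name)
    return op


benign = [write_file("report.doc")]
suspicious = [write_file("report.doc"), write_file("Normal.dot")]
```

The step limit is there because a real sample may loop forever; an emulator has to give up at some point and report the sample as undetermined.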

Emulation has several advantages. It detects polymorphic viruses by executing the decoder and then scanning the decrypted main virus code. It detects encrypted viruses, where the main virus body is hidden behind static encryption. Lastly, it detects unknown viruses.

The major problem with emulation, as with all heuristic anti-virus methods, is that it is prone to false alarms, flagging innocent programs as viruses. This article is the first in a two-part series that will examine some of the problems with emulation, with the aim of determining whether or not it is a realistic anti-virus method.

I. Macro Source Problems

The first step in the design of an emulator environment is to feed in the executable macrocode so that the emulator can execute it. Nothing could be easier, one would think. However, the real problems start here. First, what should be considered the executable macrocode? One would say that the emulator should use exactly the same input that the MS Office VBA interpreter uses. Unfortunately, there are three different formats and locations for storing the VBA code and, depending on the circumstances, any one of them could be used by VBA as the execution source. The reason is that, although the VBA environment is an interpreter, precompiled and compiled code is also used under certain circumstances for performance reasons.

Compressed source code

The compressed source code is loaded when upgrading or downgrading from a different version of VBA, or when moving from a different platform (Mac to Windows and vice versa), because it is version and platform independent and is therefore usable by any version. Apart from a couple of pathological cases, compressed source is present in all macro viruses.

Opcode format

As the text is typed into the VB editor, the VBA parser immediately converts it into opcodes, which is why syntax errors are displayed right away. When the project is saved, the project information is stored in a platform and version specific format. The opcodes are saved in the module streams along with the compressed text.

Executable p-code

This is the actual executable code. This format is generated when the code is run or when a compile is called explicitly. The execodes are saved if they exist at save time, but they do not exist in all documents, especially not in viruses. Theoretically, the opcode format and the compressed source code format should be equivalent. Unfortunately, there are cases where these representations differ, as the next section shows.

Off-Sync Situations Between the Representations

XM97.Jini.A1

This particular case was caused by the improper disinfection of XM97.Jini.A. The resulting sample was missing most parts of the VBA project storage, including the opcodes and the compressed source. However, the improper disinfection left the execode streams intact. As this is a version-dependent representation, when the sample is opened in the original Excel version (Excel 97 in this case), the execodes are used and the virus activates. Even so, most viruses in execode-only form would not be viable, as they have no access to the code module from which to read the macro source; Jini, however, happens to use a different spreading mechanism that does not rely on access to the code module.

Class.EZ

While the previous example was the result of an accident, this virus deliberately exploits the internals of the VBA engine. The virus drops KLOOP.EXE and, after the infection of a new document, this program is executed to patch the infected document. KLOOP.EXE changes the VBA version number to a random value and then wipes the first 24 bytes of the opcode table. On the one hand, this forces VBA to recompile the project from the compressed source on each run; on the other hand, it corrupts the opcode table.

As we have seen, choosing any single representation as the emulation source will cause trouble: whichever one is chosen, there are existing viruses that will not be detected.
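The two cases above suggest how the choice of execution source might be modelled. The following is a rough sketch built only from the behaviour described in this section; the actual selection logic inside VBA is not documented here, and the `VbaProject` fields, the function name and the order of the checks are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VbaProject:
    """Hypothetical summary of what is present in a document's VBA storage."""
    saved_version: int                  # VBA version number stored in the project
    compressed_source: Optional[bytes]  # version and platform independent
    opcodes: Optional[bytes]            # version and platform specific
    execodes: Optional[bytes]           # only present if compiled before saving


def likely_execution_source(project: VbaProject, host_version: int) -> Optional[str]:
    """Guess which representation the host application would execute."""
    same_version = project.saved_version == host_version
    # Version-specific forms are only usable by the version that wrote them.
    if same_version and project.execodes:
        return "execodes"
    if same_version and project.opcodes:
        return "opcodes"
    # Otherwise the project has to be rebuilt from the compressed source.
    if project.compressed_source:
        return "compressed_source"
    return None  # nothing executable for this host version


# Jini.A1-like sample: only the execode streams survived disinfection.
jini_like = VbaProject(saved_version=97, compressed_source=None,
                       opcodes=None, execodes=b"\x01")
# Class.EZ-like sample: version number randomized, opcode table corrupted.
ez_like = VbaProject(saved_version=12345, compressed_source=b"src",
                     opcodes=b"\x00" * 24, execodes=None)
```

Under these assumptions, the Jini-like sample is only executable in the version that saved it, while the Class.EZ-like sample always falls back to the compressed source. An emulator that reads only one of the three representations would miss one of the two.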

II. Macro Execution Problems

Now that we have addressed the problem of the execution source, we can move on to more serious problems.

Entry point problems

Emulating an executable virus is rather simple. The emulator has to find the entry point of the program, then parse the first instruction (which is in an easily usable machine code form already), and then see what would happen in the emulated environment if the instruction were really executed. Now let us try to apply it to the macro viruses.

The first step is already extremely difficult. Office macro execution is event-driven; one cannot point out an instruction and say that it is the first virus instruction that will be executed. It depends on the actual circumstances. The different macro procedures can be executed in different order, depending on the sequence of events in the Office environment.

There are several ways macrocode can be executed. In the following section, I will describe them, listing only those that were seen in real-life viruses.

1. Automatic Macros

There are predefined macro names that represent application events. Whenever a macro with such a name exists and the event is fired, the macro is executed (unless automacro execution has been turned off programmatically or by holding down the SHIFT key). For Word, the following five automacros exist:

Macro name    Condition
AutoExec      MS Word startup
AutoOpen      Opening document
AutoClose     Closing document
AutoExit      Exiting MS Word
AutoNew       Creating new document

These automatic macros were created for use in Word 2 WordBASIC; however, they still exist in Office XP, despite the difference between WordBASIC’s macros and VBA’s procedures and modules.

2. Menu Commands

For improved flexibility and customizability, Word provides the option to override the default action taken when a certain menu item is accessed. To do that, the macro name has to match the menu command name. For example, the File|Save menu command can be overridden with the FileSave macro (at least in the English version of Word). In nationalized versions of Word, these built-in menu command macro names vary with the language. Thus FileSave is really DateiSpeichern in the German Word, FichierEnregistrer in the French Word, and so on.

3. Event Handlers

VBA introduced yet another activation mechanism for macro code: event handlers, which can respond to application-level and document-level events such as closing or saving a document or exiting the application.

The major differences between event handlers and the previously described automatic macros and menu commands are that the event handlers can only be placed in class modules, and these procedures take arguments. One of those class modules is always present in a document under the default ThisDocument name, but further class modules can be created and stored in the document macro storage.

So, for the sake of the emulation, learning from the experience of macro virus replications, one has to assume a standard File|Open => File|Save => File|Close sequence that would probably fire the majority of the Word macro viruses. This will fire the event sequence:

AutoOpen => Document_Open => FileSave => FileClose => AutoClose => Document_Close.

The source parser has to analyze the code, collect the procedure names, and then decide whether each one is one of the above-mentioned procedure names. It is also important that the procedures named above are declared Public (except for Document_Open and Document_Close) and that their parameter list is empty. If either of these two conditions is not met, the procedure is not a valid event handler procedure and should be rejected. Moreover, the Document_Open and Document_Close handlers are special in that they can only be placed in a class module; otherwise they will not execute. Furthermore, as FileSave and FileClose represent built-in menu commands, their names differ between nationalized versions of Word; therefore, the parser has to maintain a database of these translated macro names for at least the most important Word versions (German, French, etc.).
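A minimal sketch of such a parser is shown below. It is not a real VBA parser: it only recognizes `Sub` declarations written with parentheses. The regular expression, the function name `find_entry_points` and the one-entry translation table are assumptions made for the example. It relies on two facts about VBA: a `Sub` with no explicit scope is Public, and procedure names are case-insensitive.

```python
import re

# Automacros and menu-command macros: must be Public and take no arguments.
AUTO_AND_MENU = ["AutoExec", "AutoOpen", "AutoClose", "AutoExit", "AutoNew",
                 "FileSave", "FileClose"]
# Event handlers: may be Private, but are only valid inside a class module.
CLASS_ONLY = ["Document_Open", "Document_Close"]
# Tiny stand-in for the database of translated menu-command names.
LOCALIZED = {"dateispeichern": "FileSave"}  # German Word

# VBA names are case-insensitive, so look them up in lower case.
CANONICAL = {name.lower(): name for name in AUTO_AND_MENU + CLASS_ONLY}

SUB_RE = re.compile(
    r"^\s*(?:(Public|Private)\s+)?Sub\s+(\w+)\s*\(([^)]*)\)",
    re.IGNORECASE | re.MULTILINE,
)


def find_entry_points(source: str, is_class_module: bool) -> list:
    """Return canonical names of procedures that qualify as entry points."""
    found = []
    for scope, raw_name, params in SUB_RE.findall(source):
        key = raw_name.lower()
        name = LOCALIZED.get(key) or CANONICAL.get(key)
        if name is None:
            continue                      # not a known entry-point name
        if params.strip():
            continue                      # parameter list must be empty
        if name in CLASS_ONLY:
            if is_class_module:           # only fires from a class module
                found.append(name)
        elif (scope or "Public").lower() == "public":
            found.append(name)            # no explicit scope means Public
    return found


sample = (
    "Sub AutoOpen()\nEnd Sub\n"
    "Private Sub AutoClose()\nEnd Sub\n"
    "Public Sub FileSave(x As Integer)\nEnd Sub\n"
    "Private Sub Document_Open()\nEnd Sub\n"
    "Sub DateiSpeichern()\nEnd Sub\n"
)
```

For the `sample` module, the private `AutoClose` and the `FileSave` with an argument are rejected, and `Document_Open` is accepted only when the module is treated as a class module.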

Excel viruses use different infection schemes, as they follow the trail of the old-style Laroux, whose infection is based on the activation and deactivation of worksheet windows. Therefore, in their case, the event scheme has to be combined with intermediate activation and deactivation of workbook windows so that activation-triggered code is also fired. This results in the following event chain:

Workbook_Open => Workbook_Activate => Workbook_WindowActivate => {Workbook_SheetDeactivate => Workbook_SheetActivate} => Workbook_BeforeSave => Workbook_BeforeClose => Workbook_WindowDeactivate => Workbook_Deactivate

In this case, the parser has to take into account that some of the above event handler procedures take arguments. If the argument count does not match, the project will not even compile, let alone execute. The entry point to the macrocode is then the first procedure in the above event chains that exists in the project. When walking through the event chain, the VBA project has to be re-parsed after each completed event handler subroutine, to account for viruses that change procedure names or even procedure content during execution.
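The need to re-parse can be illustrated with a toy model. Everything here is hypothetical: a project is a plain dictionary mapping procedure names to lists of actions, and the only supported action is a rename. Only the Word event chain itself is taken from the text.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str]        # ("rename", old_name, new_name)
Project = Dict[str, List[Action]]    # procedure name -> actions it performs

WORD_CHAIN = ["AutoOpen", "Document_Open", "FileSave",
              "FileClose", "AutoClose", "Document_Close"]


def parse(project: Project) -> set:
    """Stand-in for the source parser: returns the current procedure names."""
    return set(project)


def run(project: Project, name: str) -> None:
    """Stand-in for emulating one procedure; it may modify the project."""
    for op, old, new in list(project[name]):
        if op == "rename" and old in project:
            project[new] = project.pop(old)


def walk_event_chain(project: Project, chain: List[str]) -> List[str]:
    """Fire each event in order, re-parsing the project before every event."""
    executed = []
    for event in chain:
        if event in parse(project):   # re-parse: names may have changed
            run(project, event)
            executed.append(event)
    return executed


# Toy virus: AutoOpen renames a dormant procedure so that it becomes AutoClose.
toy = {"AutoOpen": [("rename", "Payload", "AutoClose")], "Payload": []}
result = walk_event_chain(toy, WORD_CHAIN)
```

Had the procedure names been collected only once, before the walk started, `AutoClose` would not have been in the list and the second stage of this toy virus would never have been emulated.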

Up until now, I have only paid attention to MS Word and Excel. Several other productivity applications use VBA to extend their functionality and provide enough power to write macro viruses. These applications (including, but not limited to, Visio, AutoCAD, and WordPerfect) all use their own macro storage mechanism and event handler naming scheme. As a result, the emulator needs several plug-in versions of the code parser and the event chain pre-selector.

4. Mouse Clicks (PowerPoint Viruses)

PowerPoint viruses pose another problem. They appear as side effects of cross-application MS Office macro viruses (e.g. Tristate) and, consequently, can be found in the wild too. The problem is that they do not use the previously described event handlers, as PowerPoint does not provide a mechanism for executing a macro when a file is loaded. Instead, these viruses create a form in the infected presentation and hook the mouse-click event of this form to the virus code. As a result, the virus does not necessarily use fixed procedure names, so the parser cannot find the entry point easily. In this case, a more detailed analysis of the presentation file structure is necessary to determine which procedure is hooked to the mouse click.

Conclusion

This concludes the first of two articles discussing emulation as a viable method of virus detection. In the next installment of this series, we will discuss code execution flow, underlying operating system problems, incompatibility issues between different versions of Office, and the VBA emulator environment.

This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.
