Who Goes There? An Introduction to On-Access Virus Scanning, Part Two

Updated: 02 Nov 2010

by Bill Hayes

By now, most savvy computer users have anti-virus software (AV) installed on their machines and use it as part of their regular computing routine. However, most average users do not know how anti-virus software works. This article is the second in a two-part series that will offer a brief overview of a particular type of anti-virus technique known as on-access scanning.

The first article of this series looked at on-access AV scanners, including some of the basic concepts behind these mechanisms. This article will explore some of the strategies that virus writers have adopted to circumvent on-access scanners and the ways that anti-virus developers are in turn reacting to those changes.

The Bad Guys Respond

Virus writers developed several countermeasures in response to improvements in on-access scanning methods. They began to obscure the malicious nature of the code by encrypting it, by placing it where virus scanners were unlikely to find it, and by hiding the location in the infected program where the virus takes control. For instance, many simple Windows Portable Executable (PE) viruses append their viral code to the end of a program with a jump instruction at the front of the program to cause the virus code to be executed.

A virus scanner can quickly look for changes in executable code by examining the length of the program. If the size of the file has changed, it is a good indicator that a virus has infected the program. To counter this check, cavity viruses hide their code in empty spaces within the program file, thus keeping the file size the same. The cavity virus has been around since the MS-DOS days, beginning with the Lehigh virus. Its use in virus design increased greatly when Microsoft developed the Windows PE format to allow Windows programs to be transported between Windows operating systems. In order to speed the loading of PE format programs, Windows program compilers created empty spaces throughout the program. Recent viruses, such as W2K/Lamchi, make use of these open spaces to hide viral code.
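
To make the idea concrete, here is a minimal Python sketch of both checks. It is an illustration only, not how any particular scanner works: the function names, the 16-byte threshold, and the toy byte strings are all assumptions made for this example. It shows that a size comparison misses a cavity infection, while the shrinking of a known empty region does reveal it.

```python
def find_cavities(data: bytes, min_len: int = 16, fill: int = 0x00):
    """Return (offset, length) for every run of `fill` bytes at least min_len long."""
    cavities = []
    run_start = None
    for i, b in enumerate(data):
        if b == fill:
            if run_start is None:
                run_start = i          # a new run of filler bytes begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                cavities.append((run_start, i - run_start))
            run_start = None
    # Handle a run that extends to the end of the data.
    if run_start is not None and len(data) - run_start >= min_len:
        cavities.append((run_start, len(data) - run_start))
    return cavities


def size_changed(baseline_size: int, data: bytes) -> bool:
    """The simple length check described above."""
    return len(data) != baseline_size


# Toy "program": 8 bytes of code, a 32-byte cavity, 8 more bytes of code.
clean = b"\x90" * 8 + b"\x00" * 32 + b"\xc3" * 8
# Toy "infection": 20 bytes written into the cavity, file length unchanged.
infected = clean[:8] + b"\xcc" * 20 + clean[28:]

print(size_changed(len(clean), infected))   # the length check sees nothing
print(find_cavities(clean))                 # one 32-byte cavity at offset 8
print(find_cavities(infected))              # the cavity is no longer there
```

A real PE file has a section table that describes where such padding legitimately lives; this sketch ignores that and treats the file as a flat run of bytes.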

Virus writers have also attempted to use encryption algorithms to break anti-virus CPU emulators. The IDEA.6155 virus appeared in the spring of 1998. Developed by the virus writer Spanska, it could infect .com, .exe and .zip files. It was not a simple virus, but a proof-of-concept virus that incorporated and demonstrated complex encryption methods. Spanska reportedly distributed it to anti-virus companies and did not allow it to get into the wild (a term that is used to refer to viruses that are released into circulation on the Internet). Nonetheless, it created quite a stir in the anti-virus community.

The IDEA.6155 virus used three different encryption methods in concentric layers around the viral code. The first layer used the FSE mutation engine to decrypt itself, using a key found in the viral code. The second layer did not have this key; instead, it could be decrypted because the key's value could be quickly guessed through a brute force decryption attack. The inside layer used the IDEA encryption algorithm. To spoof on-access scanners, the IDEA 128-bit key could be located in one of two places inside the virus body. The heavily encrypted virus could cause a CPU emulator to swamp a Windows Pentium PC. Reportedly, decryption of the virus could take as long as five seconds on a Pentium system without using an anti-virus code emulator. According to Vesselin Bontchev, anti-virus researcher at Frisk Software International, on a system with an anti-virus code emulator running, the decryption of the virus could take from a few minutes to a half-hour! This raised the specter that end-users would open their PCs to infection by disabling the anti-virus software.

Virus writers also began to use entry point obscuration (EPO) to hide the point where the jump instruction to viral code is located. The W32/MTX@M worm and Win95/SK virus were among the first viruses to use this technique. There are a number of variations on this idea, ranging from hiding the block of viral code in the program body to actually integrating the viral code inside the program code. Many AV professionals regard EPO viruses as being the most challenging.

Virus writers began using encryption techniques to keep their code from being immediately identifiable as viral code. The reasoning is that if the encryption key changes from computer to computer, no two copies of the encrypted virus look alike. Virus design teams, such as the 29A VX group, and individual virus writers like Zombie and Black Baron worked to improve virus encryption and mutation schemes.

At the same time, virus writers developed methods to hide their viral code in plain view by altering its appearance. Polymorphic viruses encrypt their code using a variety of encryption schemes with varying decryption routines. However, the viral code can be readily identified once decrypted, thanks to the unchanging portions of their code, such as a data area filled with string constants. Polymorphic viruses must have a "head" or decryptor that exists to decrypt the encoded virus and allow it to run. Polymorphic viruses may alter their appearance by changing the order of subroutines and by injecting random junk code like NOP (no operation) instructions. Examples of polymorphic viruses include SMEG.Pathogen (whose U.K. writer was caught and sentenced to 18 months) and Elkern, the companion virus to the Klez worm.
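
The following Python sketch illustrates why a plain signature scan fails against an encrypted body yet succeeds once the body is decrypted. It is a toy under stated assumptions: single-byte XOR stands in for the far more elaborate schemes real polymorphic viruses use, and the signature string, keys, and function names are invented for the example.

```python
SIGNATURE = b"TOY-SIGNATURE"   # hypothetical constant string inside the body


def xor_bytes(data: bytes, key: int) -> bytes:
    """Encrypt or decrypt with a single-byte XOR key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)


def scan(data: bytes) -> bool:
    """Naive string matching: is the known signature present?"""
    return SIGNATURE in data


body = b"\x01\x02" + SIGNATURE + b"\x03\x04"

# Two "generations" of the same body, each encrypted under a different key.
gen1 = xor_bytes(body, 0x5A)
gen2 = xor_bytes(body, 0xC3)

print(gen1 != gen2)                    # the encrypted copies do not look alike
print(scan(gen1), scan(gen2))          # the signature is hidden in both
print(scan(xor_bytes(gen1, 0x5A)))     # after decryption the constant part reappears
```

This is the moment an emulator is waiting for: it lets the decryptor run, then scans the decrypted result rather than the file at rest.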

Virus writers later introduced the metamorphic virus, which actually changes its code from generation to generation (a generation is one series of propagation). A metamorphic virus also encrypts its code but its decryption routine, key and even its key location may change gradually over time. And, unlike the polymorphic virus, in metamorphic viruses, data and code are all rolled into the same code body. Like the polymorphic virus, the metamorphic virus will make use of a number of techniques to hide the true purpose of its code.

Not content to just hide malicious code, virus writers may also take steps to attack the anti-virus software's analysis tools. There are three main attacks that may be employed against system emulators. The first of these attacks is the inclusion of bizarre and unlikely instructions. These are programming instructions that the anti-virus programmer may not have considered, because they are almost never used in real programs and are too difficult to emulate. As a result, code that relies on them may not be detected by emulators.

In the second system emulation attack, the virus writer attempts to exploit the lag between the emulator's evaluation of suspect code and its actual execution time in the real system. The IDEA.6155 is an older example of this attack, in which heavy decryption action could swamp the emulator's limited resources.
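
A toy model may help show why this works. In the Python sketch below, the "emulator" gives up once a fixed step budget is spent, which is an assumption about how a scanner might bound its work rather than a description of any real product. The function name, the cost model, and the XOR "decryption" are all hypothetical.

```python
def emulate(encrypted: bytes, key: int, work_per_byte: int, max_steps: int):
    """Toy emulator: 'decrypting' each byte costs work_per_byte steps.

    Returns the decrypted bytes, or None if the step budget runs out
    before decryption finishes (the scanner gives up and sees nothing).
    """
    steps = 0
    out = bytearray()
    for b in encrypted:
        steps += work_per_byte
        if steps > max_steps:
            return None                # budget exhausted, body never revealed
        out.append(b ^ key)
    return bytes(out)


payload = bytes(b ^ 0x21 for b in b"BODY")

# A cheap decryptor finishes well inside the budget.
print(emulate(payload, 0x21, work_per_byte=1, max_steps=100))
# An expensive decryptor exhausts the budget first.
print(emulate(payload, 0x21, work_per_byte=1000, max_steps=100))
```

Raising the budget catches the expensive case but slows every scan, which is the trade-off the attack is designed to exploit.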

In the third code emulator attack, the virus attempts to discern that it is being emulated by exploiting limitations in the code emulator's design. No emulator can account for every conceivable situation. The virus may also alter its behavior based on system dates or perceived changes such as the loss of a network connection. For instance, the Magistr worm is known not to infect executables if an Internet connection is missing.

The W32.Simile virus is among the latest metamorphic viruses. Highly complex, it is also able to cross between the Linux and Windows operating systems. With over 14,000 lines of assembly code, the majority of the virus is devoted to its own metamorphic engine. Written by virus writer The Mental Driller and discovered in March 2002, W32.Simile is a metamorphic virus bristling with a number of anti-emulator functions. The virus uses a pseudo-random decryption routine that uses modular arithmetic functions to decrypt the virus body in a non-linear manner, not from beginning to end but chunks seemingly chosen at random. It is intended to confuse an emulator into thinking that decryption is occurring.

To further spoof the emulator, the Simile virus makes use of the RDTSC (Read Time Stamp Counter) instruction. This instruction is used to retrieve the current value of an internal processor ticks counter. The virus examines this time slice and then randomly determines if it will attempt to decode the virus or let the opportunity pass. This means that the virus may not successfully decrypt itself on the first try or even after a number of attempts. The complexity of the code has introduced a number of bugs that may keep an instance of Simile from decrypting itself for a much longer time.

Simile's complexity also suggests that there may eventually be a point of diminishing returns for virus writers (a happy thought!) in making truly complex viruses. Instead, the success of Hybris, Klez, Magistr, MTX, and Sircam shows that a virus that makes effective use of deception (bogus return addresses) or misdirection (varying subject lines and message bodies) may be assured a much longer run on the world's stage.

Anti-Virus Programmers Fight Back!

Given the impressive opposition of metamorphic EPO viruses that excel in hiding and obscuring viral code, how does the anti-virus developer craft an effective on-access scanner? String matching, a method that detects viruses by finding strings of code known to indicate malicious or viral properties, has been one of the stand-bys of virus scanning. But with code-obscuring techniques and the ascendancy of polymorphic and metamorphic viruses, it may be difficult, if not impossible, for scanners to detect a recognizable string. Likewise, cyclic redundancy check (CRC) comparisons used to be very effective in locating viruses, but viruses are now able to alter CRC values to hide the existence of their own code.

(Cyclic Redundancy Check (CRC) algorithms are used to check data for corruption. The earliest CRC algorithms were used in communications hardware to ensure that transmitted data was not corrupted in transit. They have since been adopted for a variety of uses in computer programming. A CRC algorithm should quickly produce a unique numerical fingerprint (called a checksum) for each unique piece of data it examines. This approach was used by some anti-virus programs to ensure that programs had not been modified. Unfortunately, it has long been known that some types of corruption can cancel each other out when examined by a CRC algorithm. This allows the corruption to pass unnoticed. This fact was not missed by virus writers, who developed ways to spoof anti-virus CRC algorithms.)
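
A short Python sketch illustrates both halves of that story. It uses the standard library's zlib.crc32 as a stand-in, on the assumption that it is representative of the CRCs anti-virus programs used; the sample byte strings are invented. The first part shows a CRC catching a simple modification. The second shows the arithmetic regularity that makes CRCs predictable: for equal-length inputs, the CRC of three blocks XORed together equals the XOR of their three CRCs.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))


# Part 1: integrity checking. Any small change alters the checksum.
original = b"PROGRAM-IMAGE-0001"
tampered = b"PROGRAM-IMAGE-0002"
baseline = zlib.crc32(original)
detected = zlib.crc32(tampered) != baseline
print(detected)

# Part 2: CRC-32 is predictable. For equal-length blocks a, b, c:
#   crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c)
a = b"first-block-data"
b = b"second-block-dat"
c = b"third-block-data"
combined = xor_bytes(xor_bytes(a, b), c)
print(zlib.crc32(combined) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c))
```

Because the effect of a change on the checksum can be computed in advance, a second, carefully chosen change can be made to cancel it out. A CRC guards against accidents, not against an adversary who knows the algorithm.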

If the file at rest does not reveal the presence of the virus, the virus may well give itself away once it executes. Examining the CPU emulator's stack can point to telltale values that signal the presence of a virus. (For the uninitiated, a stack is a volatile data structure used by the processor that contains a series of memory locations and a pointer to the stack's initial location, which is called the top of the stack. A processor uses the stack as a kind of scratch pad, pushing values to the stack and popping the values out of the stack.)
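
For readers who prefer code to prose, a stack can be sketched in a few lines of Python. The class below and the "telltale" values are purely illustrative assumptions; a real emulator models the processor's stack in memory and knows which values matter for which virus.

```python
class Stack:
    """Minimal last-in, first-out scratch pad."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)      # new value becomes the top of the stack

    def pop(self):
        return self._items.pop()       # remove and return the top value

    def top(self):
        return self._items[-1]         # look at the top value without removing it

    def contains_any(self, telltales):
        """Return True if any known telltale value is currently on the stack."""
        return any(v in telltales for v in self._items)


TELLTALES = {0xDEADBEEF}               # hypothetical value a scanner watches for

stack = Stack()
stack.push(0x1000)
stack.push(0xDEADBEEF)
stack.push(0x2000)

print(stack.contains_any(TELLTALES))   # the telltale value is on the stack
print(hex(stack.pop()))                # values come off in reverse order
print(hex(stack.top()))
```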

With polymorphic viruses, there is at least a moment when virus code stands revealed in the stack after decryption, but that technique may not work when the code of a metamorphic virus differs from one generation to the next. The answer lies not in the question, "Does it look and act like a duck?" Instead it may lie in the question, "Ok, so it's got tentacles, but does it act like a duck most of the time?" In the anti-virus software world, asking about behavior really means looking for algorithms. This means leaving the "tangible" world of string recognition and entering an abstract line of reasoning and statistics. Somewhere in all the tall grass of garbage instructions and obscured code, there's a snake of an algorithm, intent on accomplishing its purposes. If the on-access sensor can't directly see the virus snake, it has to deduce its presence from indirect evidence.

If recognizable code can't be found, is the suspect code behaving in the same way as its viral parent? Does it behave the same way as the parent most of the time? If so, it may be a descendant of a known virus. Even then there may not be a clear-cut answer, so the anti-virus programmer may start to ask questions about what the suspect code isn't doing. Called "negative detection", this technique is useful in deciding when to stop examining a particular piece of code and move on to the next suspiciously behaving one.
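
One simple way to put a number on "most of the time" is to compare the set of actions a suspect performs with the set recorded for a known parent. The Python sketch below uses Jaccard similarity for this. That choice, the action names, and the 0.5 threshold are all assumptions made for illustration; the article does not say which statistics real scanners use.

```python
def behavior_similarity(observed: set, parent: set) -> float:
    """Jaccard similarity: shared actions divided by all actions seen in either set."""
    if not observed and not parent:
        return 1.0
    return len(observed & parent) / len(observed | parent)


def likely_descendant(observed: set, parent: set, threshold: float = 0.5) -> bool:
    """Flag the suspect if it overlaps enough with the parent's behavior."""
    return behavior_similarity(observed, parent) >= threshold


# Hypothetical behavior profiles.
parent = {"open_exe", "write_exe", "set_marker", "send_mail"}
suspect = {"open_exe", "write_exe", "set_marker", "delete_file"}
benign = {"open_doc", "print"}

print(behavior_similarity(suspect, parent))   # 3 shared actions out of 5 overall
print(likely_descendant(suspect, parent))
print(likely_descendant(benign, parent))
```

The negative detection idea fits the same frame: if the suspect shows none of the parent's essential actions, the scanner can stop early and move on.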

The anti-virus programmer can depend on one constant: in order to be considered a virus, the hostile code must replicate itself. Viruses not only deposit their code in the host file, they often mark the file to prevent it from being altered more than once. So, an on-access scanner should be able to examine the structure of a file to determine if a virus is attempting to alter it. This technique is often called "shape heuristics" or "geometric detection". By examining the file for markers or other known values altered by the virus inside the file, the on-access scanner stands a better chance of identifying the virus. For instance, polymorphic viruses may have recognizable decryptors or "heads". By cataloging the decryptors, markers, and their variants, the scanning engine can then use these characteristics to help identify a virus.
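
A catalog of infection markers might be consulted along these lines. This Python sketch is a deliberately simplified assumption: the catalog entries, names, offsets, and marker bytes are invented, and real markers are often values in header fields rather than byte strings at fixed offsets.

```python
# Hypothetical catalog: name -> (file offset, marker bytes left by the infection).
KNOWN_MARKERS = {
    "Toy.A": (0x12, b"\xde\xad"),
    "Toy.B": (0x20, b"MARK"),
}


def match_markers(data: bytes) -> list:
    """Return the sorted names of catalog entries whose marker appears at its offset."""
    hits = []
    for name, (offset, marker) in KNOWN_MARKERS.items():
        if data[offset:offset + len(marker)] == marker:
            hits.append(name)
    return sorted(hits)


clean_file = bytes(64)                  # 64 zero bytes, no markers present

marked = bytearray(64)
marked[0x12:0x14] = b"\xde\xad"         # plant the Toy.A marker at its offset
marked_file = bytes(marked)

print(match_markers(clean_file))
print(match_markers(marked_file))
```

A marker hit on its own is weak evidence, since the same bytes can occur by chance, so a scanner would treat it as one characteristic among several.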

The Future of On-Access Scanners

The continuing sophistication of viral code shows that on-access scanners must somehow develop ahead of virus writers, boxing them in with fewer and fewer ways to avoid detection. Unfortunately, the steady introduction of new operating system flavors opens new Pandora's boxes of exploitable vulnerabilities. As in the past, virus writers and anti-virus programmers will wage a seesaw war of attack and countermeasure. Experience has shown that a thorough understanding of an operating system's theory of operation is necessary to anticipate new virus strains. For the majority of us, it's a really good thing when the good guys win. But winning requires quickly acquired, accurate knowledge of viral code in order to understand how increasingly complex viruses interact with their targets.

On-access scanners have made the Internet a much safer place, thanks to their abilities to extensively test code before it gets a chance to run unhindered on a computer. However, the lengths to which virus writers will go to hide their code and attack anti-virus detection methods require complex and reliable detection methods that go beyond simple string matching and CRC checks. These detection methods have become more abstract. Even so, they can successfully detect viruses through their behavior and telltale actions that give away the virus' intent. The challenge remains for anti-virus developers to stay ahead of the opposition, who use deception and disguise to hide the often-destructive nature of their creations, and provide accurate and reliable on-access virus detection.

This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.