This article briefly describes the technology that Symantec Antivirus uses to detect viruses and to recognize whether a file is infected by a virus or is itself a virus. Almost all other antivirus products use the same methodology with minor differences.
How Symantec AV receives the file is outside the scope of this article; we deal only with the procedure it uses to decide whether the file is a virus or is infected by one.
The analysis procedure has three phases:
- Signature scan
- Heuristic scan
- TruScan Proactive scan
When Symantec picks up a file to analyze, it first checks the file against a table of known viruses in its definition database. Because signatures do not always match a virus, a file that is not identified as a virus by its signature is then checked with the Heuristic method, which tries to estimate whether the file will perform malicious activities once it is executed. A file that passes this step is marked as clean and is permitted to run. However, immediately after execution starts, the activities of the file are monitored, and if it performs any suspicious activity on the system (such as filling the RAM or making particular modifications to the system registry), the process is isolated and terminated.
Below we look at each step in more detail:
The first step in scanning a file is to compare the entire file, or the portion of it that is likely to be infected, with a table of known virus signatures. A signature is an almost unique sequence of code, or a particular phrase, that occurs in a virus already detected on other systems; signatures are supplied in the antivirus definition database.
It is not practical to store the whole body of every virus in a database and compare it with files, because the signature database would become too large to be usable. Therefore only a particular portion or very specific phrases of each virus are retained in the signature database.
The signature database is thus a collection of unique flags for viruses, and if one of them is detected in a file, Symantec Antivirus presumes the file is a virus.
Although the creators of signatures try to make each flag (signature) as unique as possible, two similar viruses may still be mistaken for each other or, worse, a legitimate file that contains the signature by coincidence or for a legitimate reason may be judged to be the virus. Because the Signature scanning method is not one hundred percent accurate, it is an approximate method, that is, a Heuristic algorithm.
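The signature lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature names and byte patterns are invented for the example, and the real Symantec definition format and matching engine are not described in this article.

```python
# Hypothetical signature table: name -> byte sequence that flags the threat.
# These patterns are made up for illustration and match no real malware.
SIGNATURES = {
    "Demo.SelfCopy": b"Copy %0",
    "Demo.Marker": bytes.fromhex("deadbeef4242"),
}


def signature_scan(data: bytes) -> list[str]:
    """Return the names of all signatures found anywhere in the data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


script = b'Del *.* /F /Q /S\r\nCopy %0 "bad.bat"\r\n'
tutorial = b"To back up a batch file, type: Copy %0 backup.bat"
clean = b"hello world"

print(signature_scan(script))    # ['Demo.SelfCopy']
print(signature_scan(tutorial))  # ['Demo.SelfCopy']  (a false positive)
print(signature_scan(clean))     # []
```

The second result shows the inaccuracy discussed above: a harmless text that happens to contain the same bytes is flagged exactly like the script.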
A heuristic is a method that tries to find the closest possible answer to the exact answer. It is useful when finding the exact answer is very time consuming or, in some cases, impossible. For instance, if you want to spend $1000 equally over the days of a week ($1000 in 7 days), you can try to spend about $142.8571428571429 per day (which is almost impossible!), or you can keep each day's expenses between $142 and $143 and spend whatever remains on the last day. The second solution is not 100% accurate, since the expenses are not exactly equal every day, but with an acceptable level of inaccuracy it is practical and very close to the exact answer. So was it the 100% correct answer? Of course not! But compared with spending $142.8571428571429 per day, it is certainly the better answer!
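The budget example can be worked out explicitly. The short Python sketch below (the function name `split_budget` is made up for this illustration) splits the total into whole-dollar amounts that differ by at most one dollar, which is the "good enough" answer described above.

```python
def split_budget(total: int, days: int) -> list[int]:
    """Split a whole-dollar total into daily amounts that differ by at most 1."""
    base, extra = divmod(total, days)
    # The first `extra` days get one extra dollar; the remaining days get `base`.
    return [base + 1] * extra + [base] * (days - extra)


plan = split_budget(1000, 7)
print(plan)       # [143, 143, 143, 143, 143, 143, 142]
print(sum(plan))  # 1000
```

Six days at $143 and one day at $142 add up to exactly $1000, even though no single day matches the exact value of 1000/7.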
Heuristic virus detection is an evaluation method that lets the antivirus judge, with a high probability of success, whether a file is a virus or may function as one.
To do so, the antivirus examines the sequence of instructions in the file and tries to predict whether they could form the body of a virus. The instructions are extracted from the body of the executable file and compared with a database of code patterns. If the code sequence resembles known virus programming techniques (not necessarily a specific known virus), the file can be reported as a harmful application.
For instance, consider the script below:
Del *.* /F /Q /S > c:\windowns\temp\a
Copy %0 "c:\Documents and Settings\All Users\Start Menu\Programs\Startup\bad.bat"
The above semi-virus script (which is not functional!) tries to delete all the files on drive D: and plants a copy of itself as a batch file in the Startup folder so that it runs again at startup. When the antivirus encounters the Copy command, it understands that the file copies its own code (%0) into the system. Accordingly, the script is suspicious and may be flagged as a malicious batch file.
This method is called Static Heuristic Scan.
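A static heuristic of this kind can be approximated with a few pattern rules and a score. The Python sketch below is a toy under stated assumptions: the rules, weights and threshold are invented for the example and are not Symantec's actual criteria.

```python
import re

# Hypothetical rules: (description, pattern, weight). The patterns and weights
# are assumptions chosen for this example only.
RULES = [
    ("copies itself (%0) elsewhere", re.compile(r"^\s*copy\s+%0\b", re.I), 5),
    ("writes into a Startup folder", re.compile(r"\\startup\\", re.I), 3),
    ("forced, quiet, recursive delete", re.compile(r"^\s*del\b.*/f\b.*/q\b.*/s\b", re.I), 4),
]
THRESHOLD = 6  # assumed cut-off above which a script counts as suspicious


def static_heuristic_scan(script: str):
    """Score a batch script line by line without ever executing it."""
    score, findings = 0, []
    for number, line in enumerate(script.splitlines(), start=1):
        for description, pattern, weight in RULES:
            if pattern.search(line):
                score += weight
                findings.append((number, description))
    return score, findings, score >= THRESHOLD


SAMPLE = r'''Del *.* /F /Q /S > c:\windowns\temp\a
Copy %0 "c:\Documents and Settings\All Users\Start Menu\Programs\Startup\bad.bat"'''

score, findings, suspicious = static_heuristic_scan(SAMPLE)
print(score, suspicious)  # 12 True
for number, description in findings:
    print(number, description)
```

Note that the scan never looks for a specific known virus; it only scores techniques that viruses typically use, which is why a harmless script such as `echo Hello` scores zero.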
However, a script or program can be written in hundreds of different ways and sequences, so static scanning sometimes fails to recognize its meaning and purpose. The code should therefore also be examined with a method that is code-independent, one that can estimate whether a file is a virus without relying on how the code is written or how complex it is.
The other way to examine code is to actually run it in a virtual environment and record the results of the execution. Symantec uses a CPU emulator to create the virtual environment and to deceive the program (which might be a virus) into behaving as if it were running on a real CPU in a real environment. An emulator is a fake CPU that acts as a real one. Under the emulator, the suspicious program runs in an isolated environment where it cannot harm the real system. In addition, its activities and requests can be logged for further analysis. After collecting enough of the program's outputs and effects, Symantec can estimate whether the program will behave harmfully. Note that in this phase the program has not yet started on the real system.
This method is called Dynamic Scan, and it is very effective against encrypted code and code that has been intentionally altered to hide the purpose of its commands.
The weakness of this method appears when the code does not show its activity immediately after it starts running in the emulator. For instance, what if the harmful activity starts only on the 26th of the month, but the current date is the 15th? The harmful action will not start during the scan, so it cannot be evaluated.
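Both the idea of the dynamic scan and its weakness can be illustrated with a toy sandbox. In the Python sketch below, a plain function stands in for the suspicious program and a small class stands in for the emulated environment. All names (`ToySandbox`, `time_bomb`, `dynamic_scan`) and the policy of what counts as harmful are assumptions for this example; a real CPU emulator works at the instruction level and is far more involved.

```python
from datetime import date


class ToySandbox:
    """Stand-in for the emulated environment: records requests instead of performing them."""

    def __init__(self, today: date):
        self.today = today  # virtual clock shown to the program
        self.log = []       # every request the program makes

    def delete_files(self, path: str) -> None:
        self.log.append(("delete_files", path))

    def write_file(self, path: str) -> None:
        self.log.append(("write_file", path))


def time_bomb(env: ToySandbox) -> None:
    """Hypothetical sample: harmless except on the 26th of the month."""
    env.write_file("notes.txt")
    if env.today.day == 26:
        env.delete_files("D:\\")


HARMFUL_ACTIONS = {"delete_files"}  # assumed policy for this example


def dynamic_scan(program, today: date) -> bool:
    """Run the program in the sandbox and report whether it logged a harmful action."""
    env = ToySandbox(today)
    program(env)
    return any(action in HARMFUL_ACTIONS for action, _ in env.log)


print(dynamic_scan(time_bomb, date(2024, 1, 15)))  # False: trigger date not reached
print(dynamic_scan(time_bomb, date(2024, 1, 26)))  # True
```

Nothing is deleted in either run, because the sandbox only logs requests. The first run shows the weakness: on the 15th the sample looks harmless, since the destructive branch is never reached.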
Both the Static and Dynamic methods therefore have their own pros and cons, but the good news is that Symantec Antivirus benefits from both techniques and examines files using both Dynamic and Static Heuristic scans. The Symantec Heuristic Scan technology is called Bloodhound and is one of its major advantages over other antivirus products.
After the file passes the Heuristic scan, it is marked as clean and is eligible to run.
TruScan Proactive Threat Scan
This phase begins when the file actually starts running, for instance when the user double-clicks it or another application calls it.
In this phase the effects of the process (which is now in RAM and actually running on the system) are monitored. If the program, intentionally or unintentionally, starts harming the system or the OS, behaves harmfully (for instance by filling up the RAM or driving CPU usage to 100%), or performs activities very similar to those of viruses, and there is ultimately enough evidence that the process is behaving maliciously, Symantec judges the process to be badware and stops it. Processes are checked immediately after they start, and all processes in RAM are checked again at intervals, since they may hide their destructive activities during the first moments of execution.
As mentioned above, the main difference between the TruScan Proactive scan and the other two methods (Signature scan and Heuristic scan) is that TruScan Proactive examines a process rather than a file, and starts only after the file is executed.
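The "enough evidence" idea can be sketched as a running score per process that is updated at every check. The Python example below is a simulation only: the event names, weights and threshold are invented for the illustration, and it does not inspect real processes or reflect TruScan's actual criteria.

```python
# Hypothetical weights per observed behaviour; not TruScan's real criteria.
WEIGHTS = {
    "memory_exhaustion": 4,
    "cpu_saturation": 2,
    "registry_autorun_write": 4,
    "self_copy": 5,
}
TERMINATE_AT = 8  # assumed evidence threshold


class ProcessMonitor:
    def __init__(self):
        self.scores = {}         # pid -> accumulated evidence
        self.terminated = set()  # pids that have been stopped

    def observe(self, pid: int, events: list[str]) -> bool:
        """Add evidence from one check; return True if the process is stopped."""
        if pid in self.terminated:
            return True
        self.scores[pid] = self.scores.get(pid, 0) + sum(WEIGHTS.get(e, 0) for e in events)
        if self.scores[pid] >= TERMINATE_AT:
            self.terminated.add(pid)
        return pid in self.terminated


monitor = ProcessMonitor()
# First check right after launch, followed by periodic re-checks.
checks = [[], ["cpu_saturation"], ["registry_autorun_write"], ["self_copy"]]
results = [monitor.observe(1234, events) for events in checks]
print(results)  # [False, False, False, True]
```

The process looks clean at launch and is stopped only at the fourth check, which is why the periodic re-checks matter for programs that hide their activity at first.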
Overall, when Symantec Antivirus encounters a file (for example when it is created on or copied to the system), it first checks the file against the database of known viruses. If it finds no match, it tries to understand the purpose of the code and commands in the file. In parallel, Symantec runs the file in a semi-virtual machine using a CPU emulator and collects all the actions the file would perform on the system. If Symantec judges that the file will not harm the system, it marks the file as safe, and the file is eligible to run. In the next phase, when the file is executed by the user or by any other means, Symantec examines the process immediately after it starts to check that it does not act like a virus, and it repeats this examination periodically at defined intervals.
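The order of the pre-execution checks can be summarized as a small decision function. In this Python sketch each phase is passed in as a callable that returns True when it flags the file. The function name, the verdict strings and the stand-in checks are assumptions for the illustration, and the static and emulated checks run one after the other here for simplicity rather than in parallel.

```python
from typing import Callable

Check = Callable[[bytes], bool]  # returns True when the phase flags the file


def scan_before_execution(data: bytes, signature: Check, static: Check, dynamic: Check) -> str:
    """Return a verdict for a file that has not been executed yet."""
    if signature(data):
        return "known virus"
    # Sequential here for simplicity; the article describes these two as parallel.
    if static(data) or dynamic(data):
        return "suspicious"
    return "clean"  # eligible to run; process monitoring takes over after launch


# Trivial stand-in checks, used only to exercise the decision order.
verdict = scan_before_execution(
    b'Copy %0 "bad.bat"',
    signature=lambda d: False,
    static=lambda d: b"%0" in d,
    dynamic=lambda d: False,
)
print(verdict)  # suspicious
```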