Preventing False Positives at Symantec
Symantec goes to great lengths to prevent false positives from occurring. Undoubtedly, false positives (FPs) are a concern for all vendors across the antivirus industry. However, with a user base as large as Symantec’s, we need to set the bar very high. Symantec’s content is used on over 120 million devices around the world, so any software defect, such as a false positive, has a much higher chance of being exposed than it would with a smaller user base.
Given the importance of false positives, our quality assurance team is at the forefront of efforts to prevent them. With this in mind, we’d like to make available recently completed research in this area. The research is entitled ‘A False Positive Prevention Framework for Non-Heuristic Anti-Virus Signatures’ and takes the form of a case study (based on Symantec). That sounds like a mouthful, so let’s break it down! The goal of the research was to develop a high-level conceptual structure to help us address the problem posed by false positives, which hopefully explains the ‘false positive prevention framework’ bit. The focus on ‘non-heuristic anti-virus signatures’ allowed us to home in on one specific technology that causes false positives. Non-heuristic technology is the source of more false positives than any other anti-virus technology. It is also the most common and the most standardised technology across the antivirus industry.
Before addressing a problem, it is important to know something about it. So the research began by assessing the root causes of false positives, today and in the past. Next, it looked at whether all false positives are the same. For example, do all false positives have the same impact on customers? Do they cost Symantec the same? The most relevant literature and prior research were then reviewed. Finally, interviews were held with a number of domain experts at Symantec.
So what were the key things we learnt about the problem? Here are a few of them:
- A false positive is essentially a software defect.
- The cause of most false positives (81%) is clean files entering our workflow systems.
- Removing a defect becomes more expensive the later in the Software Development Lifecycle (SDLC) it is addressed, up to 100 times more in some cases!
- However, other research has shown that software defects can, and perhaps should, be classified according to severity level. Finding and fixing a higher severity defect is 100 times more expensive when that defect is found in-field than when it is found during the requirements stage. For lower severity defects, the cost is only about twice as high. This implies that some lower severity false positives might be acceptable.
- DPP (Defect Prevention Process) is a process geared to preventing the injection of defects into the software process. A core aspect of DPP is root cause analysis of defects so that preventative measures can be taken and implemented to address the causes. It has been leveraged by industry standards such as CMMI, specifically CMMI’s highest maturity level (Level 5).
- False positives from anti-virus technology have a greater impact on the user base than related technologies such as intrusion prevention (IPS) or anti-spam (AS).
- The exponential growth in malicious code has led to an increase in false positives.
- However, the tricky job vendors face in avoiding false positives is compounded by the fact that today malware is being designed to look legitimate, whereas in the past this wasn’t the case. This presents problems for file classification that weren’t present in the past.
- The cost of an FP is difficult to quantify. However, the cost of a false positive is likely to be linked to its severity level. Interestingly, it is thought that the cost of a false positive is less than that of other software quality issues (e.g. performance). The reasons for this are that false positives:
  - Don’t persist like other quality problems.
  - Tend to be short lived in general, especially once identified.
  - Are less common.
  - Typically only impact a small portion of the user base versus other quality issues or defects.
- An ideal approach to avoiding false positives is to leverage defect prevention techniques and the application of quality assurance throughout the content or signature generation lifecycle.
- Clean data is the key ingredient that underpins any FP prevention framework (and this is why software white-listing is important to Symantec).
- The risk of false positives is likely to shift to heuristic technologies in the long term.
The above is just a list of some of the findings from the research. Together with the other findings, this information made it possible to propose a framework to prevent false positives from non-heuristic antivirus signatures. I won’t go into detail on the proposed framework here, though it can be read directly from the research itself. In a nutshell, however, it proposes breaking down the SDLC for signatures into three high-level areas (also called ‘Zones’ or ‘Phases’) and lists the key activities required in each area in order to prevent false positives, especially high-severity ones. It also calls for relevant clean data to underpin the framework. The good news is that this ‘theoretical’ framework has largely been implemented at Symantec and has resulted in the virtual elimination of high severity false positives from non-heuristic anti-virus signatures.*
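To give a feel for why clean data matters so much, here is a minimal sketch of one kind of check that clean data makes possible: testing a candidate signature against a collection of known-clean files before it is released. This is purely illustrative and is not Symantec’s actual tooling or the framework from the research. The `Signature` class, the `clean_files_matched` and `safe_to_release` functions, the file names and the byte patterns are all hypothetical, and a real non-heuristic signature is considerably more sophisticated than a simple byte-pattern search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical non-heuristic signature: a name plus a fixed byte pattern."""
    name: str
    pattern: bytes


def clean_files_matched(signature, clean_corpus):
    """Return the names of known-clean files that the signature would flag."""
    return [
        file_name
        for file_name, content in clean_corpus.items()
        if signature.pattern in content
    ]


def safe_to_release(signature, clean_corpus):
    """A signature passes this check only if it flags no known-clean file."""
    return not clean_files_matched(signature, clean_corpus)


# Toy stand-in for a white-list of known-clean files (name -> file content).
clean_corpus = {
    "notepad.exe": b"MZ\x90\x00 clean editor code",
    "calc.exe": b"MZ\x90\x00 clean calculator code",
}

# Assumed examples: one pattern that is far too generic, one that is specific.
too_generic = Signature("Trojan.Example.A", b"MZ\x90\x00")
specific = Signature("Trojan.Example.B", b"\xde\xad\xbe\xef payload")

print(clean_files_matched(too_generic, clean_corpus))
print(safe_to_release(specific, clean_corpus))
```

In this toy example the overly generic pattern matches both clean files and would be held back, while the specific one passes. The check is only as good as the clean data behind it, which is the point: the more relevant the clean corpus, the more false positives can be caught before they reach customers.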
The research paper is located here:
*Without putting a dampener on the piece, please note that another finding of the research is that, even with these efforts to prevent FPs (especially high severity ones), Symantec “will likely not be completely immune from them.”