Companies in our field have long wished for a better way of discovering and describing malware capabilities than the methods currently available. A better approach would greatly benefit everyone who has to deal with malware and the damage it can cause. A whole spectrum of techniques is used today to discover what malware does, ranging from the most basic to the more advanced, but most fall short because they do not describe the malware completely.
Most approaches either rely on manual decomposition and analysis, or run samples in physical or virtual machine (VM) environments, record the changes made to the system, and report them as side effects of the malware. Each method has its own benefits and drawbacks. Manual analysis is slow, cumbersome, and prone to human error. Automated side-effect collation is faster and requires little or no human input, but its results are often incomplete and lacking in useful information. In short, automated malware analysis is a difficult problem to crack, and the reasons it does not deliver as much as we would like boil down to a number of factors:
All these factors combine to limit the capability of automated malware analysis.
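To make the side-effect collation approach described above more concrete, here is a minimal sketch that assumes the "changes made to the system" are limited to files under a single directory. The function names `snapshot` and `diff_snapshots` are hypothetical and do not come from any particular product; a real system would also track registry, process, and network activity.

```python
import hashlib
import os


def snapshot(root):
    """Map each readable file path under root to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    state[path] = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                # Unreadable files are skipped rather than aborting the snapshot.
                continue
    return state


def diff_snapshots(before, after):
    """Return the files created, deleted, and modified between two snapshots."""
    created = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(
        path for path in before.keys() & after.keys() if before[path] != after[path]
    )
    return {"created": created, "deleted": deleted, "modified": modified}
```

In this sketch, one snapshot would be taken before the sample runs and one after, and the difference reported as side effects. It also hints at why such reports can be incomplete: anything the sample does that leaves no lasting trace on disk never appears in the diff.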
However, researcher Joshua Saxe is due to present an open-sourced, crowd-trained machine learning tool at BlackHat today that can be used to identify the capabilities of malware files. The tool is claimed to generate lists of malware capabilities, such as the ability to use particular network protocols or to steal data. Interestingly, it will give probability scores for the detected capabilities when appropriate, which suggests that where the tool is uncertain about a capability, the reader can still see how likely or unlikely that capability is. The project is funded by the DARPA Cyber Fast Track initiative, and the algorithms used will be detailed in today's presentation. It should make for very interesting viewing indeed.
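As a rough illustration of how a reader might consume probability-scored capabilities, the sketch below groups scores into "likely" and "possible" lists. It is not the tool's algorithm, which had not been published at the time of writing; the function name, the thresholds, and the example scores are all assumptions made up for illustration.

```python
def describe_capabilities(scores, likely_threshold=0.9, possible_threshold=0.5):
    """Group capability scores into 'likely' and 'possible' lists.

    scores maps a capability name to a probability between 0 and 1.
    Both thresholds are arbitrary illustrative values, not published ones.
    """
    report = {"likely": [], "possible": []}
    # Highest-scoring capabilities are listed first within each group.
    for name, probability in sorted(scores.items(), key=lambda item: -item[1]):
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range for {name!r}: {probability}")
        if probability >= likely_threshold:
            report["likely"].append((name, probability))
        elif probability >= possible_threshold:
            report["possible"].append((name, probability))
        # Anything below possible_threshold is left out of the report.
    return report


# Hypothetical scores for a single sample, invented for this example.
example_scores = {"uses HTTP": 0.97, "steals data": 0.62, "logs keystrokes": 0.10}
example_report = describe_capabilities(example_scores)
```

With these made-up numbers, "uses HTTP" lands in the likely list, "steals data" in the possible list, and "logs keystrokes" is omitted.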
Incidentally, Symantec Security Response have a number of automated systems that analyze and collate malware sample information and capabilities. These systems can perform static and runtime analysis of malware samples and record their side effects. This information is combined with other Symantec data and telemetry sources and then supplied to our customers through our exclusive malware reporting services, providing valuable information to help our customers prevent or recover from malware attacks.