Email Security.cloud

Getting spam filtering right: What if the pharmaceutical email is legitimate? 

Nov 08, 2010 03:27 PM

Post developed in collaboration with Martin Lee, Senior Software Engineer

Our spam boxes are typically full of the usual suspects -- pharmaceutical spam, watch spam, relationship spam, and offers from the family members of ousted African potentates. The obvious solution would seem to be a block list of the key words used in these spam messages. Unfortunately, it isn’t that easy.

Consider pharmaceuticals. While the vast majority of emails containing pharmaceutical names are spam, some legitimate emails use these key words too, and in some industries, like health care, they are in daily use. Key word based spam filtering therefore creates high numbers of false positives, putting legitimate emails into spam folders. In health care, the consequences of such false positives range from a missed meeting with a pharmaceutical representative to a direct impact on patient care.
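To make the false positive problem concrete, here is a minimal sketch of a naive key word filter. The block list and both sample messages are invented for illustration; this is the approach the post argues against, not anything we run.

```python
import re

# Hypothetical block list, invented for this example.
BLOCKED_KEYWORDS = {"viagra", "xanax", "valium"}


def keyword_filter(message: str) -> bool:
    """Return True if any blocked key word appears as a word in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return not words.isdisjoint(BLOCKED_KEYWORDS)


spam = "Cheap Viagra, no prescription needed! Order now."
legitimate = (
    "Dr. Patel, the patient in room 4 reports dizziness after "
    "starting Valium. Please review the dose."
)

# The filter cannot tell these apart: both contain a blocked key word.
print(keyword_filter(spam))
print(keyword_filter(legitimate))
```

The filter flags both messages because it looks only at vocabulary, and in a clinical setting the vocabulary of spam and of everyday work overlap.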
 

We have not used key word based spam filtering for many years, precisely because of the problem of false positives. Our spam filtering consists primarily of identifying the botnets that send spam and ensuring that these sources cannot send messages to our users. We also run a large honeypot network that analyses the spam currently in circulation and generates signatures derived from it. We use these signatures to detect and block spam sent to our customers. The vast majority of these signatures are URLs, or phrases unique to advance-fee fraud emails; single key words are not used as signatures.
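The idea of deriving signatures from honeypot spam can be sketched roughly as follows. This is a simplified illustration, not our production system: the function names are hypothetical, and using the host name of each URL as the signature is an assumption made to keep the example short. Phrase signatures could be handled in a similar way.

```python
import re
from urllib.parse import urlsplit

# Simplified URL matcher; real mail parsing is considerably more involved.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(text: str) -> set[str]:
    """Return the lower-cased host names of all URLs found in the text."""
    hosts = set()
    for url in URL_PATTERN.findall(text):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip malformed URLs
        if host:
            hosts.add(host)
    return hosts


def build_signatures(honeypot_messages: list[str]) -> set[str]:
    """Collect URL hosts seen in honeypot spam (assumed signature format)."""
    signatures = set()
    for message in honeypot_messages:
        signatures |= extract_hosts(message)
    return signatures


def matches_signature(message: str, signatures: set[str]) -> bool:
    """Return True if the message links to any host seen in honeypot spam."""
    return not extract_hosts(message).isdisjoint(signatures)


honeypot = [
    "Buy now at http://pills.example/buy?id=1",
    "Visit HTTP://Watches.example/deal today",
]
signatures = build_signatures(honeypot)
print(matches_signature("Great offer: http://pills.example/other", signatures))
print(matches_signature("Notes on Valium dosing: https://intranet.example/notes", signatures))
```

Because the signature here is the link rather than the vocabulary, a legitimate message that mentions a drug name but links elsewhere is not matched.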

We also use Symantec's spam signature system, Brightmail, to further filter email. Again, this primarily uses signatures derived from URLs and phrasing found in current spam rather than key words.

As an additional protection, we have a spam heuristic system that blocks persistent spam that evades our other methods of detection. This system uses combinations of rules to detect spam messages. Again, simple key word analysis is not used. However, we do search for obfuscation attempts, so the string '\/i@gr.a' may trigger a rule. Each rule triggered adds to the score given to an individual email; once the score reaches a threshold, the email is blocked as spam.
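The scoring scheme described above can be sketched as follows. Everything specific here is an assumption for illustration: the rules, their weights, the threshold, and the look-alike table are invented, and the real rule set is not published in this post. Note that the obfuscation rule fires only on a disguised term, not on the plain word.

```python
import re

# Hypothetical look-alike substitutions and watched terms (illustrative only).
LOOKALIKES = [("\\/", "v"), ("@", "a"), ("0", "o"), ("1", "i"), ("$", "s")]
WATCHED_TERMS = {"viagra"}
EDGE_PUNCTUATION = ".,!?;:'\""


def deobfuscate(token: str) -> str:
    """Undo look-alike substitutions and drop any remaining non-letters."""
    token = token.lower()
    for disguised, plain in LOOKALIKES:
        token = token.replace(disguised, plain)
    return re.sub(r"[^a-z]", "", token)


def has_obfuscated_term(message: str) -> bool:
    """True if a token is a watched term only after deobfuscation."""
    for token in message.split():
        if token.strip(EDGE_PUNCTUATION).lower() in WATCHED_TERMS:
            continue  # plain spelling: not an obfuscation attempt
        if deobfuscate(token) in WATCHED_TERMS:
            return True
    return False


# (rule name, weight, check) -- weights and threshold are made up.
RULES = [
    ("obfuscated_term", 3.0, has_obfuscated_term),
    ("urgent_call_to_action", 1.5,
     lambda m: re.search(r"\b(order|buy) now\b", m, re.IGNORECASE) is not None),
    ("many_exclamation_marks", 1.0, lambda m: m.count("!") >= 3),
]
THRESHOLD = 4.0


def score(message: str) -> tuple[float, list[str]]:
    """Return the total score and the names of the rules that fired."""
    fired = [(name, weight) for name, weight, check in RULES if check(message)]
    return sum(weight for _, weight in fired), [name for name, _ in fired]


def is_spam(message: str) -> bool:
    return score(message)[0] >= THRESHOLD


print(score(r"Get \/i@gr.a today!!! Order now"))
print(score("Patient asked about Viagra interactions; please order now if stock is low."))
```

In this sketch the first message triggers all three rules and crosses the threshold, while the second triggers only one low-weight rule and is delivered. No single rule is decisive; a message has to look like spam in several ways at once.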

These processes yield a stream of email that is almost completely free of spam, with very few false positives. We typically exceed our spam blocking service level agreement of 99%, and our false positive rate is many orders of magnitude lower still: for September it was 0.000008%, or 1 in 12.5 million emails.

  
