Post developed in collaboration with Martin Lee, Senior Software Engineer
Our spam boxes are typically full of the usual suspects: pharmaceutical spam, watch spam, relationship spam, and offers from the family members of ousted African potentates. The obvious solution is to create a block list of keywords used in these spam messages. Unfortunately, it isn’t that easy.
Consider pharmaceuticals. While the vast majority of emails mentioning pharmaceutical names are spam, there are legitimate emails that use these keywords, and in some industries, like health care, they are used daily. Keyword-based spam filtering therefore produces high numbers of false positives, sending legitimate emails to spam folders. To extend the health care example, these false positives can have a range of negative outcomes, from missing a meeting with a pharmaceutical representative to directly affecting patient care.
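To make the false-positive problem concrete, here is a minimal Python sketch of the naive keyword block list described above. The keyword set, the `is_spam` function, and the sample messages are all hypothetical and chosen only for illustration; this is not a description of how any real filter works.

```python
import re

# Hypothetical block list: illustrative keywords, not taken from any real filter.
BLOCKED_KEYWORDS = {"viagra", "oxycodone", "rolex", "inheritance"}

def is_spam(message: str) -> bool:
    """Naive keyword filter: flag the message if any blocked word appears in it."""
    # Lowercase and split into alphabetic tokens so "Viagra," matches "viagra".
    words = set(re.findall(r"[a-z]+", message.lower()))
    return not BLOCKED_KEYWORDS.isdisjoint(words)

spam = "Cheap Viagra and Oxycodone, no prescription needed!"
legit = "Reminder: Dr. Patel meets the oxycodone sales rep Tuesday at 10am."
neutral = "Lunch on Friday?"

print(is_spam(spam))     # True
print(is_spam(legit))    # True  (a false positive)
print(is_spam(neutral))  # False
```

Because the filter only checks whether a word is present, it has no way to tell a clinic's scheduling email from a pharmacy spam run. Both contain the same keyword, so both end up in the spam folder.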