by Francisco Pardo and Nick Johnston
Spammers are never idle when it comes to finding new ways to bypass mail filters; after all, this is crucial to a spammer's success. Recently, we've seen a low but steady number of spam messages in which spammers replace certain characters in URLs (which point to spam sites) with Unicode characters that look similar or identical. This is yet another way of obfuscating URLs to make them more difficult to analyze.
To understand how this technique works, a bit of knowledge of the Unicode standard is helpful. As well as specifying a large repertoire of characters, Unicode also provides normalization rules for converting similar and/or equivalent characters to a single form. For example, under the compatibility normalization forms (NFKC and NFKD), an encircled number is considered equivalent to the corresponding ordinary number. This latest spammer-led URL obfuscation technique relies on the HTML-rendering...
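To make the normalization step concrete, here is a minimal Python sketch using the standard library's `unicodedata` module. It is an illustration only, not the analysis method used by the authors: the helper names `normalize_url` and `looks_obfuscated` are hypothetical, the sample URL uses the reserved `example.com` domain, and the choice of NFKC as the target form is an assumption made for the example.

```python
import unicodedata


def normalize_url(url: str) -> str:
    """Fold compatibility characters (e.g. circled letters and digits)
    into their ordinary equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", url)


def looks_obfuscated(url: str) -> bool:
    """Hypothetical heuristic: flag a URL whose text changes under NFKC."""
    return normalize_url(url) != url


# Illustrative URL (an assumption, not taken from a real spam sample):
# U+24D0 = circled a, U+24DE = circled o, U+24DC = circled m,
# U+2460 = circled digit one.
obfuscated = "http://ex\u24d0mple.c\u24de\u24dc/\u2460"

print(normalize_url(obfuscated))      # http://example.com/1
print(looks_obfuscated(obfuscated))   # True

# Canonical normalization (NFC) leaves the circled digit untouched;
# only the compatibility forms (NFKC/NFKD) map it to "1".
print(unicodedata.normalize("NFC", "\u2460") == "\u2460")  # True
```

A filter that compares only the raw URL text against a blocklist would miss the obfuscated form, whereas comparing the normalized text would not. Note that this heuristic would also flag legitimate text containing compatibility characters, so it is a starting point rather than a complete detector.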