Symantec Intelligence

Spammers take advantage of Unicode normalisation to hide URLs

Created: 02 Aug 2011 • Updated: 04 Aug 2011

by Francisco Pardo and Nick Johnston

Spammers are never idle when it comes to finding new ways to bypass mail filters; after all, this is crucial to a spammer's success.

Recently we've seen a low, but steady, number of spam messages in which spammers replace characters in URLs (which point to spam sites) with Unicode characters that look similar or identical. This is yet another way of obfuscating URLs in an attempt to make them more difficult for filters to analyse.

To understand how this technique works, a bit of knowledge of the Unicode standard is helpful. As well as specifying a large repertoire of characters, Unicode also provides normalisation rules for converting similar and/or equivalent characters to a single form. For example, under the compatibility normalisation forms (NFKC and NFKD), an encircled number is considered equivalent to the corresponding ordinary number. This latest spammer obfuscation technique relies on the HTML rendering engine in mail clients (or the web browser, for web-based email) applying the appropriate Unicode normalisation to URLs.
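The normalisation behaviour described above can be observed with Python's standard `unicodedata` module. This is only a minimal sketch: the helper name `nfkc` is our own, and the choice of the NFKC form is an assumption made for illustration (it is one of the forms under which an encircled number folds to an ordinary digit).

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return text under compatibility normalisation (NFKC)."""
    return unicodedata.normalize("NFKC", text)


circled_one = "\u2460"  # CIRCLED DIGIT ONE

# The official character name, as recorded in the Unicode database.
print(unicodedata.name(circled_one))

# Compatibility normalisation folds the encircled number to a plain digit.
print(nfkc(circled_one) == "1")

# Canonical normalisation (NFC) leaves the character untouched, so the
# equivalence only holds under the compatibility forms.
print(unicodedata.normalize("NFC", circled_one) == circled_one)
```

A filter that compares URLs byte for byte, without applying the same folding, would treat the two spellings as different strings.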

For example, a spam message contains the following URL:


At first glance, the period or dot might look like a normal dot character, but it has actually been replaced with Unicode character U+2024, "ONE DOT LEADER". The "l" in the top-level domain also looks like a normal Latin letter "l", but is actually Unicode character U+217C, "SMALL ROMAN NUMERAL FIFTY". When a web browser or mail client HTML rendering engine processes this URL, it typically applies Unicode normalisation to it, replacing the "ONE DOT LEADER" character with a normal dot and the "SMALL ROMAN NUMERAL FIFTY" with a normal "l" character, allowing the user to visit the spam site.
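The substitution can be reproduced in a few lines of Python. The URL below is hypothetical (it uses the reserved `.example` top-level domain and is not the URL from the spam message), and NFKC is assumed as a stand-in for whatever normalisation a particular rendering engine applies.

```python
import unicodedata

# Hypothetical obfuscated URL: the dot is U+2024 and the "l" is U+217C.
obfuscated = "http://shop\u2024examp\u217ce/offer"


def describe_non_ascii(text: str) -> list[tuple[str, str]]:
    """List the code point and Unicode name of every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Compatibility normalisation maps both look-alikes to plain ASCII.
normalised = unicodedata.normalize("NFKC", obfuscated)

for code_point, name in describe_non_ascii(obfuscated):
    print(code_point, name)
print(normalised)
```

The raw string and the normalised string differ, so a filter that matches on the raw text would not see the ASCII hostname, while a client that normalises would still resolve it.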

The process works as follows:

In a sense, this is similar to IDN (Internationalized Domain Name) homograph attacks, where similar-looking Unicode characters are used to lead users to fake sites, often for phishing. However, this technique differs in that it uses similar Unicode characters to obfuscate a site rather than to fake or spoof one.

Symantec Brightmail customers are protected from these attacks by our URL filtering technologies, which support handling these characters.
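As a rough illustration of how a filter might notice this kind of obfuscation, the sketch below flags any non-ASCII character in a URL that compatibility normalisation would turn into plain ASCII. It is not a description of how Symantec's URL filtering works; the function name `suspicious_characters`, the use of NFKC, and both sample URLs are assumptions made for the example.

```python
import unicodedata


def suspicious_characters(url: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, ASCII replacement) for each non-ASCII
    character that NFKC normalisation folds to plain ASCII."""
    findings = []
    for index, ch in enumerate(url):
        if ch.isascii():
            continue
        folded = unicodedata.normalize("NFKC", ch)
        if folded != ch and folded.isascii():
            findings.append((index, f"U+{ord(ch):04X}", folded))
    return findings


# Hypothetical URL using the two substitutions discussed in this post.
print(suspicious_characters("http://shop\u2024examp\u217ce/offer"))

# A homograph-style URL using CYRILLIC SMALL LETTER A (U+0430) is not
# flagged: that letter does not normalise to the Latin "a".
print(suspicious_characters("http://ex\u0430mple.example/"))
```

The second call shows the difference from a homograph attack: a Cyrillic look-alike survives normalisation unchanged and so names a different domain, whereas the characters used here collapse to the real ASCII hostname.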