Spammers take advantage of Unicode normalisation to hide URLs
by Francisco Pardo and Nick Johnston
Spammers are never idle when it comes to finding new ways to bypass mail filters; after all, this is crucial to a spammer's success.
Recently we've seen a low but steady number of spam messages in which spammers replace characters in URLs (which point to spam sites) with Unicode characters that look similar or identical. This is yet another way of obfuscating URLs to make them harder for filters to analyse.
To understand how this technique works, a bit of knowledge of the Unicode standard is helpful. As well as specifying a large repertoire of characters, Unicode also provides normalisation rules for converting similar and/or equivalent characters to a single form. For example, under the compatibility normalisation forms (NFKC and NFKD), an encircled number is considered equivalent to the corresponding ordinary number. This latest obfuscation technique relies on the HTML rendering engine in mail clients (or the web browser, for web-based email) applying the appropriate Unicode normalisation to URLs.
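The encircled-number equivalence can be checked with a minimal sketch using Python's standard unicodedata module. It assumes the compatibility form NFKC is the one of interest; the canonical form NFC is shown alongside for contrast, since it leaves the encircled digit untouched.

```python
import unicodedata

circled_one = "\u2460"  # CIRCLED DIGIT ONE

# Canonical composition (NFC) keeps the encircled digit as it is.
nfc = unicodedata.normalize("NFC", circled_one)

# Compatibility composition (NFKC) folds it to the ordinary digit.
nfkc = unicodedata.normalize("NFKC", circled_one)

print(unicodedata.name(circled_one))
print(repr(nfc), repr(nfkc))
```

Only the compatibility forms perform this kind of folding, which is why they are the relevant ones for look-alike characters.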
For example, a spam message contains the following URL:
At first glance, the period or dot might look like a normal dot character, but it has actually been replaced with Unicode character U+2024, "ONE DOT LEADER". The "l" in the top-level domain also looks like a normal Latin letter "l", but is actually Unicode character U+217C, "SMALL ROMAN NUMERAL FIFTY". When a web browser or mail client HTML rendering engine processes this URL, it typically applies Unicode normalisation to it, replacing the "ONE DOT LEADER" character with a normal dot and the "SMALL ROMAN NUMERAL FIFTY" with a normal "l" character, allowing the user to visit the spam site.
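The substitution described above can be reproduced as a rough sketch with Python's standard unicodedata module. The hostname example.invalid is a made-up placeholder, not the URL from the spam message, and NFKC is assumed here as a stand-in for whatever mapping a given browser or mail client actually applies. The helper name describe_non_ascii is likewise illustrative.

```python
import unicodedata

# Hypothetical obfuscated hostname (placeholder, not the real spam URL):
# the dot is U+2024 and the "l" in the top-level domain is U+217C.
obfuscated = "example\u2024inva\u217Cid"


def describe_non_ascii(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# NFKC is an assumption standing in for the client's own URL processing.
normalised = unicodedata.normalize("NFKC", obfuscated)

for _, code_point, name in describe_non_ascii(obfuscated):
    print(code_point, name)
print(normalised)
```

A filter built along these lines might treat any URL whose normalised form differs from its raw form as worth a closer look, although that policy is a suggestion rather than something the original analysis prescribes.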
The process works as follows:
In a sense, this is similar to IDN (Internationalized Domain Name) homograph attacks, where similar-looking Unicode characters are used to lead users to fake sites, often for phishing. However, this technique differs in that the similar-looking characters are used to obfuscate the URL of a site rather than to fake or spoof a legitimate site.
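One way to see the difference, sketched under the same assumption that NFKC approximates what the client does: a typical homograph character such as Cyrillic "а" (U+0430) has no compatibility mapping, so normalisation leaves it in place and the domain remains a distinct look-alike, whereas the characters used in this technique collapse back to plain ASCII. The helper name folds_to_ascii is illustrative only.

```python
import unicodedata


def folds_to_ascii(ch):
    """True if NFKC normalisation turns a non-ASCII character into pure ASCII."""
    folded = unicodedata.normalize("NFKC", ch)
    return not ch.isascii() and folded.isascii()


samples = {
    "\u217C": "SMALL ROMAN NUMERAL FIFTY (obfuscation)",
    "\u2024": "ONE DOT LEADER (obfuscation)",
    "\u0430": "CYRILLIC SMALL LETTER A (homograph)",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: folds to ASCII = {folds_to_ascii(ch)}")
```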
Symantec.cloud and Symantec Brightmail customers are protected from these attacks by our URL filtering technologies, which handle these characters.
The Symantec Intelligence Blog, published by Symantec.cloud, serves as a conduit for communicating intelligence data, trends and statistics based on analysis of cyber security threats, along with insights from the Symantec Intelligence team, which includes many world-renowned malware and spam experts. Sitting on the front lines of defence, they have a global view of threats across multiple communication protocols, drawn from the billions of web pages, email and IM messages they monitor each day.