This Article is Brought to You By the Letter ノ

Published: 2015-10-30
Last Updated: 2015-10-30 13:46:08 UTC
by Johannes Ullrich (Version: 1)

Recently, I managed to register the domain name "comノindex.jp". This domain name uses the Japanese katakana character "ノ", which looks somewhat like the slash that typically follows the domain name in a URL. As a result, an unsuspecting user may mistake the host name "example.comノindex.jp" for the "index.jp" page at "example.com".
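To see what a parser makes of such a URL, here is a minimal sketch using Python's standard urllib.parse. The "http://" scheme and trailing slash are assumptions added for illustration; only the host name comes from the example above.

```python
from urllib.parse import urlsplit

# Assumed test URL: the example host name with a scheme and a real slash added.
url = "http://example.comノindex.jp/"

parts = urlsplit(url)
host = parts.hostname          # everything up to the first real "/"
labels = host.split(".")       # DNS labels, left to right

print(host)                    # the full host name, including "ノ"
print(labels)                  # "comノindex" is a single label
print(".".join(labels[-2:]))   # the name that was actually registered
print(parts.path)              # the real path is just "/"
```

The parser never treats "ノ" as a path separator: the whole string up to the real slash is one host name under .jp.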

International domain names and look-alikes are nothing new, and registrars as well as browsers have implemented various safeguards against them. But even with these safeguards, it is still possible to come up with creative domain names. Even without international characters, we see "typo squatting" domains like "rnicrosoft" (this is "r" and "n" instead of "m"). A number of tools try to find all look-alike domains. For example, DomainTools provides a simple online tool [1]. Some companies attempt to register all domains that look like their own. But a domain like "comノindex.jp" could be used to impersonate arbitrary .com domain names.
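The idea behind such look-alike finders can be sketched in a few lines. This is only an illustration: the substitution table and the function name lookalike_variants are hypothetical, and real tools use far larger tables and more kinds of edits.

```python
# Hypothetical substitution table for illustration only.
LOOKALIKES = {
    "m": ["rn"],
    "l": ["1"],
    "o": ["0"],
    "w": ["vv"],
}


def lookalike_variants(label):
    """Return variants of `label` with exactly one character substituted."""
    variants = set()
    for i, ch in enumerate(label):
        for sub in LOOKALIKES.get(ch, []):
            variants.add(label[:i] + sub + label[i + 1:])
    return sorted(variants)


print(lookalike_variants("microsoft"))
```

With this table, "microsoft" yields "rnicrosoft" along with two variants that swap an "o" for a zero.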

The DNS protocol does not understand anything but "plain ASCII". To encode IDNs, "punycode" is used. Each punycode-encoded label of a domain name starts with xn--, followed by all the ASCII letters in the label, followed by a dash and the international letters in an encoded format. For example, my domain encodes to xn--comindex-634g.jp. To mitigate the risks of IDNs, some browsers display the punycode form of a domain name if they consider it "invalid".
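The encoding described above can be reproduced with Python's built-in codecs. Treat this as a sketch: Python's "idna" codec implements the older IDNA2003 rules (RFC 3490), which happen to give the same result for this particular name.

```python
# Raw punycode of the single label (no "xn--" prefix).
raw = "comノindex".encode("punycode").decode("ascii")

# The "idna" codec works label by label and adds "xn--" where needed.
ace = "comノindex.jp".encode("idna").decode("ascii")

# Decoding turns the ASCII form back into Unicode.
decoded = ace.encode("ascii").decode("idna")

print(raw)      # comindex-634g
print(ace)      # xn--comindex-634g.jp
print(decoded)  # comノindex.jp
```

Note how the ASCII letters "comindex" stay readable in front of the dash, while "634g" encodes both the character "ノ" and its position in the label.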

The standards for international domain names are commonly referred to as IDNA2008 (Internationalized Domain Names for Applications, 2008) and are published as RFCs 5890-5895. You may still find references to the earlier version, IDNA2003, in RFCs 3490 and 3491. Punycode itself is defined in RFC 3492 and is used by both versions. The RFCs mention some of the character confusion issues but, for the most part, leave it to registrars to apply appropriate policies.

Similarly, there is no clear standard for browsers. Different browsers implement IDNs differently.

Safari: Safari renders most international characters, with a few exceptions. For example, Cyrillic and Greek characters are excluded, as they are particularly easily confused with Latin characters [2].

Firefox: Firefox maintains a whitelist of top level domains for which it will render international characters. See "about:config" for details. .com is not on the whitelist by default, but .org is. Country level TLDs are on the whitelist.

Chrome: Chrome's policy is a bit more granular [3]. 

Internet Explorer: Similar to Chrome. Also, international characters are only supported if the respective language support is enabled in Windows [4]. The document on Microsoft's MSDN website was written for Internet Explorer 7, but still appears to be valid.

Microsoft Edge: I couldn't find any details about Microsoft Edge, but it appears to follow Internet Explorer's policy.
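None of these browser policies is reproduced here, but the general idea of falling back to punycode for suspicious names can be sketched. The heuristic below is an assumption made for illustration, not any browser's actual rule: it shows Unicode only if every label sticks to a single script, and it guesses the script from the first word of the Unicode character name. The function names scripts_in_label and display_form are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Approximate the set of scripts used in one DNS label.

    The first word of a character's Unicode name (e.g. "LATIN",
    "KATAKANA") is only a crude stand-in for the real Script property.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared across scripts
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def display_form(hostname):
    """Return the punycode form if any label mixes scripts, else Unicode."""
    for label in hostname.split("."):
        if len(scripts_in_label(label)) > 1:
            return hostname.encode("idna").decode("ascii")
    return hostname


print(display_form("example.com"))
print(display_form("example.comノindex.jp"))
```

A check this simple would also reject legitimate Japanese names that mix scripts, and it would not catch a label written entirely in a look-alike script, which is one reason real policies are more granular.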

Finally, here is a quick summary of what users reported seeing with my test URL:

Chrome: displays punycode.
Firefox: displays Unicode.
Safari: displays Unicode (users of Safari on OS X < 10.10 report seeing punycode).
Opera: only a small number of Opera users participated, most reporting Unicode.
Internet Explorer: displays punycode.

Mobile browsers behave just like their desktop versions. For example, Google Chrome on Android does not display Unicode, but Safari on iOS does.

For summaries of Unicode security issues, see also http://unicode.org/faq/security.html and https://www.owasp.org/index.php/Canonicalization,_locale_and_Unicode (among other OWASP documents).

[1] http://research.domaintools.com/buy/domain-typo-finder
[2] https://support.apple.com/kb/TA22996?locale=en_US&viewlocale=en_US
[3] https://www.chromium.org/developers/design-documents/idn-in-google-chrome
[4] http://msdn.microsoft.com/en-us/library/bb250505(VS.85).aspx

NB: Sorry for any RSS feeds that the title may break.

---
Johannes B. Ullrich, Ph.D.