So Where is Kln-sna?

Like it or not, our Internet was created with a focus on the English language. As the Internet and the Web were being developed, there was only one core character set: ASCII. Most of the RFCs defined a character set based around ASCII, which supports only a limited number of characters. Unicode lifts that limit. It is not restricted to 16 bits: its code points run from U+0000 to U+10FFFF (around 21 bits), and they are typically stored as UTF-8 or UTF-16. With this we can represent almost every character we need.
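To make the gap between ASCII and Unicode concrete, here is a small sketch that prints the code point of a few characters and how much space each one takes. The `describe` helper and the sample characters are our own choices for illustration, not part of any library:

```javascript
// Hypothetical helper: summarise how a single character is represented.
function describe(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    char: ch,
    // Code point in the usual U+XXXX notation
    codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
    // Number of bytes when stored as UTF-8
    utf8Bytes: Buffer.from(ch, 'utf8').length,
    // Number of 16-bit units in a JavaScript (UTF-16) string
    utf16Units: ch.length,
    // ASCII covers only code points below 128
    isAscii: codePoint < 0x80,
  };
}

for (const ch of ['K', 'ö', '😀']) {
  console.log(describe(ch));
}
```

Only “K” fits into ASCII. The “ö” needs two bytes in UTF-8, and the emoji sits above U+FFFF, so it needs four bytes in UTF-8 and two 16-bit units in UTF-16.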

Representing Unicode in URLs

Our infrastructure for domain names is still largely built around the ASCII character set. To overcome this, Punycode is used to encode Unicode into ASCII characters. It restricts itself to the Letter-Digit-Hyphen (LDH) subset: the ASCII characters of the name are copied first, and the non-ASCII characters are then encoded after a final hyphen. So let’s try some German city names:

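// Node's built-in 'punycode' module is deprecated; the npm package of the
// same name (loaded with require('punycode/')) offers the same encode() call.
const punycode = require('punycode');

let rtn = punycode.encode('München');
console.log(rtn);
rtn = punycode.encode('Köln');
console.log(rtn);
rtn = punycode.encode('Düsseldorf');
console.log(rtn);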

The results are:

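Mnchen-3ya
Kln-sna
Dsseldorf-q9a

If we look at “München”, then we get “Mnchen-3ya”. The ASCII characters come first, the final hyphen acts as a delimiter, and the characters after it (“3ya”) encode the code point and position of “ü” as generalized variable-length integers. If we now try “點看”, which has no ASCII characters and therefore no delimiter, we get: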

Message: 點看
Encode: c1yn36f
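To see where strings such as “3ya” and “c1yn36f” come from, here is a minimal sketch of the encoder, following the pseudocode and parameters in RFC 3492. The function names (`encodeSketch`, `adapt`, `digitToChar`) are our own, and the overflow checks from the RFC are left out, so treat it as an illustration and not as a replacement for the library:

```javascript
// Parameters defined for Punycode in RFC 3492.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Map a digit value 0..35 to 'a'..'z' (0..25) or '0'..'9' (26..35).
function digitToChar(d) {
  return String.fromCharCode(d < 26 ? 97 + d : 22 + d);
}

// Bias adaptation: tunes the thresholds after each encoded delta.
function adapt(delta, numPoints, firstTime) {
  delta = Math.floor(delta / (firstTime ? DAMP : 2));
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - T_MIN) * T_MAX) / 2)) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

function encodeSketch(input) {
  const codePoints = Array.from(input, (c) => c.codePointAt(0));
  // Step 1: copy the ASCII (basic) code points, then add the delimiter.
  const basic = codePoints.filter((cp) => cp < 0x80);
  let output = String.fromCharCode(...basic);
  const b = basic.length;
  let h = b; // number of code points handled so far
  if (b > 0) output += '-';

  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  // Step 2: handle the non-ASCII code points in increasing order.
  while (h < codePoints.length) {
    const m = Math.min(...codePoints.filter((cp) => cp >= n));
    delta += (m - n) * (h + 1);
    n = m;
    for (const cp of codePoints) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a generalized variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
          if (q < t) break;
          output += digitToChar(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += digitToChar(q);
        bias = adapt(delta, h + 1, h === b);
        delta = 0;
        h++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

console.log(encodeSketch('München')); // Mnchen-3ya
console.log(encodeSketch('點看'));     // c1yn36f
```

Each delta combines “which code point” and “where to insert it” into a single number, and the thresholds (driven by the bias) decide how many base-36 digits are needed to write it down.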

Crashing Systems With a Font: The Homograph Attack

A recent vulnerability has been found to crash many apps on Apple iOS devices (such as WhatsApp, Facebook Messenger and Gmail). It derives from a single character from the alphabet of Telugu, a Dravidian language spoken by over 70 million people. The bug was spotted by the Italian blog Mobile World.

A related problem is the homograph attack, which has been known since 2001. It was highlighted again by a Chinese researcher (Xudong Zheng), and is now often used by scammers to trick users in many regions of the world. A recent example imitated the apple.com domain and was even signed by a valid digital certificate.

The browser shows the certificate as valid (the address bar goes green), but the site is not the Apple site. The epic.com site was also used as a demonstrator of the vulnerability.

The site appears to be signed for epic.com, but the Common Name (CN) in the certificate is xn--e1awd7f.com.

It works by replacing characters with look-alike Unicode characters, which are handled differently when they are processed as part of a Web address. With Punycode, the “xn--” part is a prefix which signals that the label that follows is encoded to represent Unicode characters.
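One way to see what is hiding behind such a name is to decode it and list its code points. The sketch below uses `domainToUnicode` from Node's built-in `url` module; the `inspectHostname` helper is a hypothetical name of our own, and a real check for look-alike domains would need much more than this:

```javascript
const { domainToUnicode } = require('url');

// Hypothetical helper: show which labels of a hostname are Punycode-encoded
// and what they decode to.
function inspectHostname(hostname) {
  // domainToUnicode returns an empty string if the hostname is invalid.
  const unicodeLabels = domainToUnicode(hostname).split('.');
  return hostname.toLowerCase().split('.').map((label, i) => {
    const unicode = unicodeLabels[i] ?? '';
    return {
      label,
      // Labels starting with "xn--" carry Punycode-encoded Unicode
      isEncoded: label.startsWith('xn--'),
      unicode,
      // Code points of the decoded label, in U+XXXX notation
      codePoints: Array.from(unicode, (c) =>
        'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')),
      // True if any decoded character lies outside ASCII
      nonAscii: /[^\x00-\x7f]/.test(unicode),
    };
  });
}

console.log(inspectHostname('xn--e1awd7f.com'));
```

The decoded label may look like “epic” on screen, but the code points show that none of its characters are ASCII letters, which is exactly what the address bar does not tell the user.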

Returning to the iOS bug, the single Telugu character can crash your device and block access to messaging apps on iOS, including WhatsApp, Facebook Messenger, Outlook for iOS and Gmail, as well as Safari and Messages on macOS.

Conclusion

Our current Internet is still running many of the basic protocols that were created in the 1980s, and we’ve just patched them. In 2003, the Internationalizing Domain Names in Applications (IDNA) standard defined the usage of non-ASCII characters in domain names, and it relies on Punycode to do so.
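In practice we rarely call Punycode directly, as the IDNA conversion is built into the URL handling of most platforms. As a small sketch, Node's built-in `url` module exposes `domainToASCII` and `domainToUnicode`, and the `URL` class applies the same conversion to hostnames. The domain names below are only illustrative, and `toAsciiHostname` is a hypothetical helper name:

```javascript
const { domainToASCII, domainToUnicode } = require('url');

// Hypothetical helper: convert a Unicode domain name to its ASCII form.
function toAsciiHostname(name) {
  return domainToASCII(name);
}

for (const name of ['münchen.de', 'köln.de']) {
  const ascii = toAsciiHostname(name);
  // Round trip: Unicode -> ASCII ("xn--" labels) -> Unicode
  console.log(name, '->', ascii, '->', domainToUnicode(ascii));
}

// The URL class performs the same conversion when parsing a hostname.
console.log(new URL('http://köln.de/').hostname);
```

Each non-ASCII label is replaced by “xn--” followed by its Punycode form, so the rest of the infrastructure only ever sees ASCII.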