Internationalized Domain Names (Punycode Domains) — Latent Threats?

Submitted by
Style Pass
2021-09-24 14:30:18

Internationalized Domain Names (IDNs) can be thought of as extensions of the traditional Latin-script, ASCII-encoded domains, such as example.com, that we are accustomed to. IDNs allow Unicode characters, and thus a much wider array of characters from local scripts that use diacritics and ligatures, which cannot be rendered directly in ASCII.

The DNS "hostname rule" requires domain names to be in ASCII before they are stored in the DNS. An IDN such as apṗlê.com is therefore represented as an ASCII string using Punycode transcription, resulting in: xn--apl-hma7778a.com
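The conversion described above can be tried out with a few lines of Python. This is only a minimal sketch: it relies on the "idna" codec that ships with the standard library, which implements the older IDNA2003 rules, so it may not agree with IDNA2008 for every name. The helper names to_ascii and to_unicode and the sample domain bücher.example are illustrative choices, not something taken from the article.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (xn--) form."""
    # The built-in "idna" codec (IDNA2003) works label by label:
    # labels containing non-ASCII characters are normalized,
    # Punycode-encoded and prefixed with "xn--".
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII (xn--) domain name back to its Unicode form."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("bücher.example"))          # xn--bcher-kva.example
    print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```

Plain ASCII names such as example.com pass through unchanged. For IDNA2008 behaviour, a third-party implementation would be needed, since the standard library codec predates that revision.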

In the olden days of yore, looong before IDNs, the "LDH" (aka Letter-Digit-Hyphen) hostname convention reigned over the DNS and only permitted ... err ... letters, digits and hyphens within domains. To support the major languages of the world in their native writing systems (scripts), IDNs were put forward. Originally proposed in 1996, IDNs were formally introduced circa 2003 (christened "IDNA2003") after version 1.0 of the implementation guidelines was published. The specification was then revised in 2008 ("IDNA2008"), approved in 2010, and is still the current recommended implementation. However, IDNA2008 disallowed around 8,000 characters that used to be valid under IDNA2003, including all uppercase characters, full/half-width variants, symbols, and punctuation. Such teething issues, backward compatibility included, could have driven IDN owners up the wall, but the seemingly meagre adoption of IDNs worldwide at the time allowed a conflict-free transition.
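For illustration, the LDH convention can be expressed as a small check. This is a rough sketch rather than a full hostname validator: the function name is_ldh_hostname is made up here, and the extra constraints (labels of 1 to 63 characters that neither start nor end with a hyphen) come from the classic hostname RFCs rather than from the text above.

```python
import re

# One LDH label: ASCII letters, digits and hyphens only,
# 1-63 characters, with no hyphen at either end.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    for candidate in ("example.com", "xn--bcher-kva.example", "bücher.example"):
        print(candidate, is_ldh_hostname(candidate))
```

Note that a Punycode-encoded name passes the check while its Unicode form does not, which is exactly why the ASCII transcription exists.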

To date, the scripts allowed stand at 23 by count, representing 37 languages (a script is a set of characters used to write one or multiple languages). The scripts include: Arabic, Armenian, Bengali, Cyrillic, Devanagari, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Lao, Latin, Malayalam, Oriya, Sinhala, Tamil, Telugu, and Thai.
