According to the International Telecommunication Union (ITU), more than three billion people use the World Wide Web, increasingly in their native languages. This change was brought about in part by the introduction of internationalized domain names in 2003. We’ll explain how IDN domains work.

What is an internationalized domain name (IDN)?

The IETF (Internet Engineering Task Force) defines IDNs as domain names that contain characters outside the ASCII character set, such as umlauts, accented letters or characters from non-Latin scripts. However, the Domain Name System (DNS), which is responsible for translating domain names into IP addresses, cannot process these names directly, because it is based on the limited ASCII character set.

To make IDNs usable for the DNS as well as other internet protocols, the internet standard Internationalizing Domain Names in Applications (IDNA) was created in 2003. It defines a standardized translation from Unicode to ASCII, thereby enabling the use of non-ASCII characters in domain names.

How does IDNA work?

Much of the internet’s infrastructure supports only the ASCII character set. To ensure that internationalized domain names can be processed, each IDN entered in Unicode is translated into an ASCII-compatible encoding (ACE) string. Users therefore continue to see URLs with accented characters or umlauts, while servers process the addresses in their ASCII-compatible form. This procedure is specified in the IDNA2003 internet standard and in the IDNA2008 revision, which was approved in 2010. The translation from Unicode to ASCII takes place on the client side (in the browser, email program, etc.) and is based on a standardized encoding process called Punycode.
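
The client-side translation can be sketched with Python's built-in "idna" codec. This is only an illustration under two assumptions: the codec implements the older IDNA2003 rules (IDNA2008 behavior would require a separate library), and the domain münchen.de as well as the helper names to_ace and to_unicode are chosen purely as examples.

```python
# Sketch of the client-side translation using Python's built-in "idna" codec.
# Assumption: the standard-library codec follows IDNA2003, not IDNA2008.

def to_ace(domain: str) -> str:
    """Translate a Unicode domain name into its ASCII-compatible (ACE) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """Translate an ACE domain name back into Unicode for display."""
    return ace_domain.encode("ascii").decode("idna")


ace = to_ace("münchen.de")
print(ace)              # xn--mnchen-3ya.de
print(to_unicode(ace))  # münchen.de
```

The ASCII form is what would be sent to the DNS, while the Unicode form is what the user sees in the address bar.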

Punycode

Punycode, standardized in RFC 3492, was developed to represent Unicode character strings unambiguously as ASCII characters without loss of information. All non-ASCII characters are removed from the domain name, the remaining ASCII characters are followed by a hyphen, and the removed characters are appended in encoded form. This code sequence contains information about each Unicode character as well as its position in the domain name. Additionally, each ACE string created in this way is labeled with the prefix xn--. This signals to software that the character sequence is an IDN that has been encoded according to the IDNA and Punycode standards. See our article on Punycode for a detailed explanation of the encoding process as well as some examples.
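
The structure of an ACE string can be made visible with Python's built-in "punycode" codec. The following is a minimal sketch, not a full IDNA implementation: it performs no normalization or validity checks, and the helper names label_to_ace and ace_to_label are hypothetical.

```python
# Sketch: encode one domain label (the part between two dots) with the
# built-in "punycode" codec and add the ACE prefix by hand.
# Assumption: the label is already normalized; nothing is validated here.

ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Return the ACE form of a single label."""
    if label.isascii():
        return label  # pure ASCII labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def ace_to_label(ace: str) -> str:
    """Return the Unicode form of a single ACE label."""
    if not ace.startswith(ACE_PREFIX):
        return ace
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print("münchen".encode("punycode"))    # b'mnchen-3ya'
print(label_to_ace("münchen"))         # xn--mnchen-3ya
print(ace_to_label("xn--mnchen-3ya"))  # münchen
```

In the output, "mnchen" is the ASCII part of the label, the hyphen is the separator, and "3ya" encodes both the character "ü" and its position.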

Tip

With an online IDN domain converter, you can convert IDNs to their corresponding ACE strings using Punycode.

Differences between IDNA2003 and IDNA2008

In the original 2003 procedure, internationalized domain names were normalized prior to Punycode encoding using the nameprep method. This method changed capital letters into lowercase letters, removed control characters and converted equivalent characters into a unified form. Nameprep was removed from this process when IDNA2008 was introduced. IDNA2008 itself no longer specifies any normalization. Instead, it recommends that applications apply a mapping that converts capital letters into lowercase ones before lookup.
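
The effect of nameprep can be tried out with the nameprep function from Python's encodings.idna module, which implements the IDNA2003 preparation step. This is a small sketch; the sample labels are illustrative assumptions, and the second one is written with escape sequences for fullwidth Latin letters.

```python
# Sketch: the IDNA2003 nameprep step as exposed by Python's standard library.
from encodings.idna import nameprep

# Capital letters are folded to lowercase.
print(nameprep("MÜNCHEN"))  # münchen

# Fullwidth Latin letters (U+FF41 to U+FF5A) are compatibility equivalents
# of the plain ASCII letters and are unified to them.
fullwidth_example = "\uff45\uff58\uff41\uff4d\uff50\uff4c\uff45"
print(nameprep(fullwidth_example))  # example
```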

This change also accommodates users in the German-speaking world: under IDNA2003, the Unicode character “ß”, which is common in German, was treated as equivalent to “ss”. Domains such as www.fußball-ergebnisse.de were thus automatically normalized to www.fussball-ergebnisse.de in the nameprep process. This is no longer the case under IDNA2008. Since 2010, “ß” has been correctly interpreted as “Latin small letter sharp s” and can be registered as part of an IDN domain.

In addition, around 8,000 characters that were permitted in domain names under IDNA2003 are no longer supported under IDNA2008. Four characters, including “ß”, have been interpreted differently since the standard was revised. For a detailed discussion of the differences between IDNA2003 and IDNA2008, see Unicode Technical Standard #46. The following table provides a summary of the main differences:

| IDNA2003 | IDNA2008 |
| --- | --- |
| Nameprep procedure required | No normalization specified |
| Valid for Unicode 3.2 | Valid for Unicode versions from 5.2 onwards |
| Strict rules for right-to-left scripts | Clearer rules for right-to-left scripts |
| Upper-case letters are converted to lower-case letters by nameprep | Upper-case letters are not valid as such; converting them to lower case beforehand is recommended |
| | Many symbols are prohibited, e.g., graphic symbols that do not belong to any alphabet, as well as some punctuation |
| | “Remapping” removed for some Unicode characters, as this could lead to irregularities |

What problems are there with IDNs?

By now, all common internet programs should be able to handle IDNs. However, problems with internationalized domain names still occur because the switch from IDNA2003 to IDNA2008 has not yet been implemented consistently. One example that is problematic for German is the differing interpretation of “ß”. Since IDNA2003 always converts “ß” to “ss”, ß domains that can be registered under IDNA2008 are often not reachable from systems that still convert according to the outdated standard. Instead, users are directed to the corresponding domain containing “ss”. Website operators can work around this problem by registering both variants and redirecting the second domain to the preferred spelling using a domain redirect.
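
To see whether a domain is affected, both interpretations can be compared. The sketch below relies on Python's built-in "idna" codec for the IDNA2003 form and approximates the IDNA2008 form by lowercasing and Punycode-encoding each label unchanged. That approximation is an assumption for illustration only: it performs none of the IDNA2008 validity checks, and all function names are hypothetical.

```python
# Sketch: list the ACE names a site operator might want to register so that
# clients following either IDNA2003 or IDNA2008 reach the same site.

ACE_PREFIX = "xn--"


def ace_idna2003(domain: str) -> str:
    """ACE form under IDNA2003 (built-in codec, maps sharp s to 'ss')."""
    return domain.encode("idna").decode("ascii")


def ace_keep_characters(domain: str) -> str:
    """Rough IDNA2008-style ACE form: lowercase, then Punycode each label.

    Simplification: no IDNA2008 validity or context rules are applied.
    """
    labels = []
    for label in domain.lower().split("."):
        if label.isascii():
            labels.append(label)
        else:
            encoded = label.encode("punycode").decode("ascii")
            labels.append(ACE_PREFIX + encoded)
    return ".".join(labels)


def variants_to_register(domain: str) -> list[str]:
    """Return the distinct ACE names produced by the two interpretations."""
    return sorted({ace_idna2003(domain), ace_keep_characters(domain)})


print(variants_to_register("fußball.de"))  # ['fussball.de', 'xn--fuball-cta.de']
print(variants_to_register("münchen.de"))  # ['xn--mnchen-3ya.de']
```

A domain that yields two different names is one where both variants would need to be registered and one of them redirected.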
