Punycode is a standardized encoding method that maps Unicode characters onto the limited ASCII character set. This allows internationalized domain names (IDNs) to contain non-ASCII characters such as German umlauts.

How was the encoding method developed?

In 2003, the Internet Engineering Task Force (IETF) standardized Punycode as the encoding syntax for Internationalizing Domain Names in Applications (IDNA). The IETF considers a domain name an IDN if it contains characters outside the ASCII range, such as letters with diacritics (e.g., German umlauts) or characters from non-Latin scripts. Basic protocols such as the Domain Name System (DNS) cannot process such characters directly, because numerous internet protocols were designed around English and support only the limited ASCII character set. Take the German domain name müller-büromöbel (Müller’s office furniture) as an example: since the introduction of IDNs, it is permitted under the top-level domain .de, but it can only be processed, for instance during name resolution, once its non-basic characters have been encoded.

To ensure compatibility between IDNs and older internet standards, the IETF prescribed a method for encoding internationalized domain names using only the characters that were already permitted. This standardized encoding procedure is known as Punycode.

Note

For email addresses, Punycode is used only for internationalized domain parts. If the local part (before the @ character) contains non-ASCII characters, it is transmitted as UTF-8 instead.
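The split between domain and local part can be sketched in Python. This is a minimal illustration, not a mail library: the function name `encode_email_address` is hypothetical, and it relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules. Non-ASCII local parts are passed through unchanged, on the assumption that the mail server supports UTF-8 addresses (SMTPUTF8).

```python
def encode_email_address(address: str) -> str:
    """Hypothetical helper: encode only the domain part of an email address."""
    # Split at the last "@": everything after it is the domain.
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: missing '@'")
    # Python's built-in "idna" codec (IDNA 2003) applies Punycode label by label
    # and adds the "xn--" prefix only to labels that need it.
    ascii_domain = domain.encode("idna").decode("ascii")
    # The local part is left as-is; non-ASCII local parts travel as UTF-8.
    return f"{local}@{ascii_domain}"


print(encode_email_address("info@müller-büromöbel.de"))
# info@xn--mller-brombel-rmb4fg.de
```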

How does Punycode encoding work?

An overview of the Punycode process

RFC 3492 defines Punycode as an instance of a general encoding algorithm called Bootstring. Bootstring makes it possible to represent strings drawn from a large character set (here, Unicode) using only a small, limited set of characters. In Punycode for domain names, these are called base characters: the lowercase letters a to z, the digits 0 to 9, and the hyphen (-). The design of the procedure follows six principles:

  • Completeness: Every input string can be represented by a string of base characters.
  • Uniqueness: Each input string has exactly one Bootstring encoding, so the mapping between an IDN and its Punycode form is one-to-one.
  • Reversibility: A Bootstring encoding can always be reversed without any loss of information.
  • Efficiency: The encoded string is, if at all, only slightly longer than the input string.
  • Simplicity: Bootstring uses simple encoding and decoding algorithms.
  • Readability: Only characters that cannot be represented in the target character set are encoded; all other characters remain unchanged.
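Some of these principles can be spot-checked with Python's built-in `punycode` codec, which implements the raw RFC 3492 algorithm without any of the additional domain-name rules. The helper name `check_principles` is hypothetical, and the checks are illustrative rather than a proof.

```python
def check_principles(label: str) -> str:
    """Hypothetical helper: spot-check reversibility and readability
    of the raw Punycode encoding for one label, and report its length."""
    encoded = label.encode("punycode").decode("ascii")

    # Reversibility: decoding restores the original string exactly.
    assert encoded.encode("ascii").decode("punycode") == label

    # Readability: the basic characters appear unchanged, in their original
    # order, in front of the last hyphen (the delimiter).
    basic = "".join(c for c in label if c.isascii())
    if basic:
        assert encoded.rpartition("-")[0] == basic

    # Efficiency (informal): compare input and output lengths.
    print(f"{label} -> {encoded} ({len(label)} -> {len(encoded)} characters)")
    return encoded


for word in ["müller-büromöbel", "παράδειγμα"]:
    check_principles(word)
```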

Punycode is a profile of Bootstring with parameter values chosen for the requirements of internationalized domain names, so that Unicode characters can be represented using only the base characters already permitted in domain names.
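For reference, the parameter values that RFC 3492 assigns to Punycode, together with the bias adaptation function shared by encoder and decoder, might look like this in Python. The constant and function names are our own choice; the values and the formula follow RFC 3492 (sections 5 and 6.1).

```python
# Bootstring parameter values fixed for Punycode by RFC 3492, section 5.
BASE = 36           # digit values 0-35, written as a-z and 0-9
TMIN, TMAX = 1, 26  # lower and upper bounds for digit thresholds
SKEW = 38
DAMP = 700          # damping applied to the very first delta
INITIAL_BIAS = 72
INITIAL_N = 128     # first code point above the ASCII range


def adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def digit_to_char(d: int) -> str:
    """Map a digit value to its character: 0-25 -> 'a'-'z', 26-35 -> '0'-'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)
```

The bias determines how many digits each encoded number needs; it is recalculated after every inserted character so that typical inputs produce short output.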

Punycode example

The following example shows how the encoding works:

IDN: müller-büromöbel

The IDN müller-büromöbel contains the characters ü and ö, which are not part of the character set previously permitted for domain names. They must therefore be encoded with Punycode to ensure compatibility.

Step 1: Normalization

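In the first step, the input string is normalized; among other things, all uppercase letters are replaced by the corresponding lowercase letters. (Strictly speaking, this normalization is part of the IDNA procedure and takes place before the actual Punycode algorithm runs.)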
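A simplified version of this normalization step can be sketched with Python's standard `unicodedata` module. This is an approximation under stated assumptions: real IDNA processing (Nameprep in IDNA 2003, or the UTS #46 mapping used by most current software) applies further mappings and checks, and `casefold()` maps ß to ss, which matches IDNA 2003 but not IDNA 2008. The function name `normalize_label` is hypothetical.

```python
import unicodedata


def normalize_label(label: str) -> str:
    """Hypothetical helper: approximate IDNA normalization of one label.

    Case folding turns uppercase into lowercase letters; NFKC then composes
    sequences such as "u" + combining diaeresis into the single character "ü".
    """
    return unicodedata.normalize("NFKC", label.casefold())


print(normalize_label("Müller-Büromöbel"))  # müller-büromöbel
print(normalize_label("MU\u0308LLER"))      # müller (from decomposed input)
```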

Step 2: Removal of all non-basic characters

In the second step, all non-basic characters are removed from the string. Information about them is then appended in encoded form, separated from the remaining basic characters by a hyphen.
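Removing the non-basic characters can be sketched as follows. The helper `split_basic` is hypothetical; it treats every ASCII character as basic, as RFC 3492 does, and returns the non-basic characters in ascending code-point order, which is the order in which Punycode processes them.

```python
def split_basic(label: str) -> tuple[str, list[str]]:
    """Hypothetical helper: separate basic (ASCII) from non-basic characters.

    Returns the basic characters in their original order and the distinct
    non-basic characters sorted by code point.
    """
    basic = "".join(c for c in label if ord(c) < 128)
    non_basic = sorted({c for c in label if ord(c) >= 128})
    return basic, non_basic


basic, pending = split_basic("müller-büromöbel")
print(basic + "-")  # mller-brombel-  (encoded data follows this hyphen)
print(pending)      # ['ö', 'ü']      (ö = U+00F6 comes before ü = U+00FC)
```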

When Punycode is used to encode internet addresses, each encoded label is given an ACE prefix (short for ASCII-compatible encoding):

ACE prefix: xn--

The ACE prefix marks a label as Punycode-encoded. This ensures that ordinary domain names that happen to contain hyphens are not misinterpreted as encoded internationalized domain names.
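Adding and recognizing the ACE prefix can be sketched on top of Python's built-in `punycode` codec. The function names are hypothetical, and the sketch deliberately skips the normalization and validity checks that a real IDNA library performs.

```python
ACE_PREFIX = "xn--"


def to_ace_label(label: str) -> str:
    """Hypothetical helper: ACE form of a single label (no IDNA checks)."""
    if label.isascii():
        return label  # base characters only: no encoding, no prefix
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace_label(label: str) -> str:
    """Hypothetical helper: decode a label only if it carries the ACE prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


print(to_ace_label("müller-büromöbel"))           # xn--mller-brombel-rmb4fg
print(from_ace_label("xn--mller-brombel-rmb4fg"))  # müller-büromöbel
print(from_ace_label("mller-brombel-rmb4fg"))      # unchanged: an ordinary label
```

Without the prefix, mller-brombel-rmb4fg is simply an ordinary ASCII label; only the prefix tells software to decode it.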

This results in the following encoding for the IDN müller-büromöbel:

ACE: xn--mller-brombel-rmb4fg

A key feature of the Punycode algorithm is its compactness. This matters because a single domain label, including the ACE prefix, may be at most 63 characters long; an IDN whose encoded form would exceed this limit is not valid.

During encoding, Unicode characters are not converted one-to-one into ASCII characters. Instead, for each removed character, the algorithm computes a number (a delta) that combines the character’s code point with the position at which it must be reinserted. These numbers are written as variable-length integers using the base characters a to z and 0 to 9.
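The following Python sketch implements the encoding procedure of RFC 3492 (section 6.3) for a single label, to show how the deltas are formed and written as variable-length base-36 numbers. It is illustrative only: it omits the overflow checks and IDNA validation of a production implementation, and all names are our own.

```python
# Punycode parameters from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def _digit(d: int) -> str:
    """Digit values 0-25 become 'a'-'z', values 26-35 become '0'-'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    """Encode one label following RFC 3492, section 6.3 (no overflow checks)."""
    codepoints = [ord(c) for c in label]
    output = [c for c in label if ord(c) < 128]  # copy basic characters first
    b = h = len(output)
    if b:
        output.append("-")  # delimiter
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(codepoints):
        # Smallest code point that has not been handled yet.
        m = min(cp for cp in codepoints if cp >= n)
        # Skip all (code point, position) states below m in one step.
        delta += (m - n) * (h + 1)
        n = m
        for cp in codepoints:
            if cp < n:
                delta += 1
            elif cp == n:
                # Write delta as a variable-length base-36 number.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("müller-büromöbel"))  # mller-brombel-rmb4fg
```

For müller-büromöbel, the encoder first handles ö (the smaller code point), producing rmb, and then the two occurrences of ü, producing 4f and g.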

In the example above, the string rmb4fg consists of three variable-length numbers: rmb, 4f, and g. Decoded, they indicate that mller-brombel must be supplemented with ö at the thirteenth position and with ü at the second and ninth positions to restore müller-büromöbel.
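The reverse direction can be sketched the same way. The hypothetical `decode_with_trace` function follows the decoding procedure of RFC 3492 (section 6.2) and additionally records where each character is inserted. Indices are zero-based and refer to the string at the moment of insertion; in the finished word müller-büromöbel, the inserted characters sit at positions 2, 9, and 13. Error handling for malformed input is omitted.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700


def _adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def _digit_value(ch: str) -> int:
    """'a'-'z' (or 'A'-'Z') are 0-25, '0'-'9' are 26-35."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    return ord(ch.lower()) - ord("a")


def decode_with_trace(encoded: str) -> tuple[str, list[tuple[int, str]]]:
    """Decode one label (RFC 3492, section 6.2) and record each insertion
    as (zero-based index at the moment of insertion, inserted character)."""
    basic, _, deltas = encoded.rpartition("-")  # text before the last hyphen is basic
    output = list(basic)
    n, i, bias = 128, 0, 72
    insertions = []
    pos = 0
    while pos < len(deltas):
        old_i, w, k = i, 1, BASE
        while True:  # read one variable-length number
            digit = _digit_value(deltas[pos])
            pos += 1
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        bias = _adapt(i - old_i, len(output) + 1, old_i == 0)
        n += i // (len(output) + 1)  # code point of the character to insert
        i %= len(output) + 1         # index at which it is inserted
        output.insert(i, chr(n))
        insertions.append((i, chr(n)))
        i += 1
    return "".join(output), insertions


word, steps = decode_with_trace("mller-brombel-rmb4fg")
print(word)   # müller-büromöbel
print(steps)  # [(10, 'ö'), (1, 'ü'), (8, 'ü')]
```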

Image: Overview of the sections of the ACE string: the ACE prefix followed by the Punycode-encoded string.

Exceptions to the rule

Deviations occur if a domain name contains no non-base characters at all or consists only of non-base characters.

After encoding, a domain name that contains only non-base characters consists of just the ACE prefix and the encoded string; since there are no basic characters to separate, the hyphen delimiter is omitted. A domain name such as παράδειγμα (Greek for “example”) is encoded as follows:

IDN: παράδειγμα

ACE: xn--hxajbheg2az3al

If a domain name contains only base characters, Punycode is not used and no ACE prefix is added. Encoding is unnecessary in this case because basic internet protocols can already process the domain name.
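Both exceptions are visible in the raw output of Python's built-in `punycode` codec, which applies the RFC 3492 algorithm without the IDNA rules: a label with only non-base characters has no hyphen delimiter, and a label with only base characters merely gains a trailing hyphen. IDNA never produces that last form, because it leaves ASCII-only labels unencoded. The helper name `raw_punycode` is hypothetical.

```python
def raw_punycode(label: str) -> str:
    """Hypothetical helper: raw RFC 3492 output, without the "xn--" prefix."""
    return label.encode("punycode").decode("ascii")


for label in ["müller-büromöbel", "παράδειγμα", "example"]:
    print(f"{label} -> {raw_punycode(label)}")
# müller-büromöbel -> mller-brombel-rmb4fg
# παράδειγμα -> hxajbheg2az3al
# example -> example-
```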

When the fully qualified domain name (FQDN) is considered as a whole, each label (top-level domain, second-level domain, third-level domain, etc.) is encoded separately. A domain such as пример.бг (Bulgarian for “example.bg”) is encoded as follows:

IDN: пример.бг

ACE: xn--e1afmkfd.xn--90ae
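Splitting a domain at the dots and encoding each label on its own can be sketched like this. `domain_to_ascii` is a hypothetical name; the sketch uses Python's built-in `punycode` codec and performs no normalization or validity checks. Python's built-in `idna` codec (IDNA 2003) or the third-party `idna` package (IDNA 2008) handles the full rule set.

```python
def domain_to_ascii(domain: str) -> str:
    """Hypothetical helper: encode each label of a domain name separately."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # base characters only: no encoding, no prefix
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(domain_to_ascii("пример.бг"))            # xn--e1afmkfd.xn--90ae
print(domain_to_ascii("müller-büromöbel.de"))  # xn--mller-brombel-rmb4fg.de
```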

The following table gives an overview of the different variants of the Punycode syntax.

Variant | IDN | Punycode | ACE
Base and non-base characters | müller-büromöbel.de | mller-brombel-rmb4fg.de | xn--mller-brombel-rmb4fg.de
Only non-base characters | παράδειγμα.gr | hxajbheg2az3al.gr | xn--hxajbheg2az3al.gr
Only base characters | example.org | example.org (unchanged) | not used
Note

The Punycode algorithm is described in detail in RFC 3492. The document also includes a sample implementation of the encoding procedure in the C programming language.

Users usually rely on freely available Punycode converters to encode internationalized domain names.

Punycode encoding with emoji domains

Punycode can be used not only for internationalized domain names but also for emoji domains. For this to work, however, the top-level domain has to permit emojis, and the desired emoji must be part of the Unicode standard.

Tip

At the time of writing, the following TLDs allow emoji domains to be registered: .ws, .tk, .to, .ml, .ga, .cf, .gq, and .fm.

Emoji domains are technically processed as Punycode, but in principle they should be displayed to users as a combination of text and emojis.

Emoji domain: https://i❤.ws/

ACE: https://xn--i-7iq.ws/
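The ACE form of the emoji label can be reproduced with Python's built-in `punycode` codec. This assumes the heart is the single code point U+2764 (HEAVY BLACK HEART) without a variation selector; a different code point sequence yields a different encoding. The helper name `emoji_label_to_ace` is hypothetical.

```python
def emoji_label_to_ace(label: str) -> str:
    """Hypothetical helper: raw Punycode plus ACE prefix for one label."""
    return "xn--" + label.encode("punycode").decode("ascii")


print(emoji_label_to_ace("i\u2764"))  # xn--i-7iq
```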

At the time of writing, practically no mainstream browser does this: if you enter an emoji domain in Firefox, Chrome, Safari, Edge, or Opera, the address bar shows only the ACE string.

Are there free Punycode converters?

Free Punycode generators that convert IDNs into an ASCII-compatible form are available on various websites. One example is Punycoder.

Image: Punycoder, a Punycode converter that converts Punycode to text/Unicode and vice versa.

Another good choice, including for IDNs under other TLDs, is the Punycode converter by Mathias Bynens, which is based on punycode.js.

Image: The Punycode converter by Mathias Bynens, an open-source tool based on punycode.js for converting internationalized domains.

Does Punycode pose a security risk?

Punycode becomes a security risk in homographic phishing: cyberattacks in which criminals exploit the visual similarity of different characters to lure unsuspecting victims to fake websites. Blogger Xudong Zheng demonstrated such an attack using the Punycode domain https://www.xn--80ak6aa92e.com/, which browsers may display as the following IDN: https://www.аррӏе.com/

This URL does not lead to the official website of the Californian technology company Apple Inc., but to a phishing website created for demonstration purposes.

Instead of the ASCII character a (U+0061), the domain uses the Cyrillic а (U+0430), and its other letters are Cyrillic lookalikes as well. These characters can hardly be distinguished with the naked eye, but web browsers interpret them as different characters. Even TLS/SSL certificates do not protect internet users here: in modern phishing campaigns, criminals obtain valid certificates for their lookalike domains so that their websites appear authentic.
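The characters behind the lookalike label can be inspected with Python's standard `unicodedata` module. The helper `scripts_in` is hypothetical and deliberately rough: it uses the first word of each character's Unicode name (such as LATIN or CYRILLIC) as a stand-in for the real Unicode Script property that browsers use. It shows that all five letters are Cyrillic, so a check that only flags labels mixing several scripts would not catch this domain.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Hypothetical helper: rough script detection for the letters of a label.

    Uses the first word of each character's Unicode name (e.g. "LATIN",
    "CYRILLIC") instead of the real Unicode Script property.
    """
    return {unicodedata.name(c).split()[0] for c in label if c.isalpha()}


fake = "\u0430\u0440\u0440\u04cf\u0435"  # the lookalike label from Zheng's demo
for c in fake:
    print(f"U+{ord(c):04X}  {unicodedata.name(c)}")
print(scripts_in(fake))     # {'CYRILLIC'}: a single script, nothing is mixed
print(scripts_in("apple"))  # {'LATIN'}
print("xn--" + fake.encode("punycode").decode("ascii"))  # xn--80ak6aa92e
```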

Current versions of Chrome and Opera counter attacks like these by displaying the ACE string instead of the internationalized domain when an IDN looks deceptive, for example because it mixes characters from different scripts. Internet Explorer and Microsoft Edge prevent such domains from being accessed. Firefox, however, offers no comparable protection against Punycode phishing of this kind.

Image: Example of a homographic domain. The URL looks the same as Apple’s official website, but the character U+0430 is actually a Cyrillic letter that is astonishingly similar to the ASCII character a.

Firefox users can protect themselves as follows. To reduce the risk posed by phishing websites, their only option at present is to stop Firefox from displaying Punycode domains as IDNs altogether. This workaround takes just two steps:

  1. Access the configuration editor: Type about:config in the address bar of your web browser to open the Firefox configuration editor.
  2. Force Punycode: Find the setting network.IDN_show_punycode and change its value from false to true.

After this change, Firefox displays internationalized domains in the address bar as ACE strings.
