Punycode is defined by the IETF in RFC 3492 as an instance of a more general encoding algorithm called Bootstring. Bootstring allows strings drawn from a large character set to be represented uniquely by strings that use only a small set of basic characters. The design of the encoding is based on six principles:
- Completeness: Every input string can be represented by a string of basic characters using Bootstring.
- Uniqueness: Each input string has exactly one Bootstring encoding. Every Unicode string corresponds to exactly one Punycode string and vice versa.
- Reversibility: A string encoded with Bootstring can always be decoded again without any loss of information.
- Efficiency: The encoded string is only slightly longer than the input string, if at all.
- Simplicity: Bootstring uses simple encoding and decoding algorithms.
- Readability: Only those characters are encoded that cannot be represented in the target character set. All other characters appear unchanged in the encoded string.
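Several of these principles can be observed directly. The following is a minimal sketch that assumes Python's built-in `punycode` codec is an acceptable stand-in for the RFC 3492 algorithm; the variable names are illustrative only, and the codec does not add any domain name prefix.

```python
# Minimal sketch: observe some Bootstring principles with Python's
# built-in "punycode" codec (assumed here as a reference implementation).
original = "müller-büromöbel"

encoded = original.encode("punycode")   # bytes that contain only ASCII
decoded = encoded.decode("punycode")    # back to the Unicode string

# Reversibility: decoding restores the input without loss.
print(decoded == original)

# Readability: the basic (ASCII) characters appear unchanged at the start.
print(encoded.startswith(b"mller-brombel"))

# Efficiency: number of characters added by the encoding.
print(len(encoded) - len(original))
```

For this input, the encoded form is four characters longer than the original, and the round trip returns the original string.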
Punycode fixes the Bootstring parameters to suit the requirements of internationalized domain names (IDNs). This allows Unicode characters to be represented using only the basic characters previously permitted in domain names.
The following example illustrates the encoding.
IDN: müller-büromöbel
The IDN müller-büromöbel contains the characters ü and ö, which are not part of the character set previously permitted for domain names and must therefore be encoded with Punycode to ensure compatibility.
In the first step, the input string is normalized: all uppercase letters are replaced by the corresponding lowercase letters.
In the second step, all non-basic characters are removed from the string. They are then appended to the remaining basic characters in encoded form, separated from them by a hyphen.
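The two steps can be sketched in a few lines. This is an illustration only: `split_basic` is a hypothetical helper name, `str.lower()` is assumed as a simplified stand-in for the full normalization that IDN processing applies, and the encoding of the removed characters is not yet performed here.

```python
def split_basic(label: str) -> tuple[str, list[str]]:
    """Hypothetical helper: lowercase a label and separate its
    basic (ASCII) characters from its non-basic ones."""
    # Step 1: simplified normalization (assumption: lowercasing only).
    lowered = label.lower()
    # Step 2: keep the basic characters in their original order ...
    basic = "".join(ch for ch in lowered if ord(ch) < 128)
    # ... and collect the distinct non-basic characters that still
    # have to be encoded, sorted by code point.
    non_basic = sorted({ch for ch in lowered if ord(ch) >= 128}, key=ord)
    return basic, non_basic


basic, non_basic = split_basic("Müller-Büromöbel")
print(basic)       # mller-brombel
print(non_basic)   # ['ö', 'ü']
```

The basic part is what later appears in front of the final hyphen of the encoded name; the encoded suffix records which non-basic characters were removed and at which positions they must be reinserted.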
If Punycode is used to encode internet addresses, each encoded string is given the ACE prefix (short for ASCII-compatible encoding):
ACE prefix: xn--
The ACE prefix ensures that ordinary domain names containing hyphens are not mistaken for encoded internationalized domain names.
This results in the following encoding for the IDN müller-büromöbel:
ACE: xn--mller-brombel-rmb4fg
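The result can be reproduced with a compact encoder. The sketch below follows the encoding procedure and parameter values given in RFC 3492, but it is a simplified illustration rather than a production implementation: it omits overflow checks and full IDN normalization, and the names `adapt`, `punycode_encode` and `to_ace` are chosen here for illustration.

```python
# Bootstring parameters as fixed for Punycode in RFC 3492.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def adapt(delta: int, numpoints: int, first: bool) -> int:
    """Bias adaptation after each encoded delta."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def punycode_encode(text: str) -> str:
    """Encode one label (without ACE prefix); no overflow handling."""
    basic = [ch for ch in text if ord(ch) < INITIAL_N]
    out = basic[:]
    if basic:
        out.append("-")                  # delimiter after the basic characters
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    h = len(basic)                       # number of code points handled so far
    while h < len(text):
        # Next smallest code point that has not been handled yet.
        m = min(ord(ch) for ch in text if ord(ch) >= n)
        delta += (m - n) * (h + 1)
        n = m
        for ch in text:
            c = ord(ch)
            if c < n:
                delta += 1
            elif c == n:
                # Write delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(DIGITS[q])
                bias = adapt(delta, h + 1, h == len(basic))
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)


def to_ace(label: str) -> str:
    """Lowercase the label and add the ACE prefix if it needs encoding."""
    lowered = label.lower()              # assumption: simplified normalization
    if all(ord(ch) < 128 for ch in lowered):
        return lowered                   # pure ASCII labels stay unchanged
    return "xn--" + punycode_encode(lowered)


print(to_ace("müller-büromöbel"))        # xn--mller-brombel-rmb4fg
```

In practice, Python's built-in codecs can be used instead: `"müller-büromöbel".encode("punycode")` yields the encoded string without the prefix, and `.encode("idna")` yields the complete ACE form.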