Characters can be encoded in various ways. UTF-8 dominates today, but UTF-16 was popular in the past and is still widely used, and UTF-32 appears occasionally as well. Unlike UTF-8, however, encodings whose code units are larger than one byte require the byte order to be known.
UTF-8 works in units of single bytes: a character occupies one to four bytes, but each byte stands on its own, so there is no byte order to get wrong. UTF-16, on the other hand, groups the data into 16-bit code units, i.e. pairs of bytes. For such a pair to be interpreted correctly, it must be clear whether its bytes are read from left to right or from right to left. Depending on this, a completely different value results.
- From left to right: 01101010 00110101 is 6a35 in hexadecimal notation
- From right to left: 01101010 00110101 is 356a in hexadecimal notation
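This can be checked directly with a minimal sketch (the use of Python here is my choice for illustration, not part of the original text), which interprets the two bytes from the list above with both byte orders:

```python
# The two bytes from the example above.
raw = bytes([0b01101010, 0b00110101])  # 0x6A, 0x35

# Read left to right (most significant byte first).
print(hex(int.from_bytes(raw, byteorder="big")))     # 0x6a35

# Read right to left (least significant byte first).
print(hex(int.from_bytes(raw, byteorder="little")))  # 0x356a
```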
Looked up in a Unicode table, these two values correspond to two completely different characters. The first interpretation is known as Big Endian (BE), the second as Little Endian (LE): with Big Endian the most significant (highest-valued) byte comes first, with Little Endian the least significant (lowest-valued) byte comes first.
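To make the difference visible at the character level, the same byte pair can be decoded as UTF-16 with each byte order. This is a small illustrative sketch; the two resulting code points simply happen to be unrelated CJK ideographs.

```python
raw = bytes([0x6A, 0x35])

# Big Endian: the first byte is the high-order byte -> code point U+6A35
char_be = raw.decode("utf-16-be")
print(f"U+{ord(char_be):04X}")  # U+6A35

# Little Endian: the first byte is the low-order byte -> code point U+356A
char_le = raw.decode("utf-16-le")
print(f"U+{ord(char_le):04X}")  # U+356A
```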