The ASCII code encodes characters so that electronic devices such as PCs can represent them. To do this, each character is assigned a numeric value that the computer can process and that can be written in binary, decimal, or hexadecimal form.
What is ASCII?
ASCII is a standard for representing characters on electronic devices. To understand what this means, it helps to know how a computer works in the first place: all of its processing is based on the binary system, so ones and zeros determine what a computer does. ASCII is based on this system too. The original ASCII standard defines each character within seven bits, that is, seven digits that are each either a 0 or a 1.
The character encoding ASCII stands for American Standard Code for Information Interchange and is the US precursor to ISO 646 (internationally defined character sets). ASCII is a 7-bit code, meaning that 128 characters (2⁷) are defined. The code consists of 33 non-printable and 95 printable characters, covering letters, punctuation marks, numbers, and control characters.
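These counts are easy to verify by enumerating all 128 code values. The sketch below uses Python (an assumption; the article itself contains no code) and relies on the built-in `str.isprintable()`, which treats the space (code 32) as printable and codes 0–31 and 127 as non-printable, matching ASCII's split for this range.

```python
# 7 bits give 2**7 = 128 possible values (0-127).
TOTAL = 2 ** 7

# Split the code values by whether the character is printable.
printable = [c for c in range(TOTAL) if chr(c).isprintable()]
non_printable = [c for c in range(TOTAL) if not chr(c).isprintable()]

print(TOTAL)               # 128
print(len(printable))      # 95
print(len(non_printable))  # 33
```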
The eighth bit, which completes a full byte, was traditionally used for error checking. Extended ASCII-based versions use this same bit to increase the number of available characters to 256 (2⁸).
The original purpose of the eighth bit was to check data for errors. This “parity” bit allows the receiver of a bit sequence to detect inconsistencies. However, it only reveals that an error occurred, not which bit is wrong. This makes the parity check unsuitable for correcting errors.
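A parity check can be sketched in a few lines. This minimal Python example assumes even parity (odd parity is the other common convention) with the parity bit stored in the most significant position; the helper names `parity_bit`, `add_parity`, and `parity_ok` are hypothetical. It shows that a single flipped bit is detected but not located, and that two flipped bits cancel each other out.

```python
def parity_bit(code: int) -> int:
    """Even-parity bit for a 7-bit code: makes the total number of 1s even."""
    return bin(code & 0x7F).count("1") % 2


def add_parity(code: int) -> int:
    """Store the parity bit in the eighth (most significant) bit."""
    return (parity_bit(code) << 7) | (code & 0x7F)


def parity_ok(byte: int) -> bool:
    """A received byte passes the check if it contains an even number of 1s."""
    return bin(byte & 0xFF).count("1") % 2 == 0


sent = add_parity(ord("A"))   # 'A' = 1000001 has two 1s, so the parity bit is 0
print(format(sent, "08b"))    # 01000001
print(parity_ok(sent))        # True

# Flipping either of two different single bits fails the check in the
# same way, so the receiver cannot tell which bit was corrupted.
print(parity_ok(sent ^ 0b00000100))  # False
print(parity_ok(sent ^ 0b00100000))  # False

# Two flipped bits keep the count even and go unnoticed.
print(parity_ok(sent ^ 0b00000110))  # True
```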
Each character corresponds to a seven-digit sequence of zeros and ones, which can also be written as a decimal or a hexadecimal number. The ASCII characters can be divided into several groups:
- Control Characters (0–31 & 127): Control characters are not printable. They are used to send commands to the computer or the printer and are based on telex technology. They include, for example, the line break and the tab. Apart from a few such as these, most of them are rarely used today.
- Special Characters (32–47 / 58–64 / 91–96 / 123–126): Special characters include all printable characters that are neither letters nor numbers, such as punctuation marks and technical or mathematical symbols. The space (32) also belongs to this group: although it is invisible, it counts as a printable character and therefore, contrary to what one might suspect, is not a control character.
- Numbers (48–57): These are the ten Arabic numerals 0 to 9 (hexadecimal 30–39).
- Letters (65–90 / 97–122): Letters are divided into two blocks, the first containing the uppercase letters and the second the lowercase letters.
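The grouping can be expressed as a small classification function. This is a Python sketch with a hypothetical function name, `ascii_group`; it uses decimal ranges throughout, so the digits occupy 48–57 (hexadecimal 30–39). Control characters such as the tab (9) and line feed (10) fall into the first group, and the space (32) into the special characters.

```python
def ascii_group(code: int) -> str:
    """Classify a 7-bit ASCII code (decimal) into one of four groups."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    if code <= 31 or code == 127:
        return "control"
    if 48 <= code <= 57:
        return "number"
    if 65 <= code <= 90 or 97 <= code <= 122:
        return "letter"
    # Everything left is printable: 32-47, 58-64, 91-96, 123-126.
    return "special"


for ch in ["\t", "\n", " ", "!", "7", "A", "z", "~"]:
    print(repr(ch), ord(ch), ascii_group(ord(ch)))
```

Counting over all 128 codes gives 33 control characters, 33 special characters, 10 numbers, and 52 letters.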
To convert characters to ASCII code effortlessly, it’s worth consulting the ASCII table, which contains the binary, decimal, and hexadecimal values for each character.
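Instead of looking values up, you can also generate the printable part of such a table yourself. The following Python sketch is illustrative only; the helper name `ascii_row` is hypothetical. It uses the built-ins `format()` and `chr()` to produce the binary, decimal, and hexadecimal columns.

```python
def ascii_row(code: int) -> tuple[str, int, str, str]:
    """Return (binary, decimal, hexadecimal, character) for one ASCII code."""
    return format(code, "08b"), code, format(code, "02X"), chr(code)


# Print only the printable range (32-126); control characters would
# otherwise disturb the output.
print(f"{'Binary':<10}{'Dec':>4}  {'Hex':<4}Char")
for code in range(32, 127):
    binary, decimal, hexa, char = ascii_row(code)
    print(f"{binary:<10}{decimal:>4}  {hexa:<4}{char}")
```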
Example: ASCII codes
ASCII assigns each binary number a printable or non-printable character according to a fixed standard.
If you take a look at the ASCII table, you’ll find the characters represented for various numeric values.
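The binary number 01000001 can be written as 65 in decimal and 41 in hexadecimal. The character encoded by this number is “A”. Counting upward from there, you will find the uppercase letters in alphabetical order. The word “ASCII” therefore corresponds to the following values:
- A: binary 01000001, decimal 65, hexadecimal 41
- S: binary 01010011, decimal 83, hexadecimal 53
- C: binary 01000011, decimal 67, hexadecimal 43
- I: binary 01001001, decimal 73, hexadecimal 49
- I: binary 01001001, decimal 73, hexadecimal 49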
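The conversion works in both directions. This Python sketch (function names `to_codes` and `from_binary` are hypothetical) encodes a word into its ASCII values and decodes a sequence of 8-bit binary groups back into text. Encoding with the `"ascii"` codec raises `UnicodeEncodeError` for any character outside the 128-character set.

```python
def to_codes(text: str) -> list[tuple[str, str, int, str]]:
    """Map each character to its (char, binary, decimal, hex) ASCII values."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return [(chr(b), format(b, "08b"), b, format(b, "02X")) for b in data]


def from_binary(bits: str) -> str:
    """Decode space-separated binary groups back into ASCII text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


for row in to_codes("ASCII"):
    print(*row)

print(from_binary("01000001 01010011 01000011 01001001 01001001"))  # ASCII
```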
On Windows, you can enter ASCII characters (which are also Unicode characters) with a key combination: hold down the Alt key and type the character's decimal value on the numeric keypad. For example, Alt+65 produces “A”.
ASCII code: benefits and areas of application
ASCII is still widely used today, even though UTF-8 has become the more important encoding for text. Unicode, however, has only been displacing the older encoding of the early internet since around 2008. One advantage of UTF-8 is that it is backward compatible with ASCII: ASCII is a subset of UTF-8, so the first 128 characters are encoded identically. Because ASCII can be considered the lowest common denominator of most newer encodings, it is still used in emails and URLs.
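The compatibility can be checked directly with Python's built-in codecs (Python is an assumption for illustration). Pure ASCII text produces the same bytes in ASCII and UTF-8, while a non-ASCII character such as “ä” becomes a multi-byte sequence whose bytes are all 128 or higher, so they can never be confused with ASCII characters.

```python
text = "Hello, ASCII!"

# For pure ASCII text, both encodings produce identical bytes.
print(text.encode("ascii") == text.encode("utf-8"))  # True

# A non-ASCII character becomes several bytes, each of them >= 0x80.
umlaut = "ä".encode("utf-8")
print(umlaut)                          # b'\xc3\xa4'
print(all(b >= 0x80 for b in umlaut))  # True

# Any valid ASCII byte sequence is therefore also valid UTF-8.
print(b"plain ASCII".decode("utf-8"))  # plain ASCII
```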
Users can now write emails in Unicode, and thanks to Internationalized Domain Names (IDN), even domain names can contain umlauts. In both cases, however, the text is typically converted to ASCII before transmission; for domain names, this is done with the Punycode encoding. The conversion usually happens automatically, so users won't notice anything.
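For domain names, the conversion can be reproduced with Python's built-in `"idna"` codec. Note that this codec implements the older IDNA 2003 rules; current registries follow IDNA 2008, which Python supports only through the third-party `idna` package. Email headers use a different mechanism (MIME encoded words), which this sketch does not cover.

```python
domain = "münchen.de"

# Each non-ASCII label is converted to Punycode and given the "xn--" prefix.
ascii_domain = domain.encode("idna")
print(ascii_domain)                 # b'xn--mnchen-3ya.de'
print(ascii_domain.isascii())       # True

# The conversion is reversible.
print(ascii_domain.decode("idna"))  # münchen.de
```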
In addition, ASCII has long been used for artistic as well as technical purposes: ASCII art uses only printable ASCII characters to create images. These range from lettering and simple stick figures to elaborate pictures. ASCII artists use the different brightness levels of individual characters to create light and shade in their artworks.
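The light-and-shade technique boils down to mapping brightness values onto a “ramp” of characters ordered by how much ink they appear to use. The Python sketch below is one simple way to do this; the ramp string and the function name `to_ascii_art` are assumptions, and it presumes dark text on a light background (on a dark terminal the ramp would be reversed).

```python
# Characters ordered from visually light to visually dark (a matter of taste).
RAMP = " .:-=+*#%@"


def to_ascii_art(darkness: list[list[float]]) -> str:
    """Render a grid of darkness values (0.0 = white, 1.0 = black) as text."""
    lines = []
    for row in darkness:
        chars = []
        for value in row:
            value = min(max(value, 0.0), 1.0)          # clamp to [0, 1]
            index = round(value * (len(RAMP) - 1))     # pick the nearest ramp step
            chars.append(RAMP[index])
        lines.append("".join(chars))
    return "\n".join(lines)


# A simple horizontal gradient from light to dark, three rows high.
width = 10
gradient = [[x / (width - 1) for x in range(width)] for _ in range(3)]
print(to_ascii_art(gradient))
```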
Brief history of ASCII codes
The American Standards Association (ASA, now known as ANSI for “American National Standards Institute”) approved the American Standard Code for Information Interchange (ASCII) back in 1963. It set out a binding specification for how electronic devices should represent characters. Since the standard is American, it is often referred to as US-ASCII.
Its predecessors include Morse code and the codes used in telex machines, in which a standardized code (e.g. a fixed sequence of acoustic signals) is translated into text. ASCII was introduced because computers cannot process our alphabet directly: their internal processes are based on the binary system, so every character needs a binary representation.
To this day, the standard itself has rarely been changed. To meet new requirements, however, extended versions were created that use an eighth bit so that national characters such as the German umlauts (ä, ö and ü) can be represented. Latin-1 (ISO 8859-1), which is still popular in Germany, is one such ASCII-based encoding.
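Python's built-in `"latin-1"` codec makes the role of the eighth bit visible (Python is used here purely for illustration). Each umlaut becomes a single byte above 127, the lower half matches ASCII, and plain 7-bit ASCII cannot encode the umlauts at all.

```python
umlauts = "äöü"

# ISO 8859-1 uses the eighth bit: each umlaut is one byte >= 128.
latin1 = umlauts.encode("latin-1")
print(latin1)        # b'\xe4\xf6\xfc'
print(list(latin1))  # [228, 246, 252]

# The lower half (0-127) is identical to ASCII.
print("ASCII".encode("latin-1") == "ASCII".encode("ascii"))  # True

# 7-bit ASCII has no code for these characters.
try:
    umlauts.encode("ascii")
except UnicodeEncodeError as err:
    print("not representable in ASCII:", err.reason)
```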
However, these 8-bit extensions still cannot, for example, represent Latin and Arabic characters side by side in the same text. For this purpose, Unicode-based encodings such as UTF-8 are now well established. Unicode provides space for more than a million different characters, and UTF-8 is compatible with ASCII, encoding the first 128 characters in the same way.
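A short Python sketch (illustrative only) shows both points: Unicode's code space runs from U+0000 to U+10FFFF, and UTF-8 stores each character in one to four bytes, keeping ASCII characters at a single byte. Latin and Arabic text can be mixed freely in UTF-8, whereas an 8-bit code page such as Latin-1 fails on the Arabic part.

```python
import sys

# Unicode code points run from U+0000 to U+10FFFF.
print(sys.maxunicode + 1)  # 1114112

# UTF-8 needs 1 to 4 bytes per character; ASCII characters keep 1 byte.
for ch in ["A", "ä", "ب", "€", "😀"]:
    print(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "byte(s)")

# Latin and Arabic letters can be combined in one UTF-8 string ...
mixed = "Salam سلام"
print(mixed.encode("utf-8").decode("utf-8") == mixed)  # True

# ... but Latin-1 cannot encode the Arabic part.
try:
    mixed.encode("latin-1")
except UnicodeEncodeError:
    print("Latin-1 cannot represent this text")
```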