Security is of paramount importance in many industries when it comes to processing data. Companies whose important business processes depend on data rely on stable storage, for example from a hosting provider that stores their customers' information. A serious memory error can cause more than a financial loss: in the worst case, it can seriously weaken a company's position on the market. The more memory a system uses, the more likely it is that errors will occur. This is why comprehensive data protection is so important in workstation and server environments that require high data integrity. ECC RAM, for example, is used instead of ordinary memory so that single-bit errors can be detected and corrected.

ECC RAM: background and definition

Random Access Memory (RAM) is the storage medium a computer system uses as its main memory. It holds the programs being executed together with the data they process and produce. The volatile contents of the main memory are stored as binary code, which consists solely of zeros and ones and can therefore be processed directly by the computer. A single binary digit is called a 'bit'. Causes such as

  • voltage variations,
  • overclocking,
  • defective or aging memory modules,
  • or high-energy radiation (for example, cosmic rays)

can lead to a bit error, in which a memory entry is changed: a bit takes on the wrong value, i.e. '1' instead of '0' or vice versa. In many applications, this is hardly noticeable. If a bit error occurs while you are working with an image-editing program, for example, one pixel might receive a different color, which the human eye won't notice. In complex databases or calculation applications, on the other hand, a single bit error can have serious consequences, such as corrupted records or incorrect results. A bit error can also crash the system if it occurs in a part of the memory used by the operating system.
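To see why the position of a flipped bit matters, here is a minimal Python sketch that simulates a single bit error by inverting one bit of an integer with XOR. The function name `flip_bit` and the stored value are hypothetical, chosen only for illustration.

```python
def flip_bit(value: int, position: int) -> int:
    """Simulate a bit error: invert the bit at `position` (0 = least significant bit)."""
    return value ^ (1 << position)


stored = 1000  # hypothetical value held in memory

print(flip_bit(stored, 0))   # 1001: low-order bit flipped, tiny change
print(flip_bit(stored, 20))  # 1049576: high-order bit flipped, huge change
```

Flipping the lowest bit changes 1000 to 1001, a deviation as small as a slightly altered pixel, while flipping bit 20 turns the same value into 1049576, the kind of error that matters in a database or a calculation.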

The solution to this problem is error-correcting code (ECC): a code that can detect and correct single-bit errors. ECC can also detect the rarer two-bit errors, although it cannot correct them. To benefit from this error correction method, ordinary RAM modules are extended with an additional memory chip that stores the ECC check bits, which is where ECC RAM comes into play.
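The additional chip is needed because the check bits take up storage of their own. As a rough sketch, assuming the common SECDED scheme (single-error correction, double-error detection, based on an extended Hamming code), the number of check bits r for m data bits is the smallest r with 2^r ≥ m + r + 1, plus one overall parity bit for detecting two-bit errors. The function names below are illustrative, not taken from any library.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (enough to correct single-bit errors)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """Add one overall parity bit so that two-bit errors can also be detected."""
    return hamming_check_bits(data_bits) + 1


for m in (4, 64):
    print(m, hamming_check_bits(m), secded_check_bits(m))
# 4 3 4
# 64 7 8
```

Four data bits need three parity bits, matching the seven-bit Hamming code described in the next section. For 64 data bits, SECDED needs 8 check bits, which corresponds to the 72-bit wide data path typical of ECC memory modules.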

How the error correction process works

The error correction method for single-bit errors used in RAM modules was developed by the mathematician Richard Hamming in 1950, which is why the code is called the Hamming code. Its special feature is the use of several parity bits, also known as control bits, each of which forms a validation group with a subset of the actual useful bits. For single-bit error correction with the Hamming code, you need a seven-digit binary code word consisting of three parity bits (P), four useful bits (N), and three validation groups. The parity bits are placed at the code word positions whose numbers are powers of 2, in this example 1, 2, and 4, so the code word is laid out as P1 P2 N1 P3 N2 N3 N4. Each parity bit checks every position whose binary number contains its own power of 2: parity bit 1 checks positions 1, 3, 5, and 7; parity bit 2 checks positions 2, 3, 6, and 7; parity bit 3 checks positions 4, 5, 6, and 7.
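The layout of parity bits and useful bits can be sketched in Python as follows. This is a minimal illustration of Hamming(7,4) encoding with even parity, assuming positions are numbered 1 to 7 from left to right; `group` and `encode` are hypothetical helper names, not part of any library.

```python
# Hamming(7,4) with even parity; positions are numbered 1..7 from left to right.
PARITY_POSITIONS = (1, 2, 4)   # powers of two hold the parity (control) bits
DATA_POSITIONS = (3, 5, 6, 7)  # the remaining positions hold the useful bits


def group(parity_pos: int) -> list[int]:
    """Validation group of a parity bit: all positions whose binary index contains parity_pos."""
    return [pos for pos in range(1, 8) if pos & parity_pos]


def encode(data: str) -> str:
    """Encode four useful bits, e.g. '1001', as a seven-bit code word."""
    if len(data) != 4 or not set(data) <= {"0", "1"}:
        raise ValueError("expected exactly four bits, e.g. '1001'")
    word = [0] * 8  # index 0 is unused so that list indices match positions 1..7
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = int(bit)
    for p in PARITY_POSITIONS:
        # Set the parity bit so that its validation group contains an even number of 1s.
        word[p] = sum(word[pos] for pos in group(p) if pos != p) % 2
    return "".join(str(b) for b in word[1:])


print(group(1), group(2), group(4))  # [1, 3, 5, 7] [2, 3, 6, 7] [4, 5, 6, 7]
print(encode("1001"))                # 0011001
```

Because the parity positions are powers of two, a position belongs to a parity bit's group exactly when that power of two appears in the position's binary representation, which is what the `pos & parity_pos` test checks.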

When a code word is read back, each validation group is checked. With even parity, every group must contain an even number of bits with the value 1; if the total number of 1s in a group is odd, that group signals an error.

Applied to the example bit sequence 0001001 (positions 1 to 7, read from left to right), the Hamming code locates the error as follows:

  • The validation group of parity bit 1 (positions 1, 3, 5, 7) contains one bit with the value 1, an odd number, and is therefore incorrect.
  • The validation group of parity bit 2 (positions 2, 3, 6, 7) contains one bit with the value 1, an odd number, and is therefore incorrect.
  • The validation group of parity bit 3 (positions 4, 5, 6, 7) contains two bits with the value 1, an even number, and is therefore correct.

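Code word position 3 is the only position that belongs to both incorrect validation groups but not to the correct one (position 7 also appears in both incorrect groups, but it is part of the correct third group as well), so this is where the error is. Equivalently, adding up the positions of the failing parity bits, 1 + 2, gives 3. Flipping that bit yields the correct bit sequence 0011001.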
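The same check can be automated. This minimal Python sketch computes the so-called syndrome, the sum of the parity positions whose validation group contains an odd number of 1s, and flips the bit at that position. It assumes even parity, positions numbered 1 to 7 from left to right, and at most one flipped bit; `syndrome` and `correct` are hypothetical names chosen for illustration.

```python
def syndrome(word: str) -> int:
    """Sum the parity positions (1, 2, 4) whose validation group holds an odd number of 1s."""
    if len(word) != 7 or not set(word) <= {"0", "1"}:
        raise ValueError("expected a seven-bit code word, e.g. '0001001'")
    bits = [0] + [int(b) for b in word]  # pad so that bits[1] is position 1
    result = 0
    for p in (1, 2, 4):
        ones = sum(bits[pos] for pos in range(1, 8) if pos & p)
        if ones % 2 == 1:  # odd count: this validation group is incorrect
            result += p
    return result


def correct(word: str) -> str:
    """Flip the bit the syndrome points to; a syndrome of 0 means no error was found."""
    pos = syndrome(word)
    if pos == 0:
        return word
    bits = list(word)
    # The list is 0-based while code word positions are 1-based.
    bits[pos - 1] = "1" if bits[pos - 1] == "0" else "0"
    return "".join(bits)


print(syndrome("0001001"))  # 3
print(correct("0001001"))   # 0011001
```

If two bits were flipped, this plain seven-bit code would point to a third, wrong position, which is why ECC memory commonly adds an extra overall parity bit so that two-bit errors are at least detected.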

ECC RAM – also suitable for personal use?

ECC reliably corrects single-bit errors in the main memory and thereby prevents a large share of possible memory errors from corrupting data. Closely linked to this is a reduction in system crashes, which is particularly important for services and applications that must guarantee high availability and serve a large number of users. These advantages make ECC modules the preferred choice for server RAM and standard equipment in high-performance computing centers.

Compared to non-ECC RAM, however, ECC RAM has minor disadvantages: the error-correcting memory modules are somewhat more expensive than ordinary memory modules, and the error detection process reduces system performance by an average of around 2%. In addition, not all mainboards support ECC RAM. So if you plan to use ECC RAM on a standard board, first check compatibility and weigh up the benefits. ECC RAM and non-ECC RAM cannot be combined. By default, personal computers come with ordinary memory modules without error correction.
