Nowadays, many developers work with distributed systems such as cloud platforms. As a result, scalable clusters are replacing individual databases, and IT managers are grappling with new challenges. Hot topics include network failures, latency, finite data throughput, small system components, and transport security.

One possible solution is to set up a central location for changeable data that is fail-safe, fault-tolerant, and consistent. That’s where etcd comes in.


What is etcd?

etcd is a distributed key-value store originally developed by the CoreOS team. Like many other tools that run in a Docker environment, etcd is written in Go, the programming language developed at Google. The developers’ goal was to create a secure store for critical data in distributed applications, with straightforward management functions.

The name comes from the convention used to name configuration files in GNU/Linux operating systems: “/etc”. The extra letter “d” stands for “distributed”. etcd is now open source and is managed by the Cloud Native Computing Foundation.

How does etcd work?

To understand etcd, you need to be familiar with three key concepts relating to storage management and clusters:

  • Leader
  • Elections
  • Terms

In Raft-based systems, the cluster elects a leader for a given term. The leader processes all the storage requests that require the consensus of the cluster. Requests that do not require cluster consensus (reads, for example) can be answered by any member of the cluster. If the leader accepts a change, etcd makes sure it replicates the information to the follower nodes. Once the followers have confirmed receipt, the leader commits the changes.
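The replicate-then-commit flow described above can be sketched as a toy simulation. This is purely illustrative: the class and node names are invented, and it is not etcd’s actual implementation.

```python
# Toy sketch of Raft-style replication: the leader commits a change only
# after a majority of the cluster (itself plus its followers) acknowledges
# it. Illustrative only -- not etcd's real code.

class Follower:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.log = []

    def append(self, change):
        """Acknowledge the change if the node is reachable."""
        if self.healthy:
            self.log.append(change)
        return self.healthy

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.committed = []

    def propose(self, change):
        acks = 1 + sum(f.append(change) for f in self.followers)  # leader counts itself
        if acks > (1 + len(self.followers)) // 2:  # majority reached
            self.committed.append(change)
            return True
        return False

cluster = [Follower("n2"), Follower("n3", healthy=False)]
leader = Leader(cluster)
print(leader.propose("set x=1"))  # True: 2 of 3 nodes acknowledged
```

Note that the change is committed even though one follower is down: two of three nodes form a majority, which is exactly why a minority of failed nodes does not block the cluster.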

This kind of system, where a leader uses an etcd database to coordinate changes with the nodes in the cluster, is very valuable in distributed applications. If changes would affect how a node operates, the node can block them. This ensures that the application remains stable and minimizes related problems.

If a leader dies or fails to respond within a certain time, the remaining nodes in the cluster elect a new leader. Each node has a random timeout that determines how long it waits before calling a new election and declaring itself a candidate. These randomized timers make it unlikely that several nodes become candidates at the same time, so a new leader can be chosen as quickly as possible.
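The randomized timers can be illustrated with a small sketch. The 150–300 ms range here is a common Raft teaching example, not etcd’s actual default, which is configurable.

```python
import random

# Sketch of randomized election timeouts: each follower waits a random
# interval, and the first node whose timer expires becomes a candidate.
# The 150-300 ms range is illustrative, not an etcd default.

def pick_timeouts(nodes, low_ms=150, high_ms=300, seed=None):
    """Assign each node a random election timeout in milliseconds."""
    rng = random.Random(seed)
    return {n: rng.uniform(low_ms, high_ms) for n in nodes}

def first_candidate(timeouts):
    """The node whose timer expires first calls the election."""
    return min(timeouts, key=timeouts.get)

timeouts = pick_timeouts(["n1", "n2", "n3"], seed=42)
print(first_candidate(timeouts))
```

Because the timeouts rarely collide, one node usually calls the election well before the others, which keeps leaderless periods short.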

To ensure that an election always produces a majority, there should be an odd number of nodes in the cluster. For performance reasons, clusters should not have more than seven nodes.
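The majority arithmetic behind these sizing rules is easy to check (illustrative helper functions, not part of etcd):

```python
# Quorum arithmetic: a cluster of n nodes needs floor(n/2) + 1 votes,
# so it tolerates n - quorum failed nodes. Adding a node to an odd-sized
# cluster raises the quorum without raising fault tolerance.

def quorum(n):
    """Votes needed for a majority in a cluster of n nodes."""
    return n // 2 + 1

def fault_tolerance(n):
    """How many nodes may fail while a majority remains."""
    return n - quorum(n)

for n in range(1, 8):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

For example, a 4-node cluster needs 3 votes yet still tolerates only one failure, which is why even sizes add cost without adding resilience.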

Tip

You can run etcd on a laptop or in a simple cloud setup to try it out. Using an SSD is highly recommended, because etcd writes its data to disk. For production, refer to the guidelines in the official documentation.

Advantages of etcd

As well as ensuring that applications are stable, etcd offers numerous other advantages:

  • Full replication: The entire store is available on each node in the cluster.
  • High availability: etcd databases are designed to avoid single points of failure if hardware or network issues arise.
  • Consistency: Each read returns the latest write, regardless of which host answers it.
  • Ease of use: etcd has a well-defined, user-friendly gRPC API, with a REST/JSON gateway.
  • Security: etcd supports secure transfers via SSL/TLS and offers optional client certificate authentication.
  • Speed: etcd has been benchmarked at around 10,000 writes per second.
  • Reliability: The Raft algorithm ensures that the store is always distributed correctly.
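The gRPC API mentioned under “Ease of use” is also reachable through a JSON gateway, where keys and values travel base64-encoded. A minimal sketch of building such a request body follows; the `/v3/kv/put` path is taken from the etcd v3 documentation, and no server is contacted here.

```python
import base64
import json

# Sketch: building a JSON payload for etcd v3's gRPC-JSON gateway.
# Keys and values must be base64-encoded strings. Only the payload is
# constructed here -- no request is actually sent.

def put_payload(key: str, value: str) -> str:
    """Return the JSON body for a PUT of key=value via the JSON gateway."""
    body = {
        "key": base64.b64encode(key.encode()).decode(),
        "value": base64.b64encode(value.encode()).decode(),
    }
    return json.dumps(body)

payload = put_payload("foo", "bar")
print(payload)  # {"key": "Zm9v", "value": "YmFy"}
# With a local etcd running, this could be sent as e.g.:
#   curl -X POST http://localhost:2379/v3/kv/put -d "$payload"
```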

An etcd example: The key-value store in practice

etcd was adopted by Kubernetes in 2014, leading to a rapid expansion of the etcd community. Cloud providers such as AWS, Google Cloud Platform, and Azure followed suit and successfully integrated etcd into their production environments.

But let’s go back to the first etcd example, Kubernetes. Kubernetes itself is a distributed system that runs on a cluster of several machines, so it has a lot to gain from a distributed data store like etcd that keeps critical data safe. Within Kubernetes, the etcd database acts as the primary data store for configuration data, status, and metadata. When changes are requested, etcd makes sure that all the nodes in the Kubernetes cluster can read and write the data. At the same time, a “watch” mechanism monitors the actual and the desired state of the system. If the two states diverge, Kubernetes makes the necessary changes to reconcile them.
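The watch-and-reconcile pattern described above can be sketched as follows. Plain dictionaries stand in for the etcd store and the cluster state here; the function and key names are invented for illustration.

```python
# Sketch of the watch-and-reconcile pattern: a controller compares the
# desired state (as stored in etcd) with the actual state of the cluster
# and derives the actions needed to close the gap. Illustrative only.

def reconcile(desired: dict, actual: dict) -> dict:
    """Return the actions needed to bring `actual` in line with `desired`."""
    actions = {}
    for key, want in desired.items():
        if actual.get(key) != want:
            actions[key] = want          # create or update the entry
    for key in actual.keys() - desired.keys():
        actions[key] = None              # delete entries no longer desired
    return actions

desired = {"web/replicas": 3, "db/replicas": 1}
actual = {"web/replicas": 2}
print(reconcile(desired, actual))  # {'web/replicas': 3, 'db/replicas': 1}
```

When the two states match, the function returns an empty dict and the controller has nothing to do, which mirrors the steady state of a healthy cluster.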

Note

Read commands such as “kubectl get” retrieve values from the etcd database, and changes made with “kubectl apply” create or update entries in the etcd store. Events such as system crashes also automatically change values in etcd.
