Docker (The container technology)
Docker is a technology for container-based virtualization of software applications. Docker’s mainstream container-based approach has transformed application development in recent years, affecting every area of the development process: how applications and components are built, how software services are distributed, and how they are moved from development to production. With Docker, all of these processes run differently than they did before.
But more has changed than just the development processes – the software architecture has, too. It has moved away from monolithic overall solutions and toward clusters of coupled lightweight “microservices”. This has in turn rendered the resulting overall systems more complex. In recent years, software like Kubernetes has become established for managing multi-container applications.
The development of container-based virtualization is far from over, so it remains an exciting field. In this article, we will explain how Docker works as an underlying technology. Furthermore, we will look at what motivated the development of Docker.
The name “Docker” has several meanings: it is used as a synonym for the software itself, for the open source project on which it is based, and for the U.S. company that commercially operates various Docker products and services.
- A brief history of Docker
- What is Docker?
- How does Docker work?
- What are the advantages of Docker?
A brief history of Docker
The software originally released under the name “Docker” was built on Linux Containers (LXC) technology. LXC was later replaced by Docker’s own libcontainer. New software components have been added as Docker has continued to grow and become the standard for container-based virtualization. Things like containerd, a container runtime with the default implementation runC, have emerged from Docker’s development. Today, containerd is managed by the Cloud Native Computing Foundation (CNCF), while runC is maintained under the Open Container Initiative (OCI).
In addition to the Docker team, leading technology companies such as Cisco, Google, Huawei, IBM, Microsoft, and Red Hat are involved in the development of Docker and related technologies. A more recent development is that Windows is now also used as a native environment for Docker containers in addition to the Linux kernel. Here are some of the major milestones in Docker’s evolutionary history:
Docker development milestones:
- cgroups technology integrated into the Linux kernel
- LXC released; builds on cgroups and Linux namespaces, as Docker did later on
- Docker released as open source
- Docker available on Amazon EC2
- Docker available on Windows 10 Pro via Hyper-V
- Docker available on Windows Home via WSL2
At the end of the article, we will go into detail about what motivated the development of Docker and similar virtualization technologies.
What is Docker?
Docker’s core functionality is container virtualization of applications. This is in contrast to virtualization with virtual machines (VM). With Docker, the application code, including all dependencies, is packed into an “image”. The Docker software runs the packaged application in a Docker container. Images can be moved between systems and run on any system running Docker.
“Containers are a standardized unit of software that allows developers to isolate their app from its environment [...]” – Quote from a Docker developer, source: https://www.Docker.com/why-Docker
As is the case with virtual machine (VM) deployment, a primary focus of Docker containers is to isolate the application that is running. Unlike VMs, however, a complete operating system is not virtualized. Instead, Docker allocates certain operating system and hardware resources to each container. Any number of containers can be created from a Docker image and operated in parallel. This is how scalable cloud services are implemented.
Even though we talk about Docker as one piece of software, it is actually multiple software components that communicate via the Docker Engine API. Furthermore, a handful of special Docker objects are used, such as the aforementioned images and containers. Docker-specific workflows are composed of the software components and Docker objects. Let’s take a look at how they interact in detail.
Docker Engine runs on a local system or server and consists of two components:
- The Docker daemon (dockerd): This always runs in the background and listens for Docker Engine API requests. dockerd responds to the appropriate commands to manage Docker containers and other Docker objects.
- The Docker client (docker): This is a command line program. The Docker client is used to control the Docker Engine and provides commands for creating and building Docker containers, as well as creating, obtaining, and versioning Docker images.
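The client/daemon split can be sketched with two commands. This is only an illustration, guarded so it is harmless on systems where Docker (or the daemon) is absent:

```shell
# Sketch of the client/daemon split; every daemon call is guarded because
# the Docker CLI and a running dockerd may or may not be present.
if command -v docker >/dev/null 2>&1; then
  docker --version                      # version of the client binary alone
  if docker info >/dev/null 2>&1; then  # true only if dockerd is reachable
    docker version                      # prints separate "Client" and "Server" sections
  fi
fi
```

The `docker version` output showing distinct “Client” and “Server” sections makes the two components visible: the client formats your command, the daemon answers over the API.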
Docker Engine API
The Docker Engine API is a REST API that interfaces with the Docker daemon. Official software development kits (SDKs) for Go and Python are available for integrating the Docker Engine API into software projects, and similar libraries exist for more than a dozen other programming languages. You access the API on the command line using the docker command. Furthermore, you can access the API directly using cURL or similar tools.
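For example, on a Linux host where the daemon listens on its default Unix socket, the API can be queried directly with cURL. The socket path and endpoints below are the common defaults, but may differ on your system:

```shell
# Query the Docker Engine API directly over its default Unix socket.
# Socket path and endpoints are the usual defaults; adjust if yours differ.
SOCK=/var/run/docker.sock
if [ -S "$SOCK" ] && command -v curl >/dev/null 2>&1; then
  # The same information "docker version" shows, straight from the REST API:
  curl --silent --unix-socket "$SOCK" http://localhost/version
  # List running containers, the API call behind "docker ps":
  curl --silent --unix-socket "$SOCK" http://localhost/containers/json
fi
```

Every docker command ultimately maps onto calls like these, which is why SDKs in other languages can drive Docker just as well as the CLI.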
When you use virtual machines, you often use systems consisting of several software components. In contrast, container virtualization with Docker favors clusters of loosely coupled microservices. These are suitable for distributed cloud solutions that offer a high degree of modularity and high availability. However, these kinds of systems quickly become very complex. To manage containerized applications efficiently, you use special software tools known as “orchestrators”.
Docker Swarm and Docker Compose are two official Docker tools for orchestrating container clusters. The docker swarm command can be used to combine multiple Docker Engines into one virtual engine; the individual engines can then be operated across multiple systems and infrastructures. The docker compose command is used to create multi-container applications known as “stacks”.
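A stack of this kind is described declaratively in a Compose file. The following is a minimal hypothetical example (service names, images, and ports are illustrative placeholders) that docker compose up would turn into a two-container application:

```yaml
# docker-compose.yml -- hypothetical two-service stack for illustration
services:
  web:
    image: nginx:alpine        # public image from Docker Hub
    ports:
      - "8080:80"              # host port 8080 -> container port 80
    depends_on:
      - api
  api:
    image: my-api:latest       # placeholder for your own application image
    environment:
      - API_ENV=production
```

Running docker compose up in the directory containing this file would start both containers and wire them together on a shared network.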
The Kubernetes orchestrator, originally developed by Google, is more user-friendly than Swarm and Compose. It has established itself as the standard and is widely used by the industry. Hosting companies and other “Software as a Service” (SaaS) and “Platform as a Service” (PaaS) solution providers are increasingly using Kubernetes as their underlying infrastructure.
Workflows in the Docker ecosystem are a result of how Docker objects interact with each other. They are managed by communicating with the Docker Engine API. Let’s take a look at each type of object in detail.
A Docker image is a read-only template for creating one or more identical containers. Docker images are effectively the seeds of the system; they are used to bundle and deliver applications.
Various repositories are used to share Docker images. There are both public and private repositories. At the time of writing, there are more than five million different images available for download on the popular “Docker Hub”. The commands docker pull and docker push are used to download an image from a repository or share it there.
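In practice, the round trip looks something like this. Repository and tag names are placeholders, and pushing additionally requires docker login with a registry account, so that step is shown as a comment:

```shell
# Pull, retag, and (optionally) push an image; names are illustrative.
# Guarded so the script is a no-op without a running Docker daemon.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker pull alpine:3.19                      # download from Docker Hub
  docker tag alpine:3.19 myuser/myimage:1.0    # retag under your own namespace
  # docker push myuser/myimage:1.0             # upload (requires docker login)
fi
```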
Docker images are built in layers. Each layer represents a specific change to the image. This results in a continuous versioning of the images, which allows a rollback to a previous state. An existing image can be used as a basis to create a new image.
A Dockerfile is a text file that describes the structure of a Docker image. It is similar to a batch processing script: the file contains commands that describe an image, and when an image is built from the Dockerfile, these commands are processed one after the other. Each command creates a new layer in the Docker image. You can therefore also think of a Dockerfile as a kind of recipe used as the basis for creating an image.
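A minimal hypothetical Dockerfile might look like this; base image, file names, and commands are illustrative, and each instruction produces one layer of the resulting image:

```dockerfile
# Hypothetical example; every instruction creates a new image layer.
FROM python:3.12-slim                 # base image pulled from a registry
WORKDIR /app                          # set the working directory
COPY requirements.txt .               # copy the dependency list into the image
RUN pip install -r requirements.txt   # install dependencies
COPY . .                              # copy the application code
CMD ["python", "app.py"]              # default command when a container starts
```

An image is built from this recipe with a command such as docker build -t myimage . — and because each instruction is a layer, an unchanged step can be reused from cache on the next build.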
Now let’s move on to the main concept in the Docker universe: Docker containers. While a Docker image is an inert template, a Docker container is an active, running instance of an image. A Docker image exists locally in a single copy and only takes up a bit of storage space. In contrast, multiple Docker containers can be created from the same image and run in parallel.
Each Docker container consumes a certain amount of system resources for it to run, such as CPU usage, RAM, network interfaces, etc. A Docker container can be created, started, stopped, and destroyed. You can also save the state of a running container as a new image.
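This lifecycle can be sketched with a throwaway container. The image and names below are placeholders, and the guard skips everything without a reachable daemon:

```shell
# Create, start, stop, snapshot, and remove a container.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker create --name demo alpine:3.19 sleep 60   # create without starting
  docker start demo                                # start the created container
  docker stop demo                                 # stop it again
  docker commit demo demo-snapshot:1.0             # save its state as a new image
  docker rm demo                                   # destroy the container
  docker rmi demo-snapshot:1.0                     # clean up the snapshot image
fi
```

The docker commit step is the “save the state of a running container as a new image” operation mentioned above.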
As we have seen, you create a running Docker container from a non-modifiable image. But what about data that is used within the container and needs to be retained beyond its service life? Docker volumes are used for this use case. A Docker volume exists outside of a specific container. So several containers can share one volume. The data contained in the volume is stored on the host’s file system. This means that a Docker volume is like a shared folder on a virtual machine.
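Sharing data between containers through a volume can be sketched as follows; the volume name, image, and paths are illustrative, and the script does nothing without a reachable daemon:

```shell
# Two containers sharing one named volume; all names are illustrative.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker volume create shared-data
  # First container writes a file into the volume:
  docker run --rm -v shared-data:/data alpine:3.19 sh -c 'echo hello > /data/msg'
  # Second container sees the same file, because the volume outlives containers:
  docker run --rm -v shared-data:/data alpine:3.19 cat /data/msg
  docker volume rm shared-data
fi
```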
How does Docker work?
The basic working principle of Docker is similar to that of LXC, the previously developed virtualization technology: both build on the Linux kernel and perform container-based virtualization. Both Docker and LXC reconcile two seemingly contradictory goals:
- Running containers share the same Linux kernel, making them more lightweight than virtual machines.
- Running containers are isolated from each other and have access only to a limited amount of system resources.
Both Docker and LXC make use of “kernel namespaces” and “control groups” to achieve these goals. Let’s take a look at how this works in detail.
The Linux kernel is the core component of the GNU/Linux open source operating system. The kernel manages the hardware and controls processes. When running Docker outside of Linux, a hypervisor or a virtual machine is needed to provide the functionality of the Linux kernel. On macOS, xhyve, a derivative of the BSD hypervisor bhyve, is used. On Windows 10, Docker uses the Hyper-V hypervisor.
Namespaces are a feature of the Linux kernel. They partition kernel resources and thus ensure processes remain separate from each other. A namespace process can only see kernel resources of that same namespace. Here is an overview of the namespaces used in Docker:
- UTS: Assigns containers their own host and domain names
- PID: Each container uses its own namespace for process IDs; PIDs from other containers are not visible, so two processes in different containers can use the same PID without conflict.
- IPC: IPC namespaces isolate processes in one container so that they cannot communicate with processes in other containers.
- NET: Assigns separate network resources such as IP addresses or routing tables to a container
- MNT: Mount points of the file system; restricts the host’s file system to a narrowly defined section from the container’s point of view
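On a Linux host you can inspect these namespaces directly, since every process lists its namespace membership under /proc; Docker simply gives each container its own set. This sketch needs no Docker at all:

```shell
# Every Linux process has a namespace set visible in /proc; a containerized
# process shows different IDs here than the host. Works without Docker.
if [ -d /proc/self/ns ]; then
  ls -l /proc/self/ns   # uts, pid, ipc, net, mnt (and more) for this shell
  # Inside a container the same command shows *different* namespace IDs,
  # which is exactly the isolation described above.
fi
```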
Control groups, usually abbreviated as cgroups, are used to organize Linux processes hierarchically. A process (or group of processes) is allocated a limited amount of system resources. This includes RAM, CPU cores, mass storage and (virtual) network devices. While namespaces isolate processes from each other, control groups limit access to system resources. This ensures the overall system remains functional when operating multiple containers.
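From the Docker side, these cgroup limits are set with resource flags on docker run. The values below are arbitrary examples, and the block is guarded so it does nothing without a reachable daemon:

```shell
# Resource flags on "docker run" become cgroup limits in the kernel.
# The values are arbitrary examples for illustration.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  # Cap the container at 256 MB of RAM and 1.5 CPU cores:
  docker run --rm --memory=256m --cpus=1.5 alpine:3.19 echo "limited container ran"
fi
```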
What are the advantages of Docker?
Let’s take a look at the history of software development to understand the benefits of Docker. How is and was software built, delivered, and run? What parts of the process have changed fundamentally? Software is the counterpart to hardware, the physical computer. Without software, the computer is just a lump of matter. While hardware is fixed and unchangeable, software can be recreated and customized. The interaction of the two levels results in this wondrous digital world.
Software on a physical machine
Traditionally, software has been created to run on a physical machine. But we quickly hit a wall when we do this: software can only run on certain hardware; for example, it may require a specific processor.
Furthermore, more complex software usually does not run completely autonomously, but is integrated into a software ecosystem. This includes an operating system, libraries, and dependencies. The right versions of all the components must be available for them to interact correctly. There is also a configuration, which describes how the individual components are linked to each other.
If you want to run several applications on one machine in parallel, version conflicts quickly arise. An application may require a version of a component that is incompatible with another application. In the worst-case scenario, each application would have to run on its own physical machine. However, physical machines are expensive and cannot be scaled easily. So if an application’s resource requirements grow, it may need to be migrated to a new physical machine.
Another problem arises from the fact that software under development is used in different environments. A developer writes code on the local system and runs it there for testing. The application goes through several test stages before going into production, including a test environment for quality assurance or a staging environment for testing by the product team.
The different environments often exist on different physical machines, and there are almost always differences in the operating system, library, and configuration versions. How can you reconcile all of them? If the environments differ from each other, tests lose their meaning. Furthermore, a system must be replaced if it fails. How can you ensure consistency? These problems are hard to deal with on physical machines.
Virtual machines as a step in the right direction
The aforementioned problems related to physical machines led to the rise in the popularity of virtual machines (VMs). The basic idea is to integrate a layer between the hardware and operating system or the host operating system and guest operating systems. A VM uncouples the application environment from the underlying hardware. The specific combination of an operating system, application, libraries, and configuration can be reproduced from an image. In addition to completely isolating an application, this allows developers to bundle several applications in an “appliance”.
VM images can be moved between physical machines, and multiple virtualized operating systems can be run in parallel. This ensures the application is scalable. However, operating system virtualization is resource intensive and is overkill for simple use cases.
The advantages of container virtualization with Docker
The images used in container virtualization do not need an operating system. Container virtualization is more lightweight and provides nearly as much isolation as VMs. A container image combines the application code with all the required dependencies and the configuration. Images are portable between systems, and the containers built on them can be reproduced. Containers can be used in various environments, such as development, production, testing, and staging. Layer and image version control also provide a good deal of modularity.
Let’s summarize the key benefits of Docker-based virtualization of applications as opposed to using a VM. A Docker container:
- does not contain its own operating system and simulated hardware
- shares an operating system kernel with other containers hosted on the same system
- is lightweight and compact in terms of resource usage compared to a VM-based application
- starts up faster than a virtual machine
- can run in parallel with other containers created from the same image
- can be used together with other container-based services via orchestration
- is ideally suited for local development