The Docker virtualization solution has fundamentally changed how software is built, distributed, and operated over the last decade. Unlike its predecessors, virtual machines (VMs), Docker virtualizes individual applications rather than entire computers. A Docker container is therefore an application or software container.
The term “software container” is borrowed from physical containers, such as those used on ships. In logistics, the container as a standardized unit is what made modern retail supply chains possible: a container can be transported on any ship, truck, or train designed for this purpose, largely independently of its contents. On the outside, the container is equipped with standardized interfaces. Docker containers work in much the same way.
- What is a Docker container?
- What differentiates Docker containers and Docker images?
- How is a Docker container built?
- How and where are Docker containers used?
- Advantages and disadvantages of Docker container virtualization
What is a Docker container?
So, what exactly is a Docker container? Let’s pass the mic to the Docker developers:
“Containers are a standardized unit of software that allows developers to isolate their app from its environment.” - Source: www.docker.com/why-docker
Unlike a physical container, a Docker container exists in a virtual environment. A physical container is assembled based on a standardized specification. We see something similar with virtual containers. A Docker container is created from an immutable template called an “image”. A Docker image contains the dependencies and configuration settings required to create a container.
Just as many physical containers can be built from a single specification, any number of Docker containers can be created from a single image. Docker containers thus form the basis for scalable services and reproducible application environments. We can create a container from an image and also save an existing container as a new image. You can run, pause, and stop processes within a container.
Unlike a virtual machine (VM), a Docker container does not contain its own operating system (OS). Instead, all the containers running on a Docker host share the same OS kernel. When Docker is deployed on a Linux host, the existing Linux kernel is used. If Docker runs on a non-Linux system, a minimal Linux system is run in a virtual machine via a hypervisor to provide that kernel.
Each container can be allocated a defined share of system resources when it is started. This includes RAM, CPU cores, mass storage, and (virtual) network devices. Technically, “cgroups” (short for “control groups”) limit a Docker container’s access to system resources, while “kernel namespaces” partition kernel resources so that the processes in one container are isolated from those in other containers and on the host.
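To see cgroups at work, a process inside a container can read the limits of its own cgroup. The following is a minimal Python sketch, assuming a Linux host with cgroup v2 where the container’s cgroup is visible at /sys/fs/cgroup (cgroup v1 hosts use different file names). The helper functions `memory_limit_bytes` and `cpu_limit_cores` are hypothetical. For a container started with --memory="10m" --cpus="1", the files would typically contain 10485760 and "100000 100000".

```python
from pathlib import Path
from typing import Optional

# Assumption: cgroup v2 is mounted here inside the container.
CGROUP_DIR = Path("/sys/fs/cgroup")


def memory_limit_bytes(cgroup_dir: Path = CGROUP_DIR) -> Optional[int]:
    """Return the memory limit in bytes, or None if unlimited ("max")."""
    raw = (cgroup_dir / "memory.max").read_text().strip()
    return None if raw == "max" else int(raw)


def cpu_limit_cores(cgroup_dir: Path = CGROUP_DIR) -> Optional[float]:
    """Return the CPU limit as a number of cores, or None if unlimited.

    cpu.max holds "<quota> <period>" in microseconds; the quota may be "max".
    """
    quota, period = (cgroup_dir / "cpu.max").read_text().split()
    return None if quota == "max" else int(quota) / int(period)


if __name__ == "__main__":
    print("memory limit (bytes):", memory_limit_bytes())
    print("cpu limit (cores):", cpu_limit_cores())
```

Passing the directory as a parameter keeps the functions testable outside a container.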
Externally, Docker containers communicate over the network: services inside the container listen on exposed ports. These are usually web or database servers. The containers themselves are controlled on the respective Docker host via the Docker API, which can start, stop, and remove them. The Docker client provides a command line interface (CLI) with the corresponding commands.
What differentiates Docker containers and Docker images?
The two terms “Docker container” and “Docker image” often cause confusion. This is hardly surprising, since it is a bit of a chicken-or-the-egg dilemma. A container is created from an image; however, a container can also be saved as a new image. Let’s take a look at the differences between the two concepts in detail.
A Docker image is an inert template. The image only takes up some space on a hard drive and does nothing else. In contrast, the Docker container is a “living” instance. A running Docker container has a behavior; it interacts with the environment. Furthermore, a container has a state that changes over time, using a variable amount of RAM.
You may be familiar with the concepts of “class” and “object” from object-oriented programming (OOP). The relationship between a Docker image and a Docker container is similar to the relationship between a class and its objects. A class exists only once, and many similar objects can be created from it. The class itself is loaded from a source code file. There is a similar pattern in the Docker universe: a template, the image, is built from a source unit, the “Dockerfile”, and this template in turn creates many instances:
|Docker concept||Dockerfile||Docker image||Docker container|
|Programming analogy||Class source code||Loaded class||Instantiated object|
We refer to the Docker container as a “running instance” of the associated image. If the terms “instance” and “instantiate” seem abstract, a mnemonic device helps: replace “instantiate” with “cut out” in your mind. Although the words are unrelated, their meanings correspond closely in computer science terms. Just as a cookie cutter cuts out many similar cookies from a layer of dough, a template is used to instantiate many similar objects. Instantiating, then, means creating an object from a template.
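The class/object analogy can be made concrete in a few lines of Python. This is a toy model, not how Docker is implemented: `Image` and `Container` are hypothetical classes, and a dictionary of file paths stands in for the file system. It shows an immutable template, independent instances with their own state, and the reverse direction of saving a container as a new image.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical container ID generator for this sketch


@dataclass(frozen=True)
class Image:
    """Immutable template: a name plus read-only files."""
    name: str
    files: tuple = ()  # tuple of (path, content) pairs

    def run(self) -> "Container":
        # "Cut out" a new instance; it gets its own copy of the files.
        return Container(image=self, files=dict(self.files))


@dataclass
class Container:
    """Living instance with its own, changing state."""
    image: Image
    files: dict
    id: int = field(default_factory=lambda: next(_ids))
    running: bool = True

    def stop(self) -> None:
        self.running = False

    def commit(self, name: str) -> Image:
        # The "chicken-or-the-egg" direction: save a container as a new image.
        return Image(name=name, files=tuple(sorted(self.files.items())))


if __name__ == "__main__":
    base = Image("web:1.0", (("/index.html", "hello"),))
    a, b = base.run(), base.run()
    a.files["/index.html"] = "changed"
    print(a.id, a.files, "|", b.id, b.files)
```

Changing one container leaves the image and all other containers untouched, just as one cookie can be decorated without altering the cutter.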
How is a Docker container built?
To understand how a Docker container is built, it helps to look at the “Twelve-Factor App” methodology. This is a collection of twelve fundamental principles for building and operating service-oriented software. Both Docker and the twelve-factor app date back to 2011. The twelve-factor app helps developers design software-as-a-service apps according to specific standards. These include:
- Using declarative formats for setup automation to minimize time and cost for new developers joining the project;
- Having a clean contract with the underlying operating system, offering maximum portability between execution environments;
- Being suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
- Minimizing divergence between development and production, enabling continuous deployment for maximum agility;
- And being able to scale up without significant changes to tooling, architecture, or development practices.
The structure of a Docker container is based on these principles. A Docker container includes the following components, which we will look at in detail below:
- Container operating system and union file system
- Software components and configuration
- Environment variables and runtime configuration
- Ports and volumes
- Processes and logs
Container operating system and union file system
Unlike a virtual machine, a Docker container does not contain its own operating system. Instead, all the containers running on a Docker host access a shared Linux kernel. Only a minimal execution layer is included in the container. This usually includes an implementation of the C standard library and a Linux shell for running processes. Here is an overview of the components in the official “Alpine Linux” image:
|Linux kernel||C standard library||Unix commands|
|from host||musl libc||BusyBox|
A Docker image consists of a stack of read-only file system layers. Each layer describes the changes to the file system relative to the layer below it. A union file system such as OverlayFS (used by Docker’s default “overlay2” storage driver) overlays these layers and presents them as one consistent file system. When you create a Docker container from an image, a thin writable layer is added on top of the read-only layers. All changes the container makes to the file system are stored in this writable layer using the “copy-on-write” method: before a file from a lower layer is modified, it is copied up into the writable layer, so the image itself never changes.
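The layering and copy-on-write behavior can be modeled in memory with Python’s `collections.ChainMap`, which searches several dictionaries in order but writes only to the first one. This is a simplified sketch, not how overlay2 works internally: `UnionFS` is a hypothetical class, files are strings, and deletions are modeled with a marker object, similar in spirit to the whiteout files OverlayFS uses.

```python
from collections import ChainMap

WHITEOUT = object()  # marks a file deleted in the writable layer


class UnionFS:
    """Toy union file system: read-only image layers plus one writable layer."""

    def __init__(self, *image_layers: dict):
        self.writable: dict = {}
        # image_layers are given bottom-up; ChainMap looks up top-down,
        # so the writable layer comes first, then the image layers reversed.
        self.view = ChainMap(self.writable, *reversed(image_layers))

    def read(self, path: str) -> str:
        content = self.view.get(path, WHITEOUT)
        if content is WHITEOUT:
            raise FileNotFoundError(path)
        return content

    def write(self, path: str, content: str) -> None:
        # ChainMap writes only to its first mapping, so the change lands in
        # the writable layer and the image layers stay untouched.
        self.view[path] = content

    def append(self, path: str, text: str) -> None:
        # Copy-up: read the current version (possibly from a lower layer),
        # then store the modified copy in the writable layer.
        self.write(path, self.read(path) + text)

    def delete(self, path: str) -> None:
        self.read(path)                 # raises if the file does not exist
        self.writable[path] = WHITEOUT  # hide lower-layer copies

    def listing(self) -> list:
        return sorted(p for p, c in self.view.items() if c is not WHITEOUT)
```

Two `UnionFS` objects built from the same layer dictionaries behave like two containers from one image: each sees its own changes, while the shared layers remain unchanged.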
Software components and configuration
Building on the minimal container operating system, additional software components are installed in a Docker container. This is usually followed by further setup and configuration steps. The standard methods are used for installation:
- via a system package manager like apt, apk, yum, brew, etc.
- via a programming language package manager like pip, npm, composer, gem, cargo, etc.
- by compiling in the container with make, mvn, etc.
Here are some examples of software components commonly used in Docker containers:
|Application area||Software components|
|Development tools and frameworks||node/npm, React, Laravel|
|Database systems||MySQL, Postgres, MongoDB, Redis|
|Web servers||Apache, nginx, lighttpd|
|Caches and proxies||Varnish, Squid|
|Web applications and frameworks||WordPress, Magento, Ruby on Rails|
Environment variables and runtime configuration
Following the twelve-factor app methodology, a Docker container’s configuration is stored in environment variables (“env vars” for short). Configuration here means all values that differ between environments, such as a development system and a production system. This typically includes hostnames and database credentials.
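From the application’s point of view, this means reading deployment-specific settings from the environment at startup instead of from files baked into the image. A minimal Python sketch; the variable names (DB_HOST, DB_PORT, DB_PASSWORD, DEBUG) and their defaults are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    db_host: str
    db_port: int
    db_password: str
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build the app configuration from environment variables.

    Non-secret values get defaults; the password has none, so a missing
    value fails loudly (KeyError) instead of silently using a placeholder.
    """
    return Config(
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_password=env["DB_PASSWORD"],
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

The same image can then run in development and production; only the environment passed to the container differs.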
The values of the environment variables influence how the container behaves. Two primary methods are used to make environment variables available within a container:
1. Definition in Dockerfile
The ENV instruction declares an environment variable and its value in the Dockerfile. The value is stored in the image and acts as a default: it applies in every container created from the image unless a different value is passed when the container is started.
2. Pass when starting the container
To set an environment variable that was not declared in the Dockerfile, or to override a default, we pass the variable when starting the container. Single variables can be passed as command line parameters. Alternatively, an “env file” that defines several environment variables together with their values can be passed.
Here is how to pass an environment variable when starting the container:
docker run --env <env-var>=<value> <image-id>
It is useful to pass an env file for many environment variables:
docker run --env-file /path/to/.env <image-id>
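To illustrate how these sources combine, here is a simplified Python sketch of the env file format and of the resulting precedence. The parsing rules follow Docker’s documented format (NAME=value lines, a bare NAME taking its value from the host environment, lines starting with # as comments); details such as whitespace handling are simplified. The functions `parse_env_file` and `effective_env` are hypothetical, and the order Dockerfile ENV < --env-file < --env reflects Docker’s behavior as we understand it.

```python
import os
from typing import Iterable, Mapping


def parse_env_file(lines: Iterable[str],
                   host_env: Mapping[str, str] = os.environ) -> dict:
    """Simplified reading of the --env-file format.

    - blank lines and lines starting with '#' are ignored
    - NAME=value sets NAME to value (this sketch keeps values literally,
      including any quotes)
    - NAME alone takes its value from the host environment, if set there
    """
    result = {}
    for raw in lines:
        line = raw.lstrip().rstrip("\n")
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            result[name] = value
        elif name in host_env:
            result[name] = host_env[name]
    return result


def effective_env(image_env: Mapping[str, str],
                  env_file: Mapping[str, str],
                  cli_env: Mapping[str, str]) -> dict:
    """Later sources win: Dockerfile ENV < --env-file < --env."""
    return {**image_env, **env_file, **cli_env}
```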
The “docker inspect” command can be used to display the environment variables present in the container along with their values. Therefore, you must be careful when using confidential data in environment variables.
When starting a container from an image, configuration parameters can be passed. These include the amount of allocated system resources, which is otherwise unlimited. Furthermore, start parameters are used to define ports and volumes for the container. We’ll learn more about this in the next section. The startup parameters may override any default values in the Dockerfile. Here are a few examples.
Allocate a CPU core and 10 megabytes of RAM to the Docker container at startup:
docker run --cpus="1" --memory="10m" <image-id>
Publish all ports exposed in the Dockerfile to random host ports when starting the container:
docker run -P <image-id>
Map TCP port 80 of the Docker host to port 80 of the Docker container:
docker run -p 80:80/tcp <image-id>
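A -p value follows the pattern [host IP:]host port:container port[/protocol], with TCP as the default protocol. The Python sketch below shows how such a spec breaks down. `parse_publish` and `PortMapping` are hypothetical names, and Docker accepts more forms than shown here (for example port ranges, IPv6 addresses, or a container port alone).

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]
    host_port: int
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortMapping:
    """Parse a -p value of the form [host_ip:]host_port:container_port[/protocol].

    Simplified: port ranges and IPv6 addresses are not handled.
    """
    ports, _, protocol = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 2:
        host_ip, (host_port, container_port) = None, parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported port spec: {spec!r}")
    return PortMapping(host_ip, int(host_port), int(container_port), protocol or "tcp")
```

For example, "127.0.0.1:8080:80" publishes container port 80 only on the host’s loopback interface, port 8080.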
Ports and volumes
A Docker container contains an application that is isolated from the outside world. For this to be useful, it must be possible to interact with the environment. Therefore, there are ways to exchange data between host and container, as well as between multiple containers. Standardized interfaces allow containers to be used in different environments.
Processes running in the container are reached from the outside via exposed network ports, using the standard TCP and UDP protocols. For example, imagine a Docker container with a web server listening on TCP port 8080. The Dockerfile of the Docker image also contains the line “EXPOSE 8080/tcp”. We start the container with “docker run -p 8080:8080” and access the web server at “http://localhost:8080”. With “docker run -P”, Docker would instead publish port 8080 on a randomly chosen host port, which the “docker port” command displays.
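For the published port to reach the web server, the server inside the container must listen on all interfaces (0.0.0.0), not only on 127.0.0.1. A minimal Python sketch of such a server on port 8080, using only the standard library; the handler class and response text are placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from the container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    # Bind to all interfaces: a server bound to 127.0.0.1 inside the
    # container would not be reachable through the published port.
    return HTTPServer((host, port), Hello)


if __name__ == "__main__":
    make_server().serve_forever()
```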
Ports are used to communicate with services running in the container. In many cases, however, it makes sense to share files between the container and the host system, or between containers, to exchange data. For this purpose, Docker supports different types of mounts:
- Named volumes – recommended
- Anonymous volumes – have no name and are hard to reuse; removed together with the container if it was started with --rm
- Bind mounts – map a host directory into the container; performant, but dependent on the host’s directory structure
- Tmpfs mounts – stored in RAM only; only on Linux
The differences between the volume types are subtle. The choice of the right type depends heavily on the particular use case. A detailed description would go beyond the scope of this article.
Processes and logs
A Docker container usually encapsulates an application or service. The software executed inside the container forms a set of running processes. The processes in a Docker container are isolated from processes in other containers and on the host system. Processes can be started, stopped, and listed within a Docker container, either via the command line or via the Docker API.
Running processes continuously output status information. Following the twelve-factor app methodology, they write this output to the standard STDOUT and STDERR data streams, which can be read with the “docker logs” command. In addition, a “logging driver” determines where Docker stores or forwards these logs; the default driver, “json-file”, writes them to JSON-formatted files on the host.
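An application following this rule simply writes to STDOUT and STDERR and leaves collecting and routing the output to Docker. A minimal Python sketch; the one-JSON-object-per-line format and the field names are assumptions, not a Docker requirement. Note the `flush=True`: Python buffers STDOUT when it is not attached to a terminal, which can delay lines in “docker logs” (setting the environment variable PYTHONUNBUFFERED=1 has a similar effect).

```python
import json
import sys
import time


def log(level: str, message: str, **fields) -> None:
    """Write one JSON log line: errors to STDERR, everything else to STDOUT."""
    record = {"time": time.time(), "level": level, "message": message, **fields}
    stream = sys.stderr if level in ("ERROR", "CRITICAL") else sys.stdout
    # Flush immediately so the line reaches Docker without buffering delay.
    print(json.dumps(record), file=stream, flush=True)


if __name__ == "__main__":
    log("INFO", "server started", port=8080)
    log("ERROR", "database unreachable", host="db")
```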
How and where are Docker containers used?
Docker is used in all parts of the software lifecycle nowadays. This includes development, testing, and operation. Containers running on a Docker host are controlled via the Docker API. The Docker client accepts commands on the command line; special orchestration tools are used to control clusters of Docker containers.
The basic pattern to deploy Docker containers looks like this:
- The Docker host downloads the Docker image from the registry.
- The Docker container is created and started from the image.
- The application in the container runs until the container is stopped or removed.
Let’s take a look at two Docker container deployment examples:
Deploying Docker containers in the local development environment
The use of Docker containers is particularly popular in software development. Usually, software is developed by a team of specialists. A collection of development tools known as a toolchain is used for this purpose. Each tool is in a specific version, and the whole chain works only if the versions are compatible with each other. Furthermore, the tools must be configured correctly.
To ensure that the development environment is consistent, developers use Docker. A Docker image is created once, and it contains the entire correctly configured toolchain. Each developer on the team pulls the Docker image onto their local machine and starts a container from it. Development then takes place within the container. The image is updated centrally if there is a change to the toolchain.
Deploying Docker containers in orchestrated clusters
Data centers of hosting providers and Platform-as-a-Service (PaaS) providers use Docker container clusters. Each service (load balancer, web server, database server, etc.) runs in its own Docker container. However, a single container can only handle a limited load. Orchestration software therefore monitors the load and health of the containers and starts additional containers when the load increases. This approach allows services to scale quickly in response to changing conditions.
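The scaling decision can be sketched with the rule documented for one widely used orchestrator, the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas * current load / target load), with no change while the ratio stays within a tolerance (10% by default). Other orchestrators use other policies. The function below and its default bounds are hypothetical.

```python
import math


def desired_replicas(current: int, current_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Return how many containers should run so the average load nears the target."""
    ratio = current_load / target_load
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough: avoid scaling back and forth
    desired = math.ceil(current * ratio)
    # Keep the result within the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, two containers averaging 90% CPU against a 50% target give ceil(2 * 1.8) = 4 containers.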
Advantages and disadvantages of Docker container virtualization
The advantages of Docker virtualization become especially clear in comparison with virtual machines (VMs). Docker containers are much more lightweight than VMs: they start faster and consume fewer resources. The images underlying Docker containers are also far smaller. While VM images usually range from hundreds of MB to a few GB, Docker images start at just a few MB.
However, container virtualization with Docker also has some drawbacks. Since a container does not contain its own operating system but shares the host kernel, its processes are less strongly isolated than those in a VM. Running many containers also adds considerable complexity. Furthermore, Docker has grown organically over time, and the platform has arguably taken on too many tasks. Developers are therefore working to split it into individual components.