The Docker project has established itself as a standard for container virtualization with software of the same name. A key concept when using the Docker platform is the Docker image. In this article, we will explain how Docker images are built and how they work.

What is a Docker image?

You may already be familiar with the term “image” in the context of virtualization with virtual machines (VMs). Usually, a VM image is a copy of an operating system. A VM image may contain other installed components such as databases and web servers. The term comes from a time when software was distributed on optical data carriers like CD-ROMs and DVDs. If you wanted to create a local copy of the data carrier, you had to create an “image” with special software.

Container virtualization is the logical further development of VM virtualization. Instead of virtualizing a virtual computer (machine) with its own operating system, a Docker image usually consists of just one application. This could be an individual binary file or a combination of several software components.

To run the application, a container is first created from the image. All containers running on a Docker host use the same operating system kernel. As a result, Docker containers and Docker images are usually significantly more lightweight than comparable virtual machines and their images.

Docker containers and Docker images are closely linked concepts. Not only can a Docker container be created from a Docker image, but a new image can also be created from a running container. This is why we say that Docker images and Docker containers have a chicken-and-egg relationship:

Docker command | Description | Chicken-and-egg analogy
docker run <image-id> | Create a Docker container from an image | Chick hatches from an egg
docker commit <container-id> | Create a Docker image from a container | Hen lays a new egg

In the biological chicken-and-egg system, exactly one chick is produced from one egg, and the egg is lost in the process. In contrast, a Docker image can be used to create an unlimited number of similar containers. This reproducibility makes Docker an ideal platform for scalable applications and services.

A Docker image is an unchangeable template that can be used repeatedly to create Docker containers. The image contains all the information and dependencies needed to run a container, including all basic program libraries and user interfaces. There is usually a command-line environment (“shell”) and an implementation of the C standard library on board. Here is an overview of the official “Alpine Linux” image:

Linux kernel | C standard library | Unix commands
From the host | musl libc | BusyBox

Alongside these basic components that supplement the Linux kernel, a Docker image usually also contains additional software. Below are a few examples of software components for different areas of application. Please note that a single Docker image usually contains only a small selection of the components shown:

Area of application | Software components
Programming languages | PHP, Python, Ruby, Java, JavaScript
Development tools | node/npm, React, Laravel
Database systems | MySQL, Postgres, MongoDB, Redis
Web servers | Apache, nginx, lighttpd
Caches and proxies | Varnish, Squid
Content management systems | WordPress, Magento, Ruby on Rails
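As a sketch of how such components come together in practice, the following hypothetical Dockerfile builds a small PHP application image on top of an official PHP/Apache parent image. The image tag and the source directory are illustrative assumptions, not part of the original article:

```dockerfile
# Hypothetical example: a PHP application image.
# "php:8.2-apache" is an official Docker Hub image tag (an assumption here).
FROM php:8.2-apache

# Copy the (assumed) application sources into Apache's web root.
COPY src/ /var/www/html/

# Document the port the Apache parent image listens on.
EXPOSE 80
```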

How does a Docker image differ from a Docker container?

As we have seen, Docker images and Docker containers are closely related. So, how do the two concepts differ?

First of all, a Docker image is inert. It takes up some storage space but does not consume any system resources. In addition, a Docker image cannot be changed after creation and is therefore a “read-only” medium. As a side note, it is possible to apply changes to an existing Docker image, but doing so creates a new image. The original, unmodified image remains.

As we already mentioned, a Docker image can be used to create an unlimited number of similar containers. How exactly is a Docker container different from a Docker image? A Docker container is a running instance (i.e., an instance in the process of execution) of a Docker image. Like any software executed on a computer, a running Docker container consumes system resources such as working memory and CPU cycles. Furthermore, the status of a container changes over its lifecycle.

If this description seems too abstract, use this example from your day-to-day life to help: Think of a Docker image like a DVD. The DVD itself is inert: it sits in its case and does nothing. It permanently occupies the same limited space in the room. The content only becomes “alive” when the DVD is played in a special environment (a DVD player).

Like a film being played from a DVD, a running Docker container has a status. In the case of a film, this includes the current playback time, selected language, subtitles, etc. This status changes over time, and a playing film constantly consumes electricity. Just as an unlimited number of similar containers can be created from a Docker image, the film on a DVD can be played over and over again. What’s more, the running film can be stopped and started, as can a Docker container.

Docker concept | Analogy | Mode | Status | Resource consumption
Docker image | DVD | Inert | “Read-only”/unchangeable | Fixed
Docker container | Playing film | “Living” | Changes over time | Varies depending on use

How and where are Docker images used?

Today, Docker is used in all phases of the software lifecycle: development, testing, and operation. The central concept in the Docker ecosystem is the container, which is always created from an image. As such, Docker images are used everywhere Docker is used. Let’s look at a few examples.

Docker images in local development environments

If you develop software on your own device, you will want to keep the local development environment as consistent as possible. Most of the time, you’ll need precisely matching versions of the programming language, libraries, and other software components. If just one of the many interacting components changes, it can quickly break the others: the source code may no longer compile, or the web server may fail to start. Here, the unchangeability of a Docker image is incredibly useful. As a developer, you can be sure that the environment contained in the image will remain consistent.

Large development projects are often carried out by teams. In this case, an environment that stays stable over time is crucial for comparability and reproducibility. All developers on a team can use the same image, and when a new developer joins, they can pull the right Docker image and start working straight away. When the development environment changes, a new Docker image is created. The developers then obtain the new image and are immediately up to date.
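One common way to make such a team environment reproducible is to pin exact version tags in the Dockerfile. The following sketch is a hypothetical example; the base image tag and library versions are assumptions chosen for illustration:

```dockerfile
# Hypothetical development environment with pinned versions.
# Pinning an exact tag (not "latest") keeps every team member on the same stack.
FROM python:3.11-slim

# Pin library versions so all containers built from this image behave identically.
RUN pip install --no-cache-dir flask==3.0.0 requests==2.31.0

WORKDIR /app
```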

Docker images in service-oriented architecture (SOA)

Docker images form the basis of modern service-oriented architectures. Instead of a single monolithic application, individual services with well-defined interfaces are developed. Each service is packaged into its own image. The containers launched from these images communicate with each other via the network and together provide the overall functionality of the application. By enclosing the services in their own individual Docker images, you can develop and maintain them independently. The individual services can even be written in different programming languages.
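A minimal sketch of such an architecture using Docker Compose might look as follows. The service names and the application image are hypothetical; only the Postgres tag refers to a real official image:

```yaml
# Hypothetical docker-compose.yml: two independent services, each from its own image.
services:
  api:
    image: my-api-image      # assumed application image name
    ports:
      - "8000:8000"
  db:
    image: postgres:16       # official Docker Hub image
    environment:
      POSTGRES_PASSWORD: example
```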

Docker images for hosting providers/PaaS

Docker images can also be used in data centers. Each service (e.g., load balancer, web server, database server) can be defined as a Docker image. The resulting containers can each handle a certain load. Orchestration software monitors the containers, their load, and their status. When the load increases, the orchestrator launches additional containers from the corresponding image. This approach makes it possible to rapidly scale services in response to changing conditions.
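In Compose/Swarm terms, this kind of scaling can be sketched as a replica count in the service definition. The service and image names are illustrative assumptions, and the `deploy` section is honored by orchestrators such as Docker Swarm:

```yaml
# Hypothetical sketch: ask an orchestrator to keep three replicas of a web service.
services:
  web:
    image: my-web-image   # assumed image name
    deploy:
      replicas: 3         # the orchestrator keeps three containers running
```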

How is a Docker image built?

In contrast to images of virtual machines, a Docker image is normally not a single file. Instead, it is made up of a combination of several different components. Here is a quick overview (more details will follow later):

  • Image layers contain data added by operations carried out on the file system. Layers are superimposed and then reduced to a consistent view by a union file system.
  • A parent image provides the basic functionality of the image and anchors it in the family tree of Docker images.
  • An image manifest describes the image composition and identifies the image layers.

What should you do if you want to convert a Docker image into a single file? You can do this with the “docker save” command on the command line. This creates a .tar archive which can then easily be moved between systems. With the following command, a Docker image named “busybox” is written to a “busybox.tar” file:

docker save busybox > busybox.tar

Often, the output of the “docker save” command is piped to gzip on the command line. This way, the data is compressed before it is written to the archive file:

docker save myimage:latest | gzip > myimage_latest.tar.gz

An image file created via “docker save” can be loaded into the local Docker host as a Docker image with “docker load”:

docker load < busybox.tar

Image layers

A Docker image is made up of read-only layers. Each layer describes successive changes to the file system of the image. For each operation that changes the file system, a new layer is created. The approach used here is usually referred to as “copy-on-write”: a write access creates a modified copy of the data in a new layer, while the original data remains unchanged. If this principle sounds familiar, it’s because the version control software Git works in a similar way.
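As an illustration, each file-system-changing instruction in a Dockerfile produces one additional layer on top of the parent image’s layers. This is a sketch; the base image tag is an assumption:

```dockerfile
# Each instruction below that modifies the file system adds one read-only layer.
FROM alpine:3.19            # the parent image's existing layers
RUN echo "first" > /data    # new layer: /data created
RUN echo "second" >> /data  # new layer: copy-on-write change to /data
```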

We can display the layers of a Docker image using the “docker image inspect” command on the command line. This command returns a JSON document that we can process with the standard tool jq:

docker image inspect <image-id> | jq -r '.[].RootFS.Layers[]'

A special file system is used to merge the changes in the layers again. This union file system overlays all layers to produce a consistent folder and file structure on the interface. Historically, various technologies known as “storage drivers” were used to implement the union file system. Today, the storage driver “overlay2” is recommended in most cases:

Storage driver | Comment
overlay2 | Recommended for use today
aufs, overlay | Used in earlier versions

The storage driver used by a Docker image can also be displayed. Again, we use the “docker image inspect” command on the command line and process the resulting JSON document with jq:

docker image inspect <image-id> | jq -r '.[].GraphDriver.Name'

Each image layer is identified by a unique hash, calculated from the changes the layer contains. If two images use the same layer, it is stored locally only once and shared by both images. This ensures efficient local storage and reduces transfer volumes when pulling images.
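The idea behind this deduplication is content addressing: a layer’s identity is a digest of its contents, so identical contents always map to the same identifier. A quick shell sketch of the principle, using sha256sum directly rather than the Docker CLI:

```shell
# Content addressing in a nutshell: identical contents yield identical digests,
# so a shared layer only needs to be stored once.
h1=$(printf 'layer contents' | sha256sum | cut -d ' ' -f 1)
h2=$(printf 'layer contents' | sha256sum | cut -d ' ' -f 1)
h3=$(printf 'other contents' | sha256sum | cut -d ' ' -f 1)

[ "$h1" = "$h2" ] && echo "identical contents -> same digest -> stored once"
[ "$h1" != "$h3" ] && echo "different contents -> different digest"
```

Docker applies the same principle with SHA-256 digests of the layer archives, which is why pulling a second image that shares layers with one already on the host transfers only the missing layers.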

Parent images

A Docker image usually has an underlying “parent image”. In most cases, the parent image is defined by a FROM directive in the Dockerfile. The parent image provides a basis that derived images build on; the existing image layers are overlaid with additional layers.

When “inheriting” from a parent image, a Docker image is placed in a family tree that connects all existing images. Perhaps you are wondering where this tree begins? Its roots are formed by a few special “base images”. In most cases, a base image is defined with the “FROM scratch” directive in the Dockerfile. There are, however, other ways to create a base image. You can find out more in the section “Where do Docker images come from?”.
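A minimal sketch of such a base image, assuming a statically linked binary named “hello” exists in the build context (the binary name is a hypothetical placeholder):

```dockerfile
# Hypothetical base image: starts from the empty "scratch" image.
FROM scratch

# Copy a statically linked binary (assumed to exist in the build context).
COPY hello /hello

# Run it when a container is started from this image.
CMD ["/hello"]
```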

Image manifests

As we have seen, a Docker image is made up of several layers. You can use the “docker image pull” command to pull a Docker image from an online registry. In this case, no single file is downloaded. Instead, the local Docker daemon downloads the individual layers and saves them. So, where does the information about the individual layers come from?

The information about which image layers a Docker image is made up of can be found in the image manifest. An image manifest is a JSON file that fully describes a Docker image and contains the following:

  • Information about the version, scheme, and size
  • Cryptographic hashes of the image layers used
  • Information about the available processor architectures
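To make this concrete, here is an abridged, hypothetical manifest in the Docker “schema 2” style. The digests are placeholders and the sizes are invented for illustration; only the media-type strings are real:

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 1469,
    "digest": "sha256:<config-digest>"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 2220094,
      "digest": "sha256:<layer-digest>"
    }
  ]
}
```

Multi-architecture images additionally use a “manifest list” that maps each processor architecture to its own manifest of this form.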

To uniquely identify a Docker image, a cryptographic hash of the image manifest is created. When the “docker image pull” command is used, the manifest file is downloaded first. The local Docker daemon then obtains the individual image layers.

Where do Docker images come from?

As we have seen, Docker images are an important part of the Docker ecosystem. There are many ways to obtain a Docker image, but they boil down to two basic methods, which we will take a closer look at below:

  1. Pulling existing Docker images from a registry
  2. Creating new Docker images

Pulling existing Docker images from a registry

Often, a Docker project starts by pulling an existing Docker image from a registry. A registry is a platform, accessible via the network, that provides Docker images. The local Docker host communicates with the registry to download a Docker image after a “docker image pull” command has been executed.

There are publicly accessible online registries that offer a wide selection of existing Docker images. At the time of writing, there were more than eight million freely available Docker images on the official Docker registry, “Docker Hub”. In addition to Docker images, Microsoft’s “Azure Container Registry” includes other container images in a variety of formats. You can also use the platform to create your own private container registries.

In addition to the online registries mentioned above, you can also host a registry yourself. Larger organizations often use this option to give their teams protected access to self-created Docker images. Docker created the Docker Trusted Registry (DTR) for exactly this purpose: an on-premises solution for providing an in-house registry in your own data center.

Creating new Docker images

You may sometimes want to create a specially adapted Docker image for a specific project. Usually, you can take an existing Docker image and adapt it to meet your needs. Remember that Docker images are unchangeable: when a change is made, a new Docker image is created. There are several ways to create a new Docker image:

  1. Building on a parent image with a Dockerfile
  2. Generating an image from a running container
  3. Creating a new base image

The most common approach to creating a new Docker image is to write a Dockerfile. A Dockerfile contains special commands which define the parent image and any changes required. Running the “docker image build” command then creates a new Docker image from the Dockerfile. Here is a quick example:

# Create a Dockerfile on the command line
cat <<EOF > ./Dockerfile
FROM busybox
RUN echo "hello world"
EOF
# Create a Docker image from the Dockerfile in the current directory
docker image build -t my-image .

Historically, the term “image” comes from “imaging” a data carrier. In the context of virtual machines (VMs), a snapshot of a running VM can be created. A similar process exists in Docker: with the “docker commit” command, we can save the state of a running container as a new Docker image. All modifications made to the container are preserved:

docker commit <container-id>

Furthermore, we can pass Dockerfile instructions to the “docker commit” command. The modifications encoded in the instructions become part of the new Docker image:

docker commit --change <dockerfile instructions> <container-id>

We can use the “docker image history” command to trace, after the fact, which modifications were made to a Docker image:

docker image history <image-id>

As we have seen, we can base a new Docker image on a parent image or on the status of a running container. But how do you create a new Docker image from scratch? There are two ways to do this. You can use a Dockerfile with the special “FROM scratch” directive as described above. This creates a new minimal base image.

If you would prefer not to use the Docker scratch image, you can use a special tool such as debootstrap to prepare a Linux distribution. The distribution is then packaged into a tarball with the tar command and imported into the local Docker host via “docker image import”.

The most important Docker image commands

Docker image command | Explanation
docker image build | Creates a Docker image from a Dockerfile
docker image history | Shows the steps taken to create a Docker image
docker image import | Creates a Docker image from a tarball file
docker image inspect | Shows detailed information for a Docker image
docker image load | Loads an image file created with “docker image save”
docker image ls / docker images | Lists the images available on the Docker host
docker image prune | Removes unused Docker images from the Docker host
docker image pull | Pulls a Docker image from a registry
docker image push | Sends a Docker image to a registry
docker image rm | Removes a Docker image from the local Docker host
docker image save | Creates an image file from a Docker image
docker image tag | Tags a Docker image