Docker, an open-source software project, has established itself as the standard for container virtualization. Container virtualization is the next step in the evolution of virtual machines, but with one significant difference: instead of simulating a complete operating system, it virtualizes a single application in a container. Today, Docker containers are used in all phases of the software lifecycle, including development, testing, and operations.

The Docker ecosystem is built around several concepts, and knowing and understanding them is essential to working with Docker effectively. In particular, these include Docker images, Docker containers, and Dockerfiles. Below, we explain the background and give practical tips for their use.

What is a Dockerfile?

A Dockerfile is the basic building block of the Docker ecosystem. It describes the steps for creating a Docker image. Information flows according to this central model: Dockerfile > Docker image > Docker container.

A Docker container has a limited lifetime and interacts with its environment. Think of a container as a living organism, for example a single-celled organism such as a yeast cell. Following this analogy, a Docker image roughly corresponds to genetic information: all containers created from a single image are identical, just as single-celled organisms cloned from the same genetic information are. So, how do Dockerfiles fit into this model?

A Dockerfile defines the steps for creating a new image. It is important to understand that everything always starts with an existing base image: the new image is derived from the base image and adds a number of specific changes. In our yeast cell example, these changes correspond to mutations. A Dockerfile therefore specifies two things for a new Docker image:

  1. The base image from which the new image is derived. This anchors the new image in the Docker ecosystem's family tree.
  2. A number of specific changes that distinguish the new image from the base image.

How does a Dockerfile work and how is an image created from it?

Basically, a Dockerfile is just an ordinary text file containing a set of instructions, each on a separate line. To create a Docker image, the instructions are executed one after the other, much like the commands in a batch processing script. During execution, instructions that modify the file system (such as RUN, COPY, and ADD) each add a new layer to the image, step by step. We explain exactly how this works in our article on Docker images.

A Docker image is created by executing the instructions in a Dockerfile. This step is called the build process and is started with the “docker build” command. A central concept here is the “build context”, which defines which files and directories the build process has access to. Typically, a local directory serves as the source. When “docker build” is called, the contents of this directory are passed to the Docker daemon, and the instructions in the Dockerfile can then access the files and directories in the build context.

Sometimes you don't want to include all files in the local source directory in the build context. For this purpose, you can use a .dockerignore file, which excludes files and directories from the build context. Its name is borrowed from Git's .gitignore file, and the leading period in the file name marks it as a hidden file.
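As a minimal sketch of how the build context and .dockerignore interact, consider the following Dockerfile. It assumes a hypothetical project directory that serves as the build context and contains a .dockerignore file listing entries such as .git and *.log; the image name my_image in the comment is likewise only an example.

```dockerfile
# Assumed build command, run from the project directory:
#   docker build -t my_image .
# The trailing "." makes the current directory the build context.
FROM busybox
WORKDIR /app
# Copies everything in the build context into /app,
# except the paths excluded by .dockerignore (e.g. .git, *.log)
COPY . .
```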

How is a Dockerfile structured?

A Dockerfile is a plain text file named “Dockerfile”. Note the capitalized first letter: this is the file name that “docker build” looks for by default, although a different file can be specified with the “-f” option. The file contains one entry per line. Here is the general structure of a Dockerfile:

# Comment
INSTRUCTION arguments

In addition to comments, Dockerfiles contain instructions and their arguments, which together describe the structure of the image.
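To see how comments, instructions, and arguments fit together, here is a small, self-contained Dockerfile sketch. It uses the busybox image that also appears in the examples below; the file name greeting.txt is an arbitrary choice for illustration.

```dockerfile
# Comment: our base image
FROM busybox
# Instruction WORKDIR with the argument /app
WORKDIR /app
# Instruction RUN with a shell command as its argument
RUN echo "Hello from the image" > greeting.txt
# Instruction CMD with the default command as a JSON array
CMD ["cat", "greeting.txt"]
```

After building the image, starting a container from it should print the greeting stored in greeting.txt.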

Comments and parser directives

Comments contain information primarily intended for humans. As in Python, Perl, and Ruby, comments in a Dockerfile start with a hash sign (#). Comment lines are removed before the instructions are processed during the build. Note that only lines that begin with a hash sign are recognized as comment lines; a hash sign anywhere else in a line is treated as part of an argument.

Here is a valid comment:

# Our base image
FROM busybox

In contrast, the following line causes an error because the hash sign is not at the beginning of the line. Docker therefore passes “# our base image” to FROM as additional arguments:

FROM busybox # our base image

Parser directives are a special kind of comment. They are written as comment lines and must appear at the very top of the Dockerfile, before any instruction, empty line, or ordinary comment. Otherwise, they are treated as ordinary comments and removed during the build. Also note that each parser directive may be used only once per Dockerfile.

At the time of writing, only two parser directives exist: “syntax” and “escape”. The “escape” directive defines the escape character, which is used to continue instructions over several lines and to escape special characters. The “syntax” directive specifies which version of the Dockerfile syntax the builder uses to process the instructions. Here is an example:

# syntax=docker/dockerfile:1
# escape=\
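The following sketch shows what the escape character does in practice. It keeps the default backslash and uses it to continue a RUN instruction on the next line. For Windows-based images, the Docker documentation shows setting a backtick with “# escape=`” instead, because the backslash is the path separator there.

```dockerfile
# syntax=docker/dockerfile:1
# escape=\
FROM busybox
# The trailing backslash escapes the line break,
# so both lines form a single RUN instruction
RUN echo "first part" && \
    echo "second part"
```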

Instructions, arguments and variables

Instructions make up most of a Dockerfile's content. They describe the specific structure of a Docker image and are executed one after the other. Like commands on the command line, instructions take arguments, and some instructions are directly comparable to specific command line commands. For example, the COPY instruction copies files and directories and is roughly equivalent to the cp command. Unlike on the command line, however, some Dockerfile instructions must follow specific rules regarding their order. Furthermore, certain instructions, such as FROM, may appear only once per build stage.

Note

Instructions are not case-sensitive. Nevertheless, you should follow the convention of writing them in uppercase, which makes them easier to distinguish from their arguments.

For arguments, you need to distinguish between hard-coded and variable parts. Docker follows the “twelve-factor app” methodology and uses environment variables to configure containers. The ENV instruction defines environment variables in a Dockerfile. Now, let's take a look at how to assign a value to an environment variable and read it again.

The values stored in environment variables can be read and used as variable parts of arguments. The syntax for this is reminiscent of shell scripts: the name of the environment variable is preceded by a dollar sign, as in $env_var. There is also an alternative notation that explicitly delimits the variable name by enclosing it in curly brackets: ${env_var}. Here is a concrete example:

# set variable 'user' to value 'admin'
ENV user="admin"
# set username to 'admin_user'
USER ${user}_user
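Two further details of the substitution syntax can be sketched as follows: the curly brackets matter whenever the variable name is directly followed by characters that could be part of a name, and the ${variable:-word} form supplies a fallback value. The variable names app_name and port below are hypothetical.

```dockerfile
FROM busybox
ENV app_name="demo"
# Braces delimit the name: the directory becomes /opt/demo_data.
# Without them, $app_name_data would refer to a variable named "app_name_data".
WORKDIR /opt/${app_name}_data
# ${port:-8080} expands to "8080" because "port" has not been set
ENV listen_port=${port:-8080}
```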

The most important Dockerfile instructions

Below, we present the most important Dockerfile instructions. Traditionally, some instructions, especially FROM, were only allowed to appear once per Dockerfile. Since the introduction of multi-stage builds, however, a single Dockerfile can describe multiple images (build stages), and the restriction now applies to each individual build stage.
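A minimal sketch of a multi-stage build is shown below. Each FROM starts a new build stage, and COPY --from copies a file from an earlier stage into the final image. The stage name builder and the file /artifact.txt are arbitrary examples; in real projects, the first stage typically compiles the application.

```dockerfile
# Stage 1: named "builder", produces an artifact
FROM busybox AS builder
RUN echo "built artifact" > /artifact.txt

# Stage 2: the final image, starting again from a fresh base image
FROM busybox
# Take only the artifact from the builder stage
COPY --from=builder /artifact.txt /artifact.txt
CMD ["cat", "/artifact.txt"]
```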

Instruction | Description | Comment
FROM | Set base image | Must be the first instruction (only ARG may precede it); one entry per build stage
ENV | Set environment variables for the build process and container runtime |
ARG | Declare command line parameters for the build process | May appear before the FROM instruction
WORKDIR | Change the current directory |
USER | Change user and group membership |
COPY | Copy files and directories to the image | Creates a new layer
ADD | Copy files and directories to the image | Creates a new layer; use is discouraged
RUN | Execute a command in the image during the build process | Creates a new layer
CMD | Set default arguments for container start | Only the last entry per build stage takes effect
ENTRYPOINT | Set default command for container start | Only the last entry per build stage takes effect
EXPOSE | Document the ports the running container listens on | Ports must be published when starting the container
VOLUME | Define a directory in the image as a volume | Mounted when the container starts; data is stored on the host system

FROM instruction

The FROM instruction sets the base image on which subsequent instructions operate. It may exist only once per build stage and must be the first instruction, with one exception: the ARG instruction may appear before FROM. This lets you specify the exact base image via a command line argument when starting the build process.
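The ARG-before-FROM pattern could look like the following sketch; the variable name base_image is hypothetical. Note that an ARG declared before FROM is outside the build stage, so it must be declared again (without a value) if you want to use it after FROM.

```dockerfile
# The only instruction allowed before FROM
ARG base_image=busybox
# Override with: docker build --build-arg base_image=alpine .
FROM ${base_image}
# Redeclare to make the value available inside this build stage
ARG base_image
RUN echo "Built on ${base_image}"
```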

Every Docker image must be based on another image; in other words, each Docker image has exactly one parent image. This results in a classic chicken-or-egg dilemma: the lineage has to begin somewhere. In the Docker universe, it begins with the “scratch” image. This explicitly empty image serves as the origin of every Docker image.
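For illustration, a Dockerfile that starts directly from scratch might look like the following sketch. It assumes a hypothetical, statically linked executable named hello in the build context, since the scratch image contains no shell, libraries, or other files.

```dockerfile
# Start from the empty image
FROM scratch
# Hypothetical statically linked binary from the build context
COPY hello /hello
# Exec form is required: there is no /bin/sh to run a shell-form command
CMD ["/hello"]
```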

ENV and ARG instructions

Both instructions assign a value to a variable. They differ primarily in where the values come from and in which context the variables are available. Let's look at the ARG instruction first.

The ARG instruction declares a variable in the Dockerfile that is only available during the build process. The value of a variable declared with ARG is passed as a command line argument when the build process is started. In the following example, we declare the “user” build variable:

ARG user

When we start the build process, we pass the actual value of the variable:

docker build --build-arg user=admin .

When declaring the variable, you can choose to specify a default value. If a suitable argument is not passed when starting the build process, the variable is given the default value:

ARG user=tester

Without using “--build-arg”, the “user” variable contains the “tester” default value:

docker build .
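Inside the Dockerfile, the build variable can then be used like any other variable, as in this minimal sketch based on the “user” example:

```dockerfile
FROM busybox
# Build variable with a default value
ARG user=tester
# Uses "tester" unless the build is started with --build-arg user=...
RUN echo "Building for ${user}"
```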

Environment variables, on the other hand, are defined with the ENV instruction. Unlike a variable declared with ARG, a variable defined with ENV exists both during the build process and at container runtime. The ENV instruction can be written in two ways:

  1. Recommended notation:

ENV version="1.0"

  2. Alternative notation for backward compatibility:

ENV version 1.0

Tip

The ENV instruction works roughly the same as the “export” command on the command line.
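The difference in scope between ARG and ENV can be sketched as follows. The example, with the hypothetical names version and app_version, copies a build argument into an environment variable so that the value is still available when the container runs.

```dockerfile
FROM busybox
# Build-time only; override with: docker build --build-arg version=2.0 .
ARG version=1.0
# ENV persists the value into the image and thus into the running container
ENV app_version=${version}
# At runtime, $app_version is set, while $version (the ARG) is empty
CMD ["sh", "-c", "echo app_version=$app_version version=$version"]
```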

WORKDIR and USER instructions

The WORKDIR instruction changes the working directory, both during the build process and when the container starts. Calling WORKDIR affects all subsequent instructions: during the build process, this applies to RUN, COPY, and ADD; at container runtime, it applies to CMD and ENTRYPOINT.

Tip

The WORKDIR instruction is roughly equivalent to the cd command on the command line.

The USER instruction changes the current (Linux) user, much as WORKDIR changes the directory. Optionally, you can also specify the user's group. Calling USER affects all subsequent instructions: during the build process, the user and group apply to RUN instructions; at container runtime, they apply to CMD and ENTRYPOINT.

Tip

The USER instruction is roughly equivalent to the su command on the command line.
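WORKDIR and USER are often used together, as in the following sketch. It assumes the busybox base image, whose adduser command creates the hypothetical unprivileged user appuser; other base images may provide different user management tools (for example, useradd).

```dockerfile
FROM busybox
# WORKDIR creates /app if it does not exist yet
WORKDIR /app
# Create a hypothetical unprivileged user (-D: no password)
RUN adduser -D appuser
# All following RUN, CMD and ENTRYPOINT instructions run as appuser;
# a group could be added with the syntax USER <user>:<group>
USER appuser
RUN whoami
# At runtime, the command also starts in /app as appuser
CMD ["pwd"]
```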

COPY and ADD instructions

The COPY and ADD instructions both add files and directories to the Docker image. Each of them creates a new layer that is stacked on top of the existing image. By default, the source for the COPY instruction is the build context. In the following example, we copy a readme file from the “doc” subdirectory of the build context to the “app” directory at the top level of the image:

COPY ./doc/readme.md /app/

Tip

The COPY instruction is roughly equivalent to the cp command on the command line.

The ADD instruction behaves almost identically, but it can also fetch resources from URLs outside the build context and automatically extracts local archive files. In practice, this can lead to unexpected side effects. Therefore, the use of the ADD instruction is generally discouraged; in most cases, you should use the COPY instruction instead.
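The extraction behavior is the main practical difference, as this sketch shows. It assumes a hypothetical tar archive named vendor.tar.gz in the build context. According to the Docker documentation, ADD extracts local tar archives automatically, whereas files downloaded from a URL are not extracted.

```dockerfile
FROM busybox
# COPY transfers the archive as-is: /app/vendor.tar.gz remains a single file
COPY vendor.tar.gz /app/
# ADD recognizes the local tar archive and unpacks its contents into /opt/vendor/
ADD vendor.tar.gz /opt/vendor/
```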

RUN instruction

The RUN instruction is one of the most common Dockerfile instructions. It tells Docker to execute a command during the build process. The resulting changes are stacked on top of the existing image as a new layer. The RUN instruction can be written in two ways:

  1. “Shell” notation: The arguments passed to RUN are executed by a shell, which on Linux is /bin/sh -c by default. Special characters and environment variables are processed according to the shell's rules. Here is an example that greets the current user using command substitution “$()”:

RUN echo "Hello $(whoami)"

  2. “Exec” notation: Instead of passing a command to a shell, an executable is called directly, optionally with additional arguments. Here is an example that invokes the “npm” package manager and instructs it to run the “build” script:

RUN ["npm", "run", "build"]
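One practical consequence of the two notations concerns variable expansion, sketched below with a hypothetical variable greeting. Because the exec notation does not start a shell, a reference such as $greeting is passed on literally unless you invoke a shell yourself.

```dockerfile
FROM busybox
ENV greeting="Hello"
# Shell notation: /bin/sh -c expands the variable and prints "Hello"
RUN echo "$greeting"
# Exec notation: no shell, so echo receives the literal string "$greeting"
RUN ["echo", "$greeting"]
# Exec notation with an explicit shell restores expansion and prints "Hello"
RUN ["sh", "-c", "echo $greeting"]
```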

Note

In principle, the RUN instruction could be used in place of some other Docker instructions, but this does not always work as expected. For example, “RUN cd src” is not equivalent to “WORKDIR src”: each RUN instruction runs in its own shell, so the directory change is lost as soon as the command finishes. In addition, this approach produces Dockerfiles that become harder to read and maintain as they grow. You should therefore use specialized instructions whenever possible.
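Whether “RUN cd” can stand in for WORKDIR is easy to check with a sketch like the following. It relies on the busybox image, whose default working directory is the root directory /. The output of the RUN commands is visible during the build, for example with “docker build --progress=plain --no-cache .”.

```dockerfile
FROM busybox
RUN mkdir -p /src
# No lasting effect: the cd only applies within this RUN's shell
RUN cd /src
# Still prints "/"
RUN pwd
# WORKDIR persists for all following instructions
WORKDIR /src
# Prints "/src"
RUN pwd
```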

CMD and ENTRYPOINT instructions

The RUN instruction executes a command during the build process and creates a new layer in the Docker image. In contrast, the CMD and ENTRYPOINT instructions define a command that is executed when the container is started. There is a subtle difference between the two:

  • ENTRYPOINT is used to create a container that always performs the same action when started. The container thus behaves like an executable file.
  • CMD is used to create a container that performs a predefined action on startup when no further parameters are given. This preset action can easily be overridden by passing arguments to “docker run”.

Both instructions can appear more than once in a Dockerfile, but only the last occurrence of each takes effect. You can also combine the two: in this case, ENTRYPOINT defines the fixed command that is executed when the container starts, while CMD supplies default arguments for it that are easy to override.

Our Dockerfile entries:

ENTRYPOINT ["echo", "Hello"]
CMD ["World"]

The corresponding commands on the command line:

# Output "Hello World"
docker run my_image
# Output "Hello Moon"
docker run my_image Moon
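The combination above relies on the exec notation (JSON array). As a sketch of why this matters: if ENTRYPOINT is written in shell notation, Docker runs it via /bin/sh -c, and according to the Docker documentation, CMD and any arguments passed to “docker run” are then ignored. The image name my_image is the same example name as above.

```dockerfile
FROM busybox
# Shell notation: runs as /bin/sh -c "echo Hello"
ENTRYPOINT echo Hello
# Ignored in combination with a shell-form ENTRYPOINT
CMD ["World"]
# "docker run my_image" and "docker run my_image Moon" both print only "Hello"
```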

EXPOSE instruction

Docker containers communicate over the network, and services running in a container are addressed via specific ports. The EXPOSE instruction documents which ports the container listens on and supports the TCP and UDP protocols. EXPOSE does not publish these ports by itself, however. When a container is started with “docker run -P”, all ports defined by EXPOSE are published to randomly assigned ports on the host. Alternatively, “docker run -p” publishes individual ports with an explicit mapping between host and container ports.

Here is an example. Our Dockerfile contains the following EXPOSE instructions:

EXPOSE 80/tcp
EXPOSE 80/udp

When the container is started, the ports can then be published in the following ways:

# Publish TCP and UDP port 80 on randomly assigned host ports
docker run -P my_image
# Forward TCP traffic on host port 81 to container port 80
docker run -p 81:80/tcp my_image

VOLUME instruction

A Dockerfile defines a Docker image, which consists of layers stacked on top of each other. These layers are read-only, which guarantees that a container always starts from the same state. To exchange data between the running container and the host system, we therefore need a separate mechanism: the VOLUME instruction defines a “mount point” within the container.

Consider the following Dockerfile excerpt. We create a “shared” directory at the top level of the image and then specify that a volume is to be mounted at this location when the container starts:

RUN mkdir /shared
VOLUME /shared

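Note that the actual path on the host system cannot be specified within the Dockerfile; it is set when the container is started, for example with “docker run -v”. By default, Docker creates an anonymous volume for each directory defined by the VOLUME instruction, and on Linux hosts the volume data is stored under “/var/lib/docker/volumes/”.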
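VOLUME also affects what a new volume initially contains. In the following sketch, based on the excerpt above, a hypothetical file greeting.txt is created before the VOLUME instruction; according to the Docker documentation, “docker run” initializes a newly created volume with the data that exists at that location in the image.

```dockerfile
FROM busybox
# Prepare the directory and some initial content
RUN mkdir /shared && echo "initial data" > /shared/greeting.txt
# Declare the mount point; a new volume starts out containing greeting.txt
VOLUME /shared
CMD ["ls", "/shared"]
```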

How do you edit a Dockerfile?

Remember that a Dockerfile is a plain text file, so it can be edited using the usual methods. A plain text editor is probably the most popular choice, and there is no shortage of editors with a graphical user interface: popular options include VS Code, Sublime Text, Atom, and Notepad++. Alternatively, a number of editors are available on the command line. In addition to the classic editors vi and Vim, the simpler editors Pico and Nano are widely used.

Note

You should only edit plain text files with editors designed for this purpose. Under no circumstances should you use a word processor such as Microsoft Word, Apple Pages, LibreOffice, or OpenOffice to edit a Dockerfile.
