Transcript
Hello, and welcome to this tech talk session, a deep dive into images. After a quick introduction, I will present the concept of image layers. Then I will show some image building best practices, and I will finish with mounts, HEREDOC support, and multi-architecture builds.
Table of Contents
- Introduction (0:21)
- Image & Container 101 (1:42)
- Understanding unioned filesystems (3:00)
- Creating an image with a Dockerfile (6:40)
- Image building best practices (8:44)
- Maximizing layer reuse between builds (11:17)
- Multi-stage builds (14:05)
- Mounts & Multi-arch (17:28)
- Learn more
Introduction (0:21)
So, a quick introduction about containers first. In the software industry, things move quickly. We see many new frameworks, libraries, and components, and software is getting more and more complex, so there are more chances for mistakes and problems. With the internet, the number of applications and how fast we deliver them matter for the business. If we don’t have a common way to develop and deliver software, it’s a big problem. Luckily, we have containers. They offer a standard method to package, build, share, and run applications easily. Now, about the container specifications.
In June 2015, Docker worked with the Linux Foundation to form the Open Container Initiative (OCI) to define three specifications. The image specification describes how to build a container image: how to create the filesystem, which files go where, and so on. The runtime specification describes how to create and run a container from an image, defining things like CPU, memory, storage, and networking. And the distribution specification covers the APIs needed for a complete container workflow. The question is: how do we put all the binaries, files, configuration, and so on into containers to run applications? The answer is container images.
Image & Container 101 (1:42)
So, an image consists of everything you need to run an application: the filesystem, the code, binaries, tools, runtimes, dependencies, and so on. These images are stored in a registry like Docker Hub. To create an image, the Dockerfile, an implementation of the image specification, is a very convenient solution. The Dockerfile is a file with a set of instructions describing how to build the image. An image is like a cake with several slices: it is composed of multiple layers, and these layers are defined by the Dockerfile.
So, thanks to the image, we can run a container. We can say that a container is a running instance of an image. The container takes the instructions and files from the image and creates an environment for running an application. It is very important to know that a container is an isolated process running like a virtualized environment, and because it is isolated, it cannot interfere with other processes. It’s possible to share files and directories between the host and the container with volumes. This feature is useful to persist shared data and to provide configuration files.
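For example, a bind mount shares a host directory with a container; the paths below are purely illustrative:

```console
# Share the host's ./config directory, read-only, with an nginx container
docker run --rm -v "$(pwd)/config:/etc/nginx/conf.d:ro" nginx
```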
Understanding unioned filesystems (3:00)
It’s important to spend a little time understanding how images are structured and turned into the filesystems that containers use. As an example, we will pull nginx. We see several things being pulled, and sometimes we see that some of the elements already exist; let’s see why. As I said, an image is like a cake with layers. Each of the items being pulled is a layer in the image. Each of these layers represents a set of filesystem changes: each layer can add, update, replace, or delete files. For example, here I’m starting to build an image. In layer 1, I added four files. In layer 2, I updated file2 and added file5. At the end, a unioned filesystem is created to merge all the layers.
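As a rough sketch, the two layers from this example could come from a Dockerfile like this (the file names are the hypothetical ones from the slide):

```Dockerfile
FROM ubuntu
# Layer 1: create a directory and add four files
RUN mkdir /data && touch /data/file1 /data/file2 /data/file3 /data/file4
# Layer 2: update file2 and add file5 -- a new filesystem diff; layer 1 is untouched
RUN echo "updated" > /data/file2 && touch /data/file5
```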
The important thing to understand is that all the layers are immutable. When you are building an image, you are simply creating new filesystem diffs, not modifying previous ones, and files in the higher layers replace the files from the lower layers. To delete a file, we create a new layer with a whiteout file. The whiteout file masks, or hides, the previous file, so we won’t see it in the unioned filesystem. A container created from this image won’t see the file, even though the original is still being shipped. Remember, the layers are immutable. A little bit of terminology now: the lower directories are the layers coming from the image, and the upper directory is a unique, writable space specific to the container, the scratch space. You can see it as a temporary file storage area. You can create an image manually, starting from a new container and running manual commands at every step; the result of each command will be a layer. For example, you can run the command docker run --rm -ti ubuntu to create a new container in interactive mode. Inside this container, I can install Node.js with the command apt update && apt install -y nodejs.
At this point, we can run Node commands, like node --version, to verify that it works. In another terminal, I will get the ID of my running container, and then I can run the docker commit command to save the changes as a new image. I can start a new container from this image with the docker run command and verify that node --version works. So it works, but for more complicated cases, it could be painful. Instead of building images this way, we tend to use a Dockerfile. If you open the GUI of Docker Desktop, you will see the details of your previous commands.
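Put together, the manual workflow looks roughly like this (the image name node-manual is just an illustration):

```console
# Terminal 1: start an interactive Ubuntu container, then install Node.js inside it
docker run --rm -ti ubuntu
apt update && apt install -y nodejs

# Terminal 2: find the container ID, then save its filesystem as a new image
docker ps
docker commit <container-id> node-manual

# Start a container from the new image and verify Node.js is there
docker run --rm node-manual node --version
```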
Creating an image with a Dockerfile (6:40)
But let’s talk about the Dockerfile. The Dockerfile, at the end of the day, is the easy way to define and build a Docker image. This file can be included in your source code repository, so you can version it, share it with the rest of the team, and even use it to build the image in a CI/CD system. A quick reminder: the Dockerfile supports various instructions. The FROM instruction specifies the parent image from which you are building. In the example on the right, we start from an Ubuntu image, so we don’t need to install anything regarding the operating system. The WORKDIR instruction sets the working directory; all the following instructions in the Dockerfile will be executed in this directory.
The COPY instruction copies a file or folder from the host system into the Docker image. The RUN instruction executes any command and creates a new layer in the image; in the example, the RUN command creates a new layer with all the necessary dependencies. The CMD instruction specifies the command that is executed when the container starts; in the example, we start Node.js and execute the index.js file. And when we are ready, we build the image with the docker build command. Once we have built an image, we can use the docker image history command to view the layers and some details. We can also see some of those details within the Docker Desktop image analysis view, which will also give us insights about vulnerabilities that might exist within our image. There is another great tool often used to dig into images: Dive. It allows you to see the filesystem at each layer and see what files are added, modified, and removed. Dive is an open source project.
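A minimal sketch of the kind of Dockerfile described here; the paths and the Node.js installation step are illustrative, not the exact slide content:

```Dockerfile
FROM ubuntu:22.04
# All following instructions run in /app
WORKDIR /app
# Install the dependencies in a new layer
RUN apt update && apt install -y nodejs npm
# Copy the application from the host into the image
COPY . .
# Command executed when the container starts
CMD ["node", "index.js"]
```

You would then build and inspect it with something like docker build -t my-app . followed by docker image history my-app.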
Image building best practices (8:44)
Now that we have that basic understanding, let’s dive into a few best practices around building images. First, a few quick ones. Do not include secrets in your images: mount secrets as volumes, use environment variables, or use a vault, for example. Use only trusted content, so images from Docker, from Docker Hub, and from your organization. For example, at Docker, we have three kinds of trusted content: Docker Official Images, Verified Publishers, and Sponsored Open Source projects. And please don’t use the :latest tag (the default tag) when you use an image; it’s too dangerous.
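For instance, instead of relying on the implicit latest tag, pin an explicit one (the version below is only an example):

```Dockerfile
# Risky: "FROM node" silently means "FROM node:latest", which can change under you
# Better: pin the version you actually tested against
FROM node:20-slim
```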
When you are building images, you will have several goals. Let’s talk about the first goal here: reducing the size of your images. Why is this important? Well, smaller images reduce the amount of time and bandwidth required to push and pull them. Now, there are a few common ways in which we might accidentally increase the size of our images; let’s see what we can do. You have to clean up as you go. Here, with the line RUN apt install, I’m installing Python, and if you look at the GUI of Docker Desktop, you can see that a new layer, number 7, has been created with a size of more than 300 megabytes. Then, with the line RUN apt autoremove, I do some cleaning. So I create a new layer, but layer number 7 is still here; it’s only hidden. Remember the concept of whiteout files. If you want to save space when you are building an image, in this case, you can chain the commands in a single instruction.
Then, at the end, you create a single, smaller layer: you install the files, you do your work, and you clean up and remove the files, all before the layer is saved. So, chain commands wherever possible to squash the set of filesystem changes. With this simple change, I save around 66% of the space.
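A sketch of the difference; the Python install and the build-step.py script are illustrative, hypothetical stand-ins:

```Dockerfile
# Anti-pattern: the cleanup is a separate layer, so the ~300 MB install
# layer is only masked by whiteout files and still ships with the image
RUN apt update && apt install -y python3
RUN python3 build-step.py
RUN apt autoremove -y python3 && rm -rf /var/lib/apt/lists/*

# Better: install, use, and clean up in a single RUN, so only the files
# that survive the cleanup end up in the saved layer
RUN apt update && apt install -y python3 \
    && python3 build-step.py \
    && apt autoremove -y python3 && rm -rf /var/lib/apt/lists/*
```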
Maximizing layer reuse between builds (11:17)
Now, let’s dive into a best practice that focuses on maximizing layer reuse across builds. This has two main benefits: your builds will run faster, and your images will be faster to push and pull. When you are building images, there is one rule: try to reuse the layers. When you are working with images, everything is put in the cache, but at every change, part of the cache is invalidated. If the cache of a layer is invalidated, the layer will be rebuilt, and all of its child layers too. Let me explain this with an example. I have a directory with a Dockerfile and four big files, and I will create an image using the Dockerfile with the contents of the directory. At the top right of the screen is my Dockerfile. I will build an image with the first three files, and when I run the first build, you can see that the build is copying three files. So I created three layers, for a total of more than 6 gigabytes. With the GUI of Docker Desktop, you can get information about the build, like its duration, for example.
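The Dockerfile in this demo looks roughly like this (the file names are stand-ins for the four big files):

```Dockerfile
FROM ubuntu
# Each COPY creates its own layer, so each one can be cached independently
COPY file1 /
COPY file2 /
COPY file3 /
```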
Now, I want to update my Dockerfile and copy the fourth file into the image. At the next build, you can see that the first three files come from the cache, and we only copy the fourth file. If I return to the build view of Docker Desktop, you can check that the three layers are in the cache and that the build duration is shorter. Now, I add a line to the second file, so I have updated the second file. When I rebuild the image, the first layer stays in the cache, but not the other layers, and the build recreates the next three layers. And you can check again: the duration is longer, and only one cached layer was used. Another best practice is to separate the setup from the code. With this Dockerfile, every time you update the source code of the web application, npm install will be run, because you copy the source code into the image first. The best practice is to install the dependencies first, in their own layer; then, as long as you only modify the source code, the dependencies layer will never be rebuilt.
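A sketch of the pattern for a Node.js application (the file layout is the conventional one, not the exact slide content):

```Dockerfile
# Cache-unfriendly: any source change invalidates the COPY layer,
# so npm install runs again on every build
#   COPY . .
#   RUN npm install

# Cache-friendly: the dependency layers are only rebuilt when the
# package manifests change
COPY package.json package-lock.json ./
RUN npm install
COPY . .
```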
Multi-stage builds (14:05)
Finally, let’s talk about reducing the size of our images by shipping only the necessary things. For that, we have multi-stage builds. This is a great tool to help with image size reduction, so let’s see how to use it. At the end of the day, a multi-stage build allows us to create a pipeline within a Dockerfile, where one stage can build an artifact and a later stage can extract that artifact from the build stage to use it. Let’s see an example. I have a Go program. Usually, I would say that the simplest way is to use the golang image, because I have all the tools to build and run my application. But if I check the size of the image, I can see that for a very small program, I end up with more than 800 megabytes. That’s too much, of course.
If I use a multi-stage build, I first write a builder stage with the golang image and build the binary, and then I create a new stage with a smaller image. For example, here I’m using an Ubuntu image, which is smaller, but you can find smaller ones still, and I copy only the binary of my program from the builder stage. This time, it’s a lot better: the size of my image is around 70 megabytes. But we can do better by using an even smaller image, like a slim image. Docker slim images are a type of Docker image designed to be as small as possible, containing only the essential components needed to run a specific application or service. By the way, you can notice that I also reduced the number of vulnerabilities. If your application is a standalone static executable, I mean with no dependencies, you can even use the scratch image for the last stage. The Docker scratch image is essentially an empty image; it is the most minimal image you can base your Docker images on, as it contains nothing but the bare minimum required to execute a binary. With my example, the size of my new image is less than 2 megabytes.
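A minimal sketch of the multi-stage pattern described here; the stage name, paths, and Go version are illustrative:

```Dockerfile
# Build stage: the full golang image has all the build tools
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
# Build a static binary so the final stage needs no libraries at all
RUN CGO_ENABLED=0 go build -o /bin/app .

# Final stage: scratch is an empty image; copy in only the binary
FROM scratch
COPY --from=builder /bin/app /app
ENTRYPOINT ["/app"]
```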
It’s very important to have small images: you improve the security of your system, because the attack surface is minimal, and you increase the efficiency of your CI/CD, for example. The scratch image is the ultimate example, but most of the time you will find a slim image that meets all your requirements. And you can see that, in this case, I have no more vulnerabilities. One last tip: you can use an existing image as a builder. In this example, I copy the binary from an existing image, so I’m reusing the layers of another image. Always think about reuse.
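With COPY --from, the source can be any image, not just a stage defined earlier in the same Dockerfile; the image and path below are illustrative:

```Dockerfile
FROM alpine
# Pull a ready-made binary out of a published image instead of building it
COPY --from=busybox:stable /bin/busybox /usr/local/bin/busybox
```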
Mounts & Multi-arch (17:28)
There are some useful features related to the Dockerfile, like mounts. During a build, it’s often helpful to attach additional storage, caches, secrets, and so on to the build. There are four different kinds of mounts currently supported. In this example, we have a secret mount that provides the credentials to access an NPM repository. When using secrets, an additional argument is required in the docker build command to indicate the source of the secret data. The second mount is a cache mount; the idea is that builds can use this cache to run faster. At the bottom of the slide, you can find a link to an example using the cache with NPM.
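A sketch of both mount types; the secret id and target paths are illustrative:

```Dockerfile
# Secret mount: the .npmrc credentials exist only during this RUN step
# and never end up in an image layer.
# Cache mount: npm's download cache persists across builds.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    --mount=type=cache,target=/root/.npm \
    npm install
```

The secret’s source is provided at build time, for example with docker build --secret id=npmrc,src=$HOME/.npmrc .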
Another great feature is the ability to use HEREDOC statements in the Dockerfile, allowing you to run multiple commands, or even a script, without a bunch of ampersand and backslash statements. And finally, there are several great features around multi-architecture builds. In the Dockerfile, you can specify which platform a particular stage will use. Here, the BUILDPLATFORM variable represents the native platform of the builder, while TARGETOS and TARGETARCH are derived from the target platform. It then becomes very easy to build multi-architecture images: in this example, I create a multi-arch image for AMD64 and ARM64. At the bottom of the slide, again, you can find a link to a detailed example of multi-architecture builds, and there is a sketch combining both features below.
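The Go project and version in this sketch are illustrative:

```Dockerfile
# syntax=docker/dockerfile:1
# Run the build stage on the builder's native platform, cross-compiling
# for the requested target platform
FROM --platform=$BUILDPLATFORM golang:1.21 AS builder
ARG TARGETOS
ARG TARGETARCH
WORKDIR /src
COPY . .
# HEREDOC: several commands in one RUN, no && or \ needed
RUN <<EOF
go mod download
GOOS=$TARGETOS GOARCH=$TARGETARCH CGO_ENABLED=0 go build -o /bin/app .
EOF

FROM scratch
COPY --from=builder /bin/app /app
ENTRYPOINT ["/app"]
```

You would build it for both architectures with something like docker buildx build --platform linux/amd64,linux/arm64 .

And that’s it. To wrap up this part, we have a “Build with Docker” guide going from very simple builds to fairly complex ones. It’s a great resource, so don’t hesitate to read it. Thank you for your attention.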
Learn more
- Check out the docs on Docker Build.
- New to Docker? Get started.
- Deep dive into Docker products with free learning paths.
- Subscribe to the Docker Newsletter.
- Get the latest release of Docker Desktop.
- Have questions? The Docker community is here to help.