DockerCon
Dockerfile: From Start to Optimized
David Karlsson, Technical Writer, Docker
Transcript
Hello and welcome to Dockerfile: From Start to Optimized. I’m David Karlsson, a technical writer on the documentation team at Docker. Today I’ll be talking about Dockerfiles: how to build with Docker, and how to create an optimized Dockerfile from scratch.
I joined Docker a little over a year ago. I’d used Docker before that and was comfortable using Docker Compose and building images, so I thought I had a pretty good idea of what that meant. But this talk is a summary of the things I’ve learned in the past year working closely with the maintainers of Moby and BuildKit, and all the tips and tricks that I didn’t know back then.
Agenda
The agenda for today is an introduction to what a build is, what goes into it, and which components are involved. Then I have a small demo project that doesn’t contain a Dockerfile yet, and we’ll create a new Dockerfile for it from scratch. Then we’ll take that Dockerfile and see how we can optimize it to make builds run faster, make the image smaller, things like that. Towards the end, we’ll look at features, not necessarily advanced ones, that you can use to extend the flexibility of your build, add variety to it, or do things you might not know you could do with Docker build, like testing. So let’s get into it.
What’s a Dockerfile?
It all starts with the Dockerfile. The Dockerfile contains the build instructions: the steps your build executes. It’s a DSL in a text file; the syntax is kind of like SQL. It contains a series of instructions that are executed by a builder to create an output. Basic. If we zoom out a little bit, the Dockerfile is actually a frontend in all of this, which is hilarious to me. It shows you how far in the backend my build team colleagues are. But, anyway, it’s a frontend, apparently. Then we have Buildx, the command-line client that’s bundled with Docker Engine and Docker Desktop and used to invoke your build.
When you run docker build, you’re using Buildx. Then we have BuildKit, the engine that actually runs the build. We’ll talk mostly about the frontend today, but we’ll use Buildx to invoke the builds. And then there’s the build context, which is something I want to mention before we get into it.
A build context is different from a Docker context, if you’re familiar with that. A build context is the set of files that you pass in and that the build has access to. If you run the docker build command on the command line, it’s the positional argument that you pass in. A lot of the time, that’s a dot, signifying the current working directory; you’re passing in the current working directory as the set of files the builder has access to. But it can be other things too. It can be a Git URL, in which case the BuildKit builder clones the repo directly and you’re not sending anything from your local machine. It can also be a tarball. Some people use that. You probably won’t.
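For example, all of these are valid build contexts (the repository URL here is just a placeholder):

```console
# Local directory: the current working directory is the context
docker build .

# Git URL: BuildKit clones the repository itself
docker build https://github.com/example/my-app.git

# Tarball: the context is read from stdin
docker build - < context.tar
```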
Creating a Dockerfile
All right, that’s enough for the intro. Let’s get into creating a Dockerfile. First, a quick look at the sample project I have. It’s a Go project: a small HTTP server with an API that you can send requests to, and it returns names and birthdays of users, given an ID. So, a very useful application. Okay, let’s create a Dockerfile for this. The file name is Dockerfile, right? The first thing I always like to do in my Dockerfiles is add a line at the very top called a parser directive. In this case, it specifies that you want to use the latest version of the Dockerfile syntax in this file. Adding this means that you don’t have to upgrade your Docker Engine or Docker Desktop to get access to the latest features of the Dockerfile syntax; it resolves dynamically when you run a build. So it’s great to basically always have this line in the Dockerfile, as shown below. It doesn’t do anything for the build itself, though. That’s the next step.
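Here’s the line in question; the 1 tag resolves to the latest stable release of the version 1 syntax:

```dockerfile
# syntax=docker/dockerfile:1
```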
Base image
Next, I’ll add a FROM instruction, which specifies the base image I’ll use for this build. My project is a Go project, so I’ll use the Golang Docker Official Image, which contains all of the compiler tools I need to build my Go binary. From this point on, all of the instructions I add to the Dockerfile will execute in a container based off of this image, with access to the commands bundled inside.
Working directory
Next, I’ll set up a working directory in the container; the following commands will execute inside /src. Then I use a COPY instruction to copy all of the files from the build context (again, the set of files specified for the build) into the current working directory in the container. Now we’re in the right working directory and have all the files inside, so we can run our build commands. I’ll first download the dependencies with go mod download, then run go build to create the binary. Finally, I’ll specify an ENTRYPOINT for my container, which just says: this is the binary that gets executed when you run a container based off of this image. Okay, that’s it.
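Putting those steps together gives a first version of the Dockerfile, something like this (the Go version tag and binary name are placeholders for the demo project):

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.21
WORKDIR /src
COPY . .
RUN go mod download
RUN go build -o /bin/server .
ENTRYPOINT [ "/bin/server" ]
```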
Now we can build this image. I’ll use the dc-demo tag. The output you see here is output from BuildKit, showing all the instructions from the Dockerfile as it executes them. When that’s done, we can docker run, specifying the name of the image. In this case, I’ll also publish some ports to the host, which means I should be able to send requests to those ports on my localhost. Okay, so at this point, we have a basic image working. We’ve successfully built our Go application as a container.
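The commands look roughly like this (the published port is a guess for the demo server):

```console
docker build -t dc-demo .
docker run -p 8080:8080 dc-demo
```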
Looking at layers
Now we’ll get into optimizing things. The first thing we’ll look at is layers. How many of you are familiar with layers? All right, good. Some do. Layers are what constitutes an image, and also what constitutes the build cache. You can think of it like this: roughly, each instruction in a Dockerfile corresponds to a layer, and when that instruction runs, it creates a layer of the image. Why this is significant for us is what happens when we change something. If nothing has changed, a layer is reused from the build cache.
Right now, our image consists of maybe six layers. If we don’t change anything, none of those six layers will be rebuilt. But if we do change a layer, Docker detects that, and that layer needs to be rebuilt, as well as all of the layers that follow. Take the cake analogy on the slide, for example: if you change something in the yellow layer, we need to scoop up that layer plus the layers on top and then re-bake them. What that means is that instruction order matters, because when you change something, you don’t want to unnecessarily rebuild layers that don’t need to be rebuilt.
Let’s look at the current Dockerfile, or rather the build output for the current image. If I run a build with no changes, all of the steps are cached; nothing gets executed. If I make a change to the source code and rebuild, the way the Dockerfile is written at the moment, that invalidates the cache for the COPY instruction and all of the layers that follow. This is very inefficient, because go mod download, which downloads the dependencies, comes after that step. So every time we change our code, we download the dependencies all over again. In this case it’s a small project, so it doesn’t matter much, but you can imagine that for a larger project, downloading dependencies can take quite a while.
We can play with the instruction order to mitigate this. It’s the COPY layer that gets invalidated when we change our source code. If we move it down to after we download the dependencies, we won’t invalidate the dependency layer anymore. But now our Dockerfile is broken: we can’t download the dependencies, because we don’t know which dependencies to download. We need the package management files to know that. So I’ll add another COPY instruction before that step to copy only the package management files, go.mod and go.sum in this case. Now if I rebuild after changing my source code, the go mod download layer remains cached every time, unless I change the go.mod and go.sum files, meaning I upgrade, add, or remove dependencies. All right, that’s it for layers.
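With the reordered instructions, the Dockerfile looks something like this:

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.21
WORKDIR /src

# Copy only the package management files first...
COPY go.mod go.sum ./
# ...so this layer stays cached unless dependencies change
RUN go mod download

# Source code changes only invalidate the cache from here on
COPY . .
RUN go build -o /bin/server .
ENTRYPOINT [ "/bin/server" ]
```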
Multi-stage builds
We’ll get back to optimizing that later if you’re curious. Now let’s talk about multi-stage builds. In my opinion, this is the most important feature of Docker build ever. That’s probably subjective. But what it lets you do is make a nice, clean separation between build and runtime, because at the moment we don’t have that. The base image we’re using contains all of the Go compilation tools, and our image contains all of our source code as well, and none of that is necessary for running the image in production.
That’s what multi-stage builds can help us fix. Basically, they let you select a different base image for the final stage. You can then copy resources over from earlier stages into that new stage, effectively pruning away everything from the image that you don’t need. To clarify what that means: our current image is around 600 megabytes at this point. Large. And the binary, if we just go build it statically, is 8 megabytes. So there’s some room for improvement.
Let’s add multi-stage builds to fix that. First, we keep the steps we have and add another stage at the bottom using a different base image. Then I use a COPY instruction with a --from flag, specifying the name of the stage I want to copy from, and then the file or files I want to copy into my current stage. So in the base stage we run all the build commands, and in the image stage we run no commands other than copying the finished binary over from the earlier stage. Now we can build the application and compare the sizes of the two images: 25 megabytes versus 600 megabytes, about 5% of the size. The new image we built has two layers now: the base image layer, the Alpine image, plus the binary we copy in.
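A sketch of the multi-stage version; CGO_ENABLED=0 is my addition here, so the Go binary is statically linked and actually runs on Alpine’s musl libc:

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.21 AS base
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary, so it runs on Alpine without glibc
RUN CGO_ENABLED=0 go build -o /bin/server .

# Runtime stage: only the binary, none of the build tools or sources
FROM alpine
COPY --from=base /bin/server /bin/server
ENTRYPOINT [ "/bin/server" ]
```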
Cache mounts and bind mounts
Next, we’ll take a look at mounts. There are different types of mounts we can use to improve the performance of our builds; we’ll cover cache mounts and bind mounts today. Cache mounts let you use persistent storage for your build cache, and they help keep the package cache around: even when you do change dependencies, you don’t have to re-download everything if you only changed one or two of them. Bind mounts are a more efficient way to get files from the build context into the container used for building. Our project is a Go project, and I want to add a cache mount for the Go modules I’m using.
After browsing some documentation, I learned that the Go module cache directory is /go/pkg/mod for the image I’m using. That’s the directory I want to set up a cache mount for, so that anything inside it is persistently cached. To do that, we go back to the Dockerfile. In the go mod download step, we add another flag: --mount with type cache, and the target set to that cache directory. This lets the go mod download command write to the cache mount. Then we add the same mount to the build command, which lets it read from the same cache. Now if we do change one of the dependencies, say we upgrade to a new version of Echo, the framework I’m using, the cache will still be used for all of the other existing dependencies. So this is great for performance.
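The two RUN instructions with cache mounts look like this:

```dockerfile
# go mod download writes the module cache to the mount
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download

# go build reads from the same persistent cache
RUN --mount=type=cache,target=/go/pkg/mod \
    CGO_ENABLED=0 go build -o /bin/server .
```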
Bind mounts are a way to mount context files into the build container. It’s very similar to COPY, but it only mounts the files temporarily, so it’s good for build steps where you don’t need those files in the final container. It’s more efficient; granted, it’s probably an edge case where you’re looking for that type of efficiency in your builds. But anyway, it’s good practice to use bind mounts for this case.
Going back to the Dockerfile to add bind mounts: again, we use the --mount flag, and we add the go.mod and go.sum files to the go mod download RUN instruction. This binds those two files from the build context directly into the container without copying them in. Then we can actually remove the COPY instruction before that, since we already have those files mounted. Next, we add the same thing for the build command. This time I’ll just mount the entire directory: I can omit the source and just use target set to the current directory, and that mounts all of the build context into the container. Then, again, we remove the COPY instruction here. All right, this Dockerfile is looking pretty good.
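With bind mounts added and the COPY instructions removed, the build steps become:

```dockerfile
# Mount only the package management files for the download step
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,source=go.mod,target=go.mod \
    --mount=type=bind,source=go.sum,target=go.sum \
    go mod download

# Mount the whole build context for the compile step;
# omitting "source" mounts the context root
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,target=. \
    CGO_ENABLED=0 go build -o /bin/server .
```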
Build arguments
I’ll deviate from optimization for a moment and talk about build arguments. Build arguments are a nice way to inject values at build time. For example, if you’re building a binary, you might want to print the version of the binary at startup, or on --version calls, or something like that. Build arguments are a nice way to do that. We’ll also look at how build arguments can be a convenient way to manage the versions of packages in your Dockerfile.
What we’ll demo here is the version printout: when I start the server, it prints the currently running version. We want to control that with build arguments, so that we can set the version when we build the binary rather than hard-coding it in the source code.
To add build arguments in the Dockerfile, we use the ARG instruction. Here I’ll create a build argument called VERSION and set its default value to 0.0.0. Then I can consume that argument in my instructions. To overwrite a variable value in Go, you use linker flags; that’s what the somewhat cryptic -ldflags value is doing here: setting the version variable in the code to whatever the value of the VERSION build argument is. So if we build our image using the --build-arg flag, that’s the value that will be present when running the container.
Another thing I always like to do: I don’t like to sprinkle versions of things around the Dockerfile. What we can do instead is set the version values as top-level build arguments, and then refer to those when we select our base images. So FROM golang, and then I just use the GO_VERSION build argument. That also lets me build this image with a different Go version, without changing the Dockerfile, by overwriting the default value with the --build-arg flag.
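A sketch of both uses of build arguments; the main.version variable name is an assumption about the demo code:

```dockerfile
# syntax=docker/dockerfile:1
ARG GO_VERSION=1.21

FROM golang:${GO_VERSION} AS base
WORKDIR /src

ARG VERSION=0.0.0
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,target=. \
    CGO_ENABLED=0 go build \
      -ldflags "-X main.version=${VERSION}" \
      -o /bin/server .
```

Both defaults can be overridden on the command line:

```console
docker build --build-arg GO_VERSION=1.22 --build-arg VERSION=1.2.3 -t dc-demo .
```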
Testing
Let’s look at something else. We’ll get back to build arguments later when we do multi-platform builds. But now, testing. I didn’t know until recently that you could do testing with Docker build. It didn’t connect for me, but apparently you can. And it’s quite nice, because it gives you an isolated environment for running tests. You can run tests in isolation, in parallel, which is quite difficult to do otherwise; using Docker build and BuildKit, it’s really easy. It also helps you create a consistent environment for running your tests, so you won’t have the case where tests fail on your colleague’s machine but work on yours: you’re running your tests in Docker, and you’re all using the same versions of everything.
To run our tests, first I’ll do some chore work and split the current base stage into two stages: a base stage, which sets up all the dependencies, and a build stage, which compiles the binary. Then I’ll add a new stage, again using base as its base, called test. In this stage I run a go test command, executing whatever tests are in the repository, using bind mounts and cache mounts so we don’t have to download the dependencies again here either.
Now we can run these tests with docker build, using --target to select the stage inside the Dockerfile that we want to build. docker build --target test will build the test stage. If the tests fail, the build fails; if the tests pass, the build succeeds. This can be a really nice way to make sure your tests run in a reproducible environment, with better control over what’s happening.
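A sketch of the split stages plus the new test stage, and the command to run it:

```dockerfile
# base: dependencies only (GO_VERSION comes from the top-level ARG shown earlier)
FROM golang:${GO_VERSION} AS base
WORKDIR /src
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,source=go.mod,target=go.mod \
    --mount=type=bind,source=go.sum,target=go.sum \
    go mod download

# build: compiles the binary
FROM base AS build
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,target=. \
    CGO_ENABLED=0 go build -o /bin/server .

# test: runs the test suite; the build fails if the tests fail
FROM base AS test
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,target=. \
    go test -v ./...
```

```console
docker build --target test .
```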
Export test results
Now let’s take a look at exporting results from our builds. We have our tests, and we’re able to build our image; everything’s good. We can add exports to build other things than just the container image. The first thing we’ll do is add a way to export the test results and a test coverage report to the file system after running the tests.
So again, back to the Dockerfile, as usual. The first thing I’ll do is add a flag to generate a coverage report for my tests, and then pipe the test output to a file. At this point, running our tests creates two files: a coverage report and a test results file. The next bit might not be idiomatic, I’m not sure: I check the exit code of that command, and if there’s an error, I print out the results, exit, and fail the build. So basically the change here is that in the test stage, we write the test results and the coverage to two files. Now let’s make it possible to export those files, in case we want to archive the test results or the coverage report in CI, for example.
To do that, we can add a new stage, and this time we’ll use FROM scratch. scratch is a special image in Docker that contains nothing at all; it’s a completely empty image. The only thing I do in this stage is copy those two test files over from the test stage. That’s it. That creates an empty image with just two files in it. The way that’s useful is that I can now run my build command with the -o flag (towards the end of the command there’s a -o followed by a path), which outputs the build result to a directory on my machine. I make sure to use --target export-test here, which selects that empty stage with just the two files. Running this gives us an out directory with two files in it: the test results and the coverage report.
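Here’s a sketch of the updated test stage and the export stage; the report file paths are placeholders:

```dockerfile
FROM base AS test
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,target=. \
    go test -v -coverprofile=/coverage.txt ./... > /results.txt 2>&1; \
    if [ $? -ne 0 ]; then cat /results.txt; exit 1; fi

# Empty image containing only the two report files
FROM scratch AS export-test
COPY --from=test /coverage.txt /results.txt /
```

```console
docker build --target export-test -o out .
```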
Here’s another way to look at what we just did. We have our base image, the golang Docker Official Image. We use that as the base for the tests, run the tests, and create these two files. Then we create a new stage, which is completely blank, and copy over those two files. That’s it, and that’s what we export to the file system. Now, you could export the test stage itself to the file system as well, but that’s going to contain a lot of files, so be careful with that: whatever is in the container when you use the --output or -o flag gets exported to the directory you specify.
We can also use this technique to export binaries: not only building the image, but building binaries that you can upload to GitHub releases, for example, or run on your machine. It uses the same technique: a new stage FROM scratch, copying the build result into it. That’s it. That lets us build and export binaries directly with docker build, again using the -o or --output flag. Here I’m building with the -o flag, and that creates a Linux arm64 binary on my file system. We’ll get back to this later as well.
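The binary export stage, assuming the build stage from earlier:

```dockerfile
FROM scratch AS binary
COPY --from=build /bin/server /
```

```console
docker build --target binary -o out .
```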
Multi-platform builds
A few more things to cover. Next up: multi-platform builds. Multi-platform builds are a way to build images that can run on multiple different architectures. By default, when you build an image, it targets whatever CPU architecture you’re currently running on. I’m using an M1 Mac, so when I build an image, it’s a Linux arm64 image. If you’re on an x86 machine building Linux images, it’s a Linux amd64 image.
Multi-platform builds let you build images that can run on different platforms, using a single Dockerfile and producing a single image. You might have seen people in the wild using different Dockerfiles for different architectures, or creating different tags for different architectures. That’s a pattern you don’t really want to pursue; multi-platform images handle this much more easily.
There are three ways you can do multi-platform builds. The first is emulation: emulating a non-native platform under QEMU. The second is using multiple native nodes: running BuildKit, the builder, on several nodes of different architectures and building natively on each. That’s kind of a complex setup. The third option, which is really great, is to use cross-compilation where that’s possible, leveraging your language’s or compiler’s capabilities for generating non-native binaries, like Clang and LLVM, or GCC. We can do that with Go, and it’s easy to do with Rust as well.
We’ll look at emulation and cross-compilation and some of the differences between them. Emulation is the easiest way to get going with building multi-platform images; you don’t even have to change anything about your Dockerfile. The downside is that it can be really slow, up to 10x slower than a normal build, depending on how CPU-intensive your build is. Sometimes that’s not a problem: if your build is really quick, a small difference doesn’t matter, and emulation can be a perfectly fine choice. Up to you to decide.
Before we can get started with multi-platform builds, we have to do one of two things, because the default builder with the default image store in Docker doesn’t support them at the moment. We can either create a new builder using, for example, the docker-container driver, which lets us build for multiple platforms at once. Or we can enable the containerd image store, an experimental feature that changes the default image store to containerd. I do recommend this; it’s a great feature, and I use it every day, so try it out. Just know that it’s experimental. That lets you build multi-platform images without having to swap out your builder.
Emulation
Building multi-platform images with emulation is easy: you just add the --platform flag to your build command, as shown below, and specify the platforms you want to build for, in this case linux/amd64 and linux/arm64. Again, as I said, there’s a performance penalty to consider: building this project for my native architecture takes 8.9 seconds, versus 32 seconds with emulation. Not quite 10x, but definitely slower. After the build, the multi-platform image looks like this: if you run docker image ls, you’ll see the image there, and at the moment there are some UX concerns around showing the different platforms. But if you push the image to Docker Hub, Docker Hub will show that it’s multi-platform; both platforms are listed when you inspect the tag. That means if you pull the image on an x86 machine, it will run just fine. So that’s emulation.
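The emulated multi-platform build is just this one command (given one of the builder setups described above):

```console
docker build --platform linux/amd64,linux/arm64 -t dc-demo .
```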
Cross-compilation
Now let’s look at the much more exciting way to build multi-platform images: cross-compilation. This is way faster. You’re not running any emulation; you’re building on the native architecture of your machine and outputting binaries for all of the other architectures you want to target. It does, however, require changing your Dockerfile, because we need to take advantage of some of the predefined build arguments that exist in Docker build. They’re listed here (not all of them), and we’ll use these build arguments to cross-compile our binary. Here’s an example of how they work: depending on what you set your --platform flag to, the build arguments resolve differently inside your build. Anyway, I’ll leave this for reference; it’s also covered in the documentation.
This is roughly how it works. In your Dockerfile, you use the special TARGETOS and TARGETARCH build arguments. The builder then basically splits that stage in two and runs them concurrently, one for each platform you specify. This is really efficient and nice. To cross-compile with Go, in case you’re not familiar: you set the GOOS and GOARCH environment variables to the OS and architecture you want to build for, then run the go build command as usual, and that creates the binary you asked for. In this case, I’m setting GOOS to linux and GOARCH to amd64, and that creates a Linux amd64 binary.
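On the command line, that looks like:

```console
GOOS=linux GOARCH=amd64 go build -o bin/server-linux-amd64 .
```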
Let’s look at how we can leverage this in the Dockerfile. The first thing we do is pin the build platform we’re using, by adding --platform=$BUILDPLATFORM to the FROM instruction. This prevents any emulation from happening: by default, when you run with multiple values in your --platform flag, the builder tries to emulate each of them, but with this added, nothing gets emulated. It pins the build platform to whatever the machine is currently running.
In our build stage, we declare a couple of those predefined build arguments, which means we’re consuming them in this stage and can use them: TARGETOS, resolving to the OS of the target platform, and TARGETARCH, the target CPU architecture. Then we set the Go-specific variables to those values. This is really easy here, because the values map one to one, at least in the cases where I’ve used it. But this depends on the compiler you’re using.
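Putting the pinned build platform and the predefined build arguments together, the build stage becomes something like:

```dockerfile
# Always run this stage on the builder's native platform
FROM --platform=$BUILDPLATFORM golang:${GO_VERSION} AS build
WORKDIR /src
# Predefined args; declaring them makes them available in this stage
ARG TARGETOS TARGETARCH
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,target=. \
    GOOS=$TARGETOS GOARCH=$TARGETARCH CGO_ENABLED=0 \
    go build -o /bin/server .
```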
If you’re using a C compiler, the way to do this is different. Luckily, there’s a helper for that: a really great project called xx, by Tõnis Tiigi. If you do a lot of multi-platform builds and plan to do them with cross-compilation, do check it out, because it helps you do this in a much easier way. This is a screenshot of the build with cross-compilation: building with emulation was around 32 seconds; with cross-compilation, the same targets take 9.2 seconds, only a couple of tenths of a second longer than a regular native build.
What’s really exciting, I think, is that you can combine this with the exporting feature we saw earlier: you can build multi-platform and export binaries for multiple architectures, all in the same build, using cross-compilation. In this case, we’re building for two architectures, and if I inspect the output directory, I have two different binaries for the architectures I specified. I can also use this to build for my local architecture (I should use --platform local here, actually), which lets me build a Darwin binary as well using cross-compilation, which I can then run on my machine.
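Combining cross-compilation with the binary export target from earlier:

```console
docker build --platform linux/amd64,linux/arm64 --target binary -o out .
```

With multiple platforms, BuildKit writes each result to its own subdirectory of out, such as linux_amd64 and linux_arm64.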
Conclusion
All right, these have been tips and tricks for building with Docker and utilizing various features of the Dockerfile. This is what our Dockerfile looks like at this point, and it looks quite good to me. Here are the features we covered: layers, multi-stage builds, cache mounts, bind mounts, multi-platform builds, build arguments, testing, and exporting results. If you want to try this out yourself, check out the build guide in our documentation and go through the steps. All right, that’s it for me. Thank you very much.
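For reference, here’s a reconstruction of roughly where the Dockerfile ended up, using the placeholder names from this walkthrough:

```dockerfile
# syntax=docker/dockerfile:1
ARG GO_VERSION=1.21

# base: dependencies only, built on the native platform
FROM --platform=$BUILDPLATFORM golang:${GO_VERSION} AS base
WORKDIR /src
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,source=go.mod,target=go.mod \
    --mount=type=bind,source=go.sum,target=go.sum \
    go mod download

# build: cross-compiles a static binary with the version baked in
FROM base AS build
ARG TARGETOS TARGETARCH
ARG VERSION=0.0.0
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,target=. \
    GOOS=$TARGETOS GOARCH=$TARGETARCH CGO_ENABLED=0 \
    go build -ldflags "-X main.version=${VERSION}" -o /bin/server .

# test: runs the tests and writes the reports
FROM base AS test
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=bind,target=. \
    go test -v -coverprofile=/coverage.txt ./... > /results.txt 2>&1; \
    if [ $? -ne 0 ]; then cat /results.txt; exit 1; fi

# export-test: only the test reports
FROM scratch AS export-test
COPY --from=test /coverage.txt /results.txt /

# binary: only the compiled binary
FROM scratch AS binary
COPY --from=build /bin/server /

# image: the minimal runtime image (default target)
FROM alpine AS image
COPY --from=build /bin/server /bin/server
ENTRYPOINT [ "/bin/server" ]
```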
Learn more
- Docker Build
- Best practices for writing Dockerfiles
- Docker development best practices