On-Demand Training

Docker Best Practices

 

Transcript

This is Docker best practices. When we think about best practices in Docker, we can actually think across multiple different levels and different types of best practices. The first one would be at the organizational level. If the organization is going through the process of adding containers to its applications, what does that journey look like and what are some of the best practices there? We can think about the individual team level. If an individual team is working with containers, how do those containers fit into the software development lifecycle? And then think about the individual developer who is trying to create container images and work with containers. What are the best practices they need to be considering? We’ll talk through some of the best practices at each of these different levels.

 


 

Containerization Journey (0:44)

So at the organizational level, let’s consider the containerization journey. Almost nobody starts containerization with a blank slate. Everybody has a mix of new applications and existing applications that they want to containerize. So the best practice here is really to understand your company’s willingness to containerize applications and to create standard processes, or factories, or whatever you’d like to call them, for doing so: really understanding what those steps are going to be and how the organization is going to do this in a way that’s repeatable and consistent. In some cases, this containerization process can be paired with an application modernization process; the two can go hand in hand. Either way, the process needs to include a triage or prioritization step: how do you determine which applications you’re going to containerize and in what order, or which applications you may not get to at all, for whatever reason?

 

Application Containerization/Modernization Triage (1:47)

So when we talk about triage, there are actually a number of things to think about, and whether we’re doing containerization or modernization, a lot of the same ideas apply. The first is business priorities: what are the business priorities, and how does containerizing or modernizing this particular application align with them? Is it something that aligns with what the business is trying to do, or is there another reason this application is a priority? The next is application knowledge. Especially for existing applications that may have been running for a long time, the question is: does anybody really understand how the application works or how it is deployed at this point? If you have an application that has been running on bare metal or VMs for over a decade, who really understands how it actually works?

The next piece is the technology stack: how do the technologies used in this application align with your current application tech standards? Again, an older application may not align, and you may not have the knowledge base you need to modernize it, which may also affect how you containerize it. The next one is application lifespan: is this application still needed in its current form? That may again affect how you prioritize the containerization process. Then there’s organizational capacity: regardless of whether you’re modernizing or containerizing an application, you have to consider how much capacity the development, testing, and operations teams have to do work on this application versus all their other work. And lastly, there are cost and risk: what are the cost and risk of running the application in its current state versus the cost and risk of modernizing or containerizing it? Even if containerization is all benefit, there’s still some cost involved in actually getting it done. Most of the time this will be a net positive, but it should still be considered when you’re prioritizing which applications to containerize first, second, or last.

 

SDLC and Containers (4:02)

Let’s move now to the individual teams who need to adopt containers within their software development lifecycle, and think about how that applies through the entire process. If we look at the SDLC as two parts, the inner loop and the outer loop, the inner loop is where the individual developers are bringing in dependencies (whether those be base images or other libraries), coding, building their container image, and testing their application, all locally, most likely on their laptop or desktop. Then at some point there is a push of code, and this is where a CI process and CD process take over the integration, testing, and finally deployment of that code out to production. So let’s now look at each one of those individual steps and think about what needs to be considered and what some of the best practices might be.

 

Dependencies Best Practices (5:09)

So the first one is dependencies. All developers bring in dependencies; there are frameworks, libraries, and other pieces that need to be brought in so you don’t have to rewrite those things. But you need to have a standardized process for accepting and using base images and other dependencies. How do you know those base images are what they say they are? How do you know they have the right versions of libraries and so on that comply with your internal standards? Having a trusted software supply chain strategy that complies with your internal standards and external requirements, legal restrictions, and so on is also incredibly important. Working with containers, you could look at something like Docker Official Images, which are created by Docker, or we work directly with the upstream communities to create them, so you know the provenance of those images, and we stand behind that provenance. Or you can look at Docker Verified Publisher images, which are created by our partners: companies like SUSE, Grafana, or Red Hat publish images on Docker Hub and stand behind their provenance, so you actually know what you’re getting when you use those images. Another option is to create your own base images: you can download the source code yourself, build it, and create your own base images that way. Either way, you need to understand where this stuff is coming from and then have a way for your developers to get it in a consistent manner.
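As a hedged sketch of the build-your-own option (the upstream image, package list, and registry name here are placeholders, not anything from the presentation), an internal base image can start from a pinned upstream:

    # Dockerfile for an internal base image, built from a pinned upstream
    FROM debian:12-slim
    RUN apt-get update \
        && apt-get install -y --no-install-recommends ca-certificates \
        && rm -rf /var/lib/apt/lists/*

Publishing it to an internal registry then gives every team one consistent place to pull from:

    docker build -t registry.example.com/base/debian:12 .
    docker push registry.example.com/base/debian:12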

 

Code Best Practices (6:48)

The next thing here is code best practices. The first piece of this is having a clear on-ramp process for teams that want to containerize their applications, going back to that containerization journey. Once you’ve selected an application to be containerized, the development teams need to understand what that process is. Are you actually going to take, for instance, monoliths and recommend microservices as part of this process? Or are you simply going to take what’s there, drop it into a container, and that’s it? There just need to be processes around this so everybody understands how to do it in a consistent manner. The next thing is standardized container tooling configuration across different teams and organizations. In Docker Desktop, there are a number of different settings, and with Settings Management you can standardize them so that everybody is using the tooling in a consistent manner. Again, that cuts down on configuration issues, drift issues, and the time spent trying to debug that kind of thing.

The next is ensuring the developer environment is both secure and productive. This is about making sure that if malicious code gets onto a developer machine, it can’t break out and cause further software supply chain issues within the organization. You want to ensure the developer environment is running in a secure manner. This is where Enhanced Container Isolation comes in: it makes sure not only that the container isn’t trying to access pieces it’s not supposed to, but also that it isn’t accessing VM internals or doing other things that could put the developer environment at risk.
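As a rough sketch of how both of those points can be enforced, Docker Desktop’s Settings Management reads an admin-settings.json file distributed by an administrator. The keys below follow the documented format, but treat the exact names and file version as something to verify against your Docker Desktop release:

    {
      "configurationFileVersion": 2,
      "exposeDockerAPIOnTCP2375": { "locked": true, "value": false },
      "enhancedContainerIsolation": { "locked": true, "value": true }
    }

Locked settings appear grayed out in every developer’s Docker Desktop, which is what keeps the tooling configuration consistent across teams.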

Next is reducing the number of tools and steps that developers have to take to accomplish tasks so they can actually “shift left.” When people use the term “shift left,” what they are actually saying is: give the developer something else to do. And that only works if the developer can be efficient in what they’re doing and get things done in such a way that they have time to take on a new task. It also helps enormously if that new task doesn’t involve an entirely new tool and an entirely new way of doing things. An example of this is on the security side. People often want to “shift left” on security by pulling vulnerability remediation to the developers. When the developers have to go through a whole separate tool and process, that’s very difficult. Tools like Docker Scout embed themselves directly into the processes the developers are already using, so there aren’t a lot of extra steps required for the developer to take on that responsibility. Make the inner loop as fast as possible so developers are more productive. This is a key theme we’ll repeat a couple of times: the faster that inner loop runs, the more productive the developer can be, because they’re going to see feedback faster and be able to iterate faster.
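As one concrete illustration (the image name myapp:latest is a placeholder), the Docker Scout CLI surfaces vulnerability data in the same terminal the developer is already working in:

    # Summarize image health, list CVEs, and suggest base image updates
    docker scout quickview myapp:latest
    docker scout cves myapp:latest
    docker scout recommendations myapp:latest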

Determine where your development tools will be deployed. You can have your development tools on the host machine, or you can embed them inside a development image that you then share around. Both are perfectly good options; choose whichever fits your organization better. Also determine when hybrid or remote development makes sense. Development machines have gotten more and more powerful, but the reality is that software has grown larger just as fast, especially when we’re talking about things like AI/ML workloads. It’s very difficult to run those on a local machine, so there are going to be times when having part of the environment, or the entire environment, running remotely makes sense.
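As a hedged illustration of the shared-development-image option (the image name and paths here are hypothetical), a team might mount the source tree into a published tools image rather than installing anything on the host:

    # Work inside the team's published dev-tools image,
    # with the local source tree mounted at /src
    docker run -it --rm \
      -v "$PWD":/src -w /src \
      registry.example.com/team/dev-tools:latest \
      bash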

 

Build Best Practices (10:39)

So when you are building images, here are some things to think about. Developer images should have all the tools developers need to work: debuggers, compilers, any of those types of things, because having ready access to those tools makes developers faster. But production images should be minimal in size, built fast, operationally aware, and highly secure. Those two goals conflict with each other, and that’s a problem, but it’s actually fairly easy to resolve with multi-stage builds. With multi-stage builds, you can have a developer image that has all those tools and also a production image that strips out all the extraneous pieces you don’t need in production. This also ties into the next point: ensure there is not a mismatch between developer and production image structure. This is where we talk about where dependencies are located, how logging is set up, what the build process is, and so on.
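Here is a minimal multi-stage sketch, assuming a hypothetical Go application (the module layout and binary name are made up), where the build stage doubles as the tool-rich developer image and the final stage is the minimal production image:

    # syntax=docker/dockerfile:1

    # Build/dev stage: full toolchain available for compiling and debugging
    FROM golang:1.22 AS build
    WORKDIR /src
    COPY go.mod go.sum ./
    RUN go mod download
    COPY . .
    RUN CGO_ENABLED=0 go build -o /bin/app .

    # Production stage: minimal runtime, no compilers or shells
    FROM gcr.io/distroless/static-debian12 AS prod
    COPY --from=build /bin/app /app
    USER nonroot
    ENTRYPOINT ["/app"]

A developer can build the tool-rich stage directly with docker build --target build, while CI builds the final stage for production, so both images come from one Dockerfile and can’t structurally drift apart.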

So we talked about multi-stage builds being able to give you different developer and production images. What we want to ensure here is that there isn’t some fundamental difference between the two that’s going to cause drift or other issues that have to be debugged. The next one is to utilize the many Docker build features as your build maturity grows. There’s caching; there’s how you actually do layering; there are the multi-stage builds we’ve already talked about. Multi-architecture builds are another piece of this, whether you’re targeting Arm64, AMD64, or other architectures. And there’s build orchestration with a tool like Docker Bake, or even Docker Build Cloud, to do remote builds very fast with a shared cache.
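As a small, hypothetical example of build orchestration (the file contents and registry name are assumptions, not from the presentation), a docker-bake.hcl file can declare multi-architecture targets that docker buildx bake then builds in one shot:

    # docker-bake.hcl
    group "default" {
      targets = ["app"]
    }

    target "app" {
      dockerfile = "Dockerfile"
      platforms  = ["linux/amd64", "linux/arm64"]
      tags       = ["registry.example.com/myapp:1.0"]
    }

Running docker buildx bake builds every target in the default group, for both architectures, with a single command.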

The last piece here is very important: time developers spend waiting on builds is wasted developer time. We’ve seen studies where developers wait on builds for up to an hour a day, and that is simply wasted time, because most of the time when a developer is doing a local build, they can’t do other things on their machine to stay productive. So reducing developer build time, in multiple different ways, is incredibly important.

 

Test Best Practices (13:12)

So this is local testing we’re talking about now. First, determine which testing the developers will be responsible for. There’s unit testing, which developers are traditionally responsible for; there’s security and policy testing, which is what we talked about earlier with “shift left”; there’s integration testing; and so on. So there’s a variety of different levels of testing, and determining what you want the developers to actually do and what you’re expecting from them is the first critical piece. Next, have a standard way for developers to create full dev environments. This is also very important. If you want your developers to actually get good results, they need full environments that they can test in without stepping on other people’s toes, to really understand how the application is working, test out functionality, tear it down, build it up, all that kind of thing. This is where something like Docker Compose can come in handy, or even Kubernetes on the local machine, which you can also run with Docker Desktop.
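A minimal Compose sketch of a full local environment might look like this (the service names, images, and ports are hypothetical):

    # compose.yaml
    services:
      app:
        build: .
        ports:
          - "8080:8080"
        depends_on:
          - db
      db:
        image: postgres:16-alpine
        environment:
          POSTGRES_PASSWORD: dev-only-password

Then docker compose up --build brings the whole environment up, and docker compose down -v tears it down again, volumes included, so each developer can rebuild from scratch without touching anyone else’s setup.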

Determine how shared test resources – data sets, queues, or environments – will be used. There are going to be cases where you need to share data, because developers won’t be able to create data realistic enough to actually test with on their own. So if you’re sharing this data, how is that going to work? Is everybody accessing the shared resource? If so, how do you make sure people don’t delete data out from under each other, or do other things like that? If you’re copying data around, how is that going to work? There are a lot of considerations to think through here. Also ensure development testing and integration testing are aligned. When the developer is testing locally and the integration system is also doing testing, make sure that how they do that testing is aligned. They don’t have to be the same tests; we don’t want the two to simply be 100% overlap, but make sure there aren’t going to be issues just because the tooling is different. This is where something like Testcontainers can come in very handy: it can help you create dev environments, and it works both on the developer side and on the CI side of things. And again, make the inner loop spin as fast as possible. Faster feedback means better productivity.

 

Integration Best Practices (15:32)

So now we’re pushing out to the continuous integration server. What we want to make sure here is, first, that the pipelines are optimized for working with Docker. The CI tools have a lot of very nice features for integrating Docker capabilities: that can include simply using containers for the builds, using Docker-in-Docker to build images from within containers, or using Docker Compose, Docker Scout, or Build Cloud; all of those are possible within the integration build process. Next, ensure builds are quick and the resulting images small. This is another key thing: make sure somebody owns keeping these builds quick and small, because the resulting images are the ones going to production, and you want them to be as small as they possibly can be. Share the build cache with developers wherever possible. The build cache speeds up builds, and there’s no reason not to share it between the developers and the CI server. This is also where Build Cloud comes in again: it has a natural shared cache for this, or you can share the cache manually as well. Finally, ensure consistency in processes between the inner and outer loops for builds. Again, if you’re doing things in dramatically different ways, you can run into issues between how things are done that you’ll then have to track down.
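One way to share the cache manually is a registry-backed cache that both developer machines and the CI server reference; a sketch, with placeholder registry and image names:

    # Pull cache from, and push cache back to, a shared registry location
    docker buildx build \
      --cache-from type=registry,ref=registry.example.com/myapp:buildcache \
      --cache-to type=registry,ref=registry.example.com/myapp:buildcache,mode=max \
      -t registry.example.com/myapp:1.0 .

With mode=max, intermediate layers from every stage are cached too, so a developer’s build can reuse work CI already did, and vice versa.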

Now, outer loop test best practices. Again, determine what types of testing you will do as part of your CI process: where is the line, what gets done where, and make sure that’s clear across teams. Ensure consistency in processes between the inner and outer loops. If you’re doing vulnerability detection in both places, which is a good idea, make sure you’re not going to run into issues just because you’re using two different tools that do it two different ways. That leads into the next point, which is that developers should have context when there is a testing issue. If the outer loop tests vulnerabilities one way, the inner loop tests them a different way, and a rejection comes back from the outer loop that the developers can’t reproduce in the inner loop, suddenly you have a huge problem. So ensuring there’s consistency there, again without 100% overlap, is important.

The last one is: are you testing for mean time to resolve or mean time to failure? What we’re talking about here is whether you’re focused on how quickly you can get back up and running when there is a failure, or on ensuring that there never, ever is a failure. Those are two dramatically different ways of looking at things, and it needs to be clearly communicated which one you’re doing. A mean-time-to-failure environment is typically something like a medical device or a space program, where failure could have extremely serious results. A mean-time-to-resolve environment focuses much more on understanding that, while we don’t want failures, we want to be able to resolve the failures that do happen as fast as we possibly can.

 

Deployment Best Practices (19:05)

And now we’re talking about deployment. Deployment best practices include having clear processes for rollbacks and utilizing advanced deployment techniques. These include blue/green deployments, where you stand up the new version alongside the old one and switch traffic over, so you can switch back quickly if something goes wrong, and canary deployments, where you deploy to a very small subset of production first, just to make sure everything works properly. And then use deployment metrics to measure the overall software development lifecycle: things like the DORA and SPACE metrics, among others.

 

Docker Best Practices (19:43)

All right, so now let’s get into some Docker best practices for the individual developer. For the individual developer working with images, what do they need to think about? There are a number of things on the do and don’t list here, and we’ll talk about several of them over the next few slides. There are obvious things like not running as root and using a specific user instead; using the multi-stage builds we’ve already talked about; using .dockerignore to keep extraneous things out of the image; and not using the latest tag. So there are a lot of do’s and don’ts here. Let’s go into a little more detail.

So the first thing is that each container should do one thing and do it well. There’s no reason to create general-purpose images that have a variety of different layers and different technologies included. You should have smaller images that do one particular thing. This is going to help you from a scaling perspective, from a debugging perspective, and from an ongoing maintenance perspective. The next thing is: don’t ship developer tooling into production. We’ve already talked about this at the team level; multi-stage builds are very important here. Use or create trusted base images. As we’ve talked about before, you can use Docker Official Images or build your own; there are a number of choices. Pin your images; don’t use the latest tag. Using the latest tag is very convenient, but it will also break your build at some point. You should go to a specific tag like FROM node:18.16-alpine3.17, or you can even pin to a specific SHA digest. This gets you a very specific version of that particular image that will not change. There’s also a Build with Docker guide on our website that we highly recommend you take a look at.
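Pulling several of those do’s together in one hedged sketch, assuming a hypothetical Node.js service (the file names and port are made up):

    # Dockerfile: pinned base image, non-root user, production-only deps
    FROM node:18.16-alpine3.17
    WORKDIR /app
    COPY --chown=node:node package*.json ./
    RUN npm ci --omit=dev
    COPY --chown=node:node . .
    USER node
    EXPOSE 3000
    CMD ["node", "server.js"]

    # .dockerignore: keep extraneous files out of the build context
    node_modules
    .git
    *.log

The node official images ship with a built-in node user, which is what makes the USER node line work without any extra setup.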

The next thing here is to operationalize your containers from the start. What do we mean by that? When developers are working with containers, they’re often not thinking about how those containers are going to run in production at that particular point, but they should be. There are a number of common factors needed in a production environment that developers can simply build in from the beginning, making things much more consistent between those two groups and easing the path to production.

 

The HEALTHCHECK Directive (22:17)

The first thing is the HEALTHCHECK directive. In the Dockerfile itself, you can add a HEALTHCHECK to detect application crashes, dependency failures, resource limitations, and misconfigurations. How do you do that? You set up a command with an interval, a timeout, a start delay, and a retry count, and that command runs naturally as part of running the container. Let’s see what that looks like. Here we have a Dockerfile, and we can see the HEALTHCHECK: it has an interval of 30 seconds and a timeout of 3 seconds, and what it’s going to do is curl localhost and return a result code. Then I can build that image, run it, and inspect it, and you can see the command that performs the health check test. In this demo, the server’s response delay is deliberately set way too high, so with these short intervals and timeouts the check is going to fail. On the next page, when we inspect this particular container, the status is going to be unhealthy: it failed three health checks in a row, and here is the log showing what happened. All of this is included in the stats returned by the container itself, which can be extremely helpful for the operations team to know what is going on and get that information right away.
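The demo’s exact Dockerfile isn’t reproduced here, but a health check along the same lines might look like this (the port, endpoint, and retry count are assumptions, and curl must be present in the image):

    # Fail the check if the endpoint doesn't respond within 3 seconds
    HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
      CMD curl -f http://localhost:8080/ || exit 1

After running the container, the health status and the log of recent checks are visible via inspect (my-container is a placeholder name):

    docker inspect --format '{{json .State.Health}}' my-container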

 

Docker Logging (23:56)

This ties into the next piece, which is logging. If you are going to be logging from a container, and you are, you need to think about what levels of logging you’re doing. Are you logging at the OS level? At the middleware level? At the application level? How are you going to handle those logs? Having a plan is incredibly important. What are you interested in? Is this only of local interest, or should it get bubbled up to an aggregation tool? Make sure you configure things like log rotation and log purging if they’re needed. Honor notification levels: if it’s not critical, don’t mark it as critical, and ask whether it really needs to be a log message at all. These are all things developers can be thinking about, or have guides around, when they’re creating their containers, so that everybody’s on the same page and there’s consistency across the organization in how logging is done. Once you have logging, you can then use something like OpenTelemetry for tracing, metrics, structured logging, context management, integration, instrumentation, and exporting and forwarding data. All of this is something the individual developer can make use of with OpenTelemetry, and then that same capability can be used at the production level with a variety of commercial tools that all consume the OTel format. There’s a lot of benefit in being consistent between what you do in development and what you do in production.
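For example (a sketch with a placeholder image name), log rotation can be configured per container through the json-file logging driver’s options:

    # Rotate logs at 10 MB, keeping at most three files per container
    docker run -d \
      --log-driver json-file \
      --log-opt max-size=10m \
      --log-opt max-file=3 \
      myapp:1.0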

So this has been Best Practices with Docker. Hopefully you’ve gotten a lot of new areas to think about at the organizational level, at the team level, and at the individual level. Certainly there are always more best practices out there. I hope you enjoyed this presentation. Thank you.

 

Learn more

Learn how to bring Docker best practices to your org, team, and individual workflows at any stage of the software development life cycle, across new or existing applications.

In this training, we will review the containerization journey and discuss the best practices for each phase of the SDLC, including triage, coding, building, testing, integration, and deployment.

Our speakers

Kevin Barfield

Director, Solutions Architect
Docker