DockerCon

Docker-in-Docker: Containerized CI Workflows

Chris Maki, Principal Software Engineer, Docker

Rodny Molina, Principal Software Engineer, Docker

Recorded on November 16th, 2023
Docker-in-Docker (DinD) is a technique to run Docker containers inside another Docker container. DinD makes it easy to create an isolated environment for each application or service under test in a CI platform. Learn the basics of DinD, its benefits, and use cases in this presentation.

Transcript

Hello. We’re talking today about Docker-in-Docker (DinD) within the context of containerized CI workflows. I will try my best to make Docker-in-Docker a little bit exciting for you today.

Let’s get started. This is the agenda that we have today. We’re going to go through a brief introduction. Then we’re going to describe Docker-in-Docker, obviously. I’m going to explain when you might expect to use Docker-in-Docker. Then we’ll go through how to run Docker-in-Docker and the challenges associated with that. Then we’re going to jump into a potential solution for that. Then we’ll wrap it up by bringing it all together with an example.

My name is Rodny Molina. I’m a software engineer at Docker. Before Docker, I was working as one of the lead developers and co-founders at a startup called Nestybox. Nestybox created a container runtime called Sysbox; I’ll be talking about that in a few minutes. My partner in crime for this talk is Chris Maki. Unfortunately, he could not make it at the very last minute. So, if anything is wrong with the presentation, or if there’s something you like, please feel free to blame him. Okay, let’s get started.

Table of Contents

    What is Docker-in-Docker?

    What is Docker-in-Docker? Probably for most of you, Docker-in-Docker doesn’t really need a lot of introduction, as it has been around for a while. But let me just cover the basics to make sure that we’re on the same page. Docker-in-Docker is a simple technique to allow developers to run a Docker container within a Docker container. Simply that. So if you have a process in that blue box there, just a container, and that process is Docker, basically Docker-in-Docker should allow that process to run containers of its own.

    The inner Docker should work the same way; there shouldn’t be any difference. If things are being done right, the Docker inside that box should behave the same way as the one outside. This is conceptually simple to understand. So why are we having this talk in the first place? Things are a little bit trickier when it comes down to the implementation, because none of the components at the very bottom, that container stack, were designed with an application like Docker in mind. They were all conceived in the context of, hey, I want to run my regular applications.

    Docker is one of those applications that is considered system level, because it does a lot of tricky things with the kernel. It requires a lot of stuff that regular applications wouldn’t really care about. So I’m just laying the groundwork for what I’m going to explain next, which is all the challenges that you have to deal with and why those challenges exist.

    Use cases

    Okay, so on to use cases: when do you need Docker-in-Docker? Let’s go through that. The most obvious use case, and I think pretty much everyone here is familiar with it, is CI tools and agents. We often need to deal with jobs that require some Docker-related tasks to be completed, right? So it’s natural that you need a Docker engine inside the CI agent. That use case is pretty obvious.

    What isn’t so obvious is when you want a local containerized environment. Within that context, say you’re working on a repo that has a CI workflow that is not too complex, but not trivial either; it’s something in between. If it’s too simple, you don’t really need a Docker engine inside. If it’s too complex, you cannot even port that CI workflow environment to your local dev machine; it’s just too complex. If it’s something in between, you can think of a scenario where you bring up a CI engine, an engine with all the tools already inside it, so you can run your local test and dev environment without cluttering your machine with a lot of tools that you don’t really care about. That’s what I mean by local containerized environments that require nested deployments.

    Another use case is sandboxed Docker environments. This is an environment that allows developers to create Docker containers that act as sandboxes. You can imagine the scenarios. You, as a developer, throw all the tools that you need for your dev environment, Docker, Kubernetes, dev tools, GUIs, pretty much anything that you need, inside an image. By doing that, you now have a lot of flexibility and portability, because you can decouple your environment from the laptop that you’re working on and port it across any other device that you’re using.

    I’m pretty excited about this use case because I think that’s where I want to go personally as a developer. I want to have an image that comprises all the elements that I’m happy with, configured exactly the way that I want. It just doesn’t make sense to me that when I switch laptops, or go to my tablet, or change companies, I have to redo everything again. I’m really looking forward to having something like this in the future. Just to wrap it up, this is a sandboxed Docker environment that behaves like a virtual machine. This was actually the main driver for the startup that I was working on before. The use cases are pretty clear, in my mind at least. Let’s keep going.

    Running Docker-in-Docker

    How do we run Docker-in-Docker? Let’s start with a super simple demo of the challenges that you, as a user, encounter when you try to run Docker in a nested fashion. It’s pretty simple. We’re going to cover some of what I call deployment patterns here, and we’ll expand on them later on. So if there’s something you don’t get here, just be patient and wait for the next few slides.

    Okay, so at the bottom, we’re going to watch `docker ps`; we’re going to watch all the containers in the system. At the top, we’re going to launch a container that has nothing but Busybox, and we’ll try to run Docker there. No surprise, it’s going to fail. This simple example just shows that Docker is not included in an image by default; you have to install it. Simple.
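
    A rough recreation of that first step (two terminals; output abbreviated):

    ```console
    $ watch docker ps              # terminal 1: all containers on the host

    $ docker run -it busybox sh    # terminal 2: a bare Busybox container
    / # docker ps
    sh: docker: not found
    ```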

    Now let’s do something a little bit more intelligent: use a Docker CLI image that really includes something, the Docker CLI itself. When we run that, good, we have a CLI there. We run `docker ps`, and it fails. Why? Because the binary is trying to connect to an engine. The point that I’m trying to make here is something obvious for many of you, which is that Docker is comprised of two elements: the CLI and the engine. You need both of them to operate.
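
    Roughly what that second attempt looks like, using the official `docker:cli` image:

    ```console
    $ docker run -it docker:cli sh
    / # docker ps
    Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
    Is the docker daemon running?
    ```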

    So the Docker binary here is trying to connect to a socket, a Unix socket, which is where the Docker engine listens by default. Obviously, inside the Docker CLI container, there’s no engine. So where can we bring an engine from? This is where the Docker-out-of-Docker pattern starts to make sense. Let me go through that next. The first thing we do is run a regular container. This isn’t really applicable; it doesn’t make a lot of sense for the use case that I’m going to explain. The reason I’m creating that container is just to showcase one of the main challenges that Docker out of Docker has.

    So just bear with me. Look at the bottom: we have a container that we just created. The bottom, as I said, is the system level, where the Docker engine runs on the host. Now we’re going to launch a Docker CLI container again. Notice that this time we’re passing something extra: the socket where the Docker engine on the host is listening. We’re bind-mounting that socket from the host into the container. That sounds fishy just to start.

    The problem is that when you do a `docker ps`, yeah, things work, but now you’re looking at all the containers in the system. That’s something you shouldn’t be able to do. And if you try to remove all the containers there, surprise, it’s going to work. It works because you bind-mounted the socket from the host, so you have access to the entire system. Personally, that’s why I say don’t use Docker out of Docker.
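
    The pattern being demonstrated, approximately; please don’t do this in shared environments:

    ```console
    # Docker out of Docker (DooD): reuse the host engine via its socket
    $ docker run -it -v /var/run/docker.sock:/var/run/docker.sock docker:cli sh
    / # docker ps                       # lists ALL containers on the host
    / # docker rm -f $(docker ps -aq)   # ...and can remove every one of them
    ```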

    Now let’s go with the Docker-in-Docker pattern, which is what this talk is all about. With Docker-in-Docker, there are a few steps you can complete to make it a little more secure, like creating a network that is specific to the CLI and the engine. This is not really a must-have, but you can do it. You can see the instructions are more complex now. What we just did there is create a container from the `docker:dind` image. This image already contains the Docker engine, so you don’t have to search for an engine anymore; you already have a context where this Docker engine executes. Then there’s a Docker CLI container, as before, just to compare with the Docker-out-of-Docker setup. So now we have the engine and the CLI in two different containers. And things are going to work now.
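
    A sketch of that setup, following the `docker:dind` image’s documented usage (TLS disabled here for brevity; the names are illustrative):

    ```console
    $ docker network create dind-net    # optional: isolates CLI <-> engine traffic
    $ docker run -d --privileged --name dind-engine \
        --network dind-net --network-alias docker \
        -e DOCKER_TLS_CERTDIR="" docker:dind
    $ docker run -it --rm --network dind-net \
        -e DOCKER_HOST=tcp://docker:2375 docker:cli sh
    / # docker ps    # talks to the inner engine, not the host's
    ```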

    And most importantly, we don’t have access to the host engine, which is what we were trying to avoid. If we try to remove something now, obviously it’s not going to work, because we don’t have access to the containers that are sitting at the host level. Okay, so that’s pretty much it for this demo. Simple, as I said.

    Pros and cons

    So let’s dive deeper into the deployment patterns that I alluded to in the previous demo. As I said, Docker out of Docker reuses the host level Docker engine. And Docker-in-Docker relies on a separate and dedicated Docker engine running within the container.

    Let’s now go through each of them. What are the pros of Docker out of Docker? Well, in terms of functionality, the fact that it reuses the host Docker engine is convenient because it saves space and expedites build actions: you’re relying on a shared place where all the images go, instead of every single container pulling its own images. So it’s natural that things are faster and more efficient. But there are big cons.

    As I said before, you have direct access to the host Docker engine. And if that’s not enough, you break functionality like filesystem sharing. Just the fact that you live in a different mount namespace means you can’t, by definition, bind-mount things from your own context: the host engine resolves paths in its environment, not yours. So when you try to do a `docker run` that bind-mounts some filesystem path, that obviously won’t work, because you’re not in the same environment, the same context. But as I said, the main problem with Docker out of Docker is security. It’s definitely not recommended for shared public environments.
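
    To make the filesystem-sharing issue concrete (the paths here are hypothetical):

    ```console
    # Run from INSIDE a DooD container: the -v path is resolved by the HOST
    # engine, so it mounts /work/app from the host, not from this container.
    / # docker run --rm -v /work/app:/app alpine ls /app
    # result: empty or host contents, not the files you expected to share
    ```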

    This is actually a little bit of a repetition, so I’m going to go through it quickly. With Docker out of Docker, you do a `docker ps` and you have access to the entire environment, and the inner container can easily kill all the containers in the system.

    Okay, now Docker-in-Docker: what are the pros and cons? The pros are that it’s easy to use, because it relies on a DinD image that is already packaged for you and actually contains both the engine and the CLI. In the example before, we split them just to compare with Docker out of Docker, but in reality, within that DinD image, you pretty much have everything that you need. So it’s pretty easy to use.

    Another good thing is that you finally have an inner context separate from the outer one, from the Docker standpoint. You have separate engines working independently. This addresses some of the Docker-out-of-Docker security concerns, so that’s also a pro. Now, what is the big challenge with Docker-in-Docker? It requires insecure, privileged containers. I know that’s a no-go for many organizations. I wouldn’t suggest it; I wouldn’t recommend it either. So yeah, as-is it’s often not a feasible solution.

    This is, again, a little bit of a repetition of the demo we did before. We created the network, which is not really a must-have. We started the Docker-in-Docker container. We created the CLI container and exec’ed into it. And then we verified that we don’t have access to the host containers, which is good.

    Now, the main challenge Docker-in-Docker has is the fact that you’re relying on privileged containers. The problem with privileged containers, to summarize, is that the root user in the container is exactly the same root user on the host. If that’s not enough, all the kernel capabilities are assigned to the processes that run within the container. So if a process were to escape the jailed environment where the DinD container runs, that process would find itself with all the capabilities, exactly the same rights that a root user has on the system.

    So you can pretty much do anything. That’s why it’s definitely a no-go for many organizations. And even if you don’t escape to the host, even within the container, you already have direct access to the host devices. So you can mount stuff from the root hard drive, bind-mount it into the container, and you have root access to those files. And to top it off, you also have read-write access to the procfs and sysfs filesystems. So you can literally write to the kernel, as root, from inside the container.

    Next, I have a little bit of a screencast here. I launch a privileged container and look at procfs, asking: what is the UID map? That is, what UID is the process of this container executing with? As you can see, the first column is a zero, and the second column is a zero, too. What that means is that the process is running with UID zero in the container, which maps to UID zero on the host system.

    Then we look at the capabilities, and we can see the process is running with all the possible capabilities in the system, all those FFFs that you see there. And finally, as I said, because you have read-write access to the kernel through procfs and sysfs, just by writing two bytes into a file in procfs, you literally restart the entire system. One command, and you’re restarting everything. That shows how insecure the solution is. Okay, enough.
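
    The checks from the screencast, approximately; exact values vary by kernel, and the last command reboots the machine, so don’t run it anywhere you care about:

    ```console
    $ docker run -it --privileged docker:dind sh
    / # cat /proc/self/uid_map
             0          0 4294967295    # container root == host root
    / # grep CapEff /proc/self/status
    CapEff: 000001ffffffffff             # all capabilities enabled
    / # echo b > /proc/sysrq-trigger     # two bytes that reboot the host
    ```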

    Sysbox

    So let’s talk about potential solutions. How can we deal with Docker-in-Docker in a more secure fashion? How do we do it? So, Sysbox is one possible solution. Let’s go through what Sysbox is.

    Sysbox is a new container runtime, a new runc. It operates below Docker and Kubernetes, even below containerd and CRI-O. So it stays really low level. That means the user doesn’t really need to learn another tool; there’s nothing new to operate with Sysbox. Sysbox consumes the OCI spec, listens to the requests that come from above, and just acts on them. The user doesn’t need to learn anything new. That’s an important point here.

    What Sysbox does is enable more workloads than regular containers can, and it does that seamlessly and securely. How does it do that? Well, it offers more isolation through the use of OS virtualization techniques. For example, all the containers that run with Sysbox rely on the Linux user namespace: all the processes they contain are assigned an ID range that is unique to that environment. So that is important. And as I said, functionally speaking, Sysbox allows you to run privileged-style workloads in unprivileged containers. That’s the reason things like Docker or Kubernetes can run within a Sysbox container. As I said before, it fully integrates with Docker and Kubernetes, so you can use it as just another runtime, like runc. And finally, it is open source; that’s important.
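
    Assuming Sysbox is installed and registered with Docker as an additional runtime (it shows up as `sysbox-runc`), running a DinD engine without `--privileged` looks roughly like this:

    ```console
    $ docker run -d --runtime=sysbox-runc --name secure-dind docker:dind
    $ docker exec -it secure-dind docker ps   # inner engine; no --privileged needed
    ```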

    Within the context of this talk, why are we talking about Sysbox? Because if you use Sysbox in a Docker-in-Docker environment, you get certain pros and cons. Let’s go through the pros first. First, this pattern is conceptually simple; it matches the traditional Docker-in-Docker scenario we talked about before. It offers strong isolation through Linux user namespaces and through the virtualization of procfs and sysfs. We do some syscall trapping for certain specific syscalls on the control path, like mounts, and we stay out of the data path so that we don’t hurt performance. We do some user-ID shifting to make sure that user namespaces are not an issue when it comes down to filesystem access, among other things. But the most important pro is that no privileged containers are required. You can still run the same workloads, but you’re constraining what can be done within the container.

    Another important point is that this works on bare metal or cloud VMs; there’s no need for nested virtualization. Someone could ask, why not do Docker-in-Docker with micro-VMs, for example? That’s a valid solution, too, but you’ve got to be willing to pay for it. And not only that, not all cloud vendors offer that option. And sometimes you don’t really need to go all the way on the security spectrum; sometimes you want something in the middle. One thing Sysbox offers in comparison with micro-VMs is that it’s pretty easy to use: just `docker run`, that’s it. You don’t need to create special images. There’s a lot of complexity going on with micro-VMs, in my opinion. What are the cons? The main one is that it requires relatively modern Linux kernels; it’s still relatively new, and it currently supports a limited number of Linux distros.

    Let’s bring it all together. We talked about what Docker-in-Docker is. We talked about the use cases. We covered how you run Docker-in-Docker and saw the challenges associated with that. We brought up a possible solution for those challenges. Now let’s bring all the pieces together with an example.

    Put it all together

    The first thing you can do to connect all the components we’ve talked about is create an AMI, a VM image, that includes all of them. That’s what we have done at Docker. We created what we call the DinD AMI, which pretty much satisfies all the requirements of the components, in terms of distro and in terms of kernel. All the must-haves for the components living inside are already embedded in that image, and the image is ready for you to use anywhere. Of course, it includes the Docker Engine, the CLI, and the main Docker plugins, like Buildx, and so on. And, of course, it also installs and preconfigures Sysbox, so everything is ready for you to use.

    Up to this point, we’ve talked about Docker-in-Docker, but we really haven’t said much about CI. You might be wondering, what is the CI part here? One of the things that I’m personally vouching for is that this model, this DinD AMI with all the components, could offer you a way to reduce the cost of your CI environments. What we are proposing is that, instead of relying on the current paradigm, which is that every runner needs its own VM, you could do something different: you could dedicate one VM to a number of runners. Now, before you jump down my throat, this is not something I recommend for everyone. It’s not for all multi-tenancy environments. I’m vouching for it in soft multi-tenancy environments, where an enterprise has a few teams, those teams point at a bunch of repos, and you want to reduce the cost of your infrastructure. Instead of renting 20 VMs, you can maybe go with 10. Some of those VMs will obviously need to be a little larger, but on balance the scheme is still going to save money. That’s the use case I’m specifically touching on.

    GitHub Actions containerized runner

    Let’s go forward. The example I have in mind that illustrates everything we just described is GitHub Actions containerized runners. Currently, self-hosted runners work best at the host level. What I mean by that is that Dockerized runners do not support CI pipelines with Docker-related steps. Notice that I’m talking about Dockerized runners, not Kubernetes ones, right? Because in the Kubernetes world, there were some fixes over the last few months to basically spawn the Docker container somewhere else, as a pod, sort of bypassing and working around some of the complexities that Dockerized runners have. But that’s not Docker-in-Docker anymore; that would be more like Docker-in-Kubernetes. The important thing to keep in mind is that in both Kubernetes and Docker environments, you’re still going to need privileged containers or the Docker-out-of-Docker approach to run the engine. There’s no way around that.

    So, given the challenges I just highlighted, what it tells you is that we have a problem in terms of cost, because we’re still dedicating a VM to every single runner. That’s not the most efficient way to maximize your compute resources. That’s the whole point.

    Okay, so this is what the solution looks like. Start from the bottom: you have the Docker DinD AMI that we talked about; you have the virtual machine; you have the container stack, that gray layer you see there with Docker, containerd, and Sysbox. It’s not runc now; we have Sysbox, and Sysbox will instantiate the runners, and those runners will talk to repos.

    In the pattern highlighted on this slide, each runner talks to an individual repo, but that’s just an example; it doesn’t have to be like that. You could have another setup where, for example, three runners point at the same GitHub repo and load-balance jobs across them. Another use case: you could have two runners and room for a third one, and the third one could be a debugging CI runner. Yeah, who doesn’t love to debug CI issues? Instead of going to a different VM to debug a problem, you stay in the same VM and just instantiate a runner right there, and you avoid impacting your peers; the other runners keep working. You just put a label on the runner when you create it, say label X, and when you create your dev workflow, you use that label. Your job goes straight to that runner, and you leave the rest of your CI environment untouched. So from a debug standpoint, it makes sense too, as the sketch below shows.
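
    The stock GitHub runner configuration script supports custom labels, so pinning a debug runner could look roughly like this (the names and label are illustrative):

    ```console
    # Register a runner that only picks up jobs requesting the "ci-debug" label
    $ ./config.sh --url https://github.com/my-org/my-repo \
                  --token <REGISTRATION_TOKEN> \
                  --name gha-runner-debug --labels ci-debug

    # Then, in the dev workflow:  runs-on: [self-hosted, ci-debug]
    ```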

    Demo

    Let’s go through a demo now to showcase what I just described. Okay, so this is a Linux VM, my dev environment; I rely on a Linux VM for most of my stuff, with a relatively recent kernel. There are no containers running, as you can see. What I’m going to show you now is a repo. This repo is a clone of an existing, pretty successful project that builds GitHub Actions runners. We cloned this project and specialized it for Sysbox. What I mean by that is, if you git clone that repo, you get an option to create an image that already contains the GitHub Actions runner, with all the steps needed to set it up. Everything is automated for you. And the advantage of this specific repo is that it is built for Sysbox, specifically.

    Okay. What we’re going to do now is git clone that repo. As I said, the repo has a way to build the GitHub runner image. It also has a shell wrapper that I’m going to show you in a second. What that shell wrapper does is simply wrap the Docker instruction that you would need to create the runner. So that repo gives us the image, plus a wrapper on top to simplify the creation of the GitHub runner from your dev environment, from your virtual machine.

    So let’s go ahead and run that wrapper. Now, there is one thing you’re going to need. We’re back in another repo; this is not the one we cloned. This is the repo that will hold our dev things, our CI workflows. Everything is going to be in this repo. And in order for me to create a runner from my virtual machine, I need a token, something that allows me to authenticate to GitHub.

    So how do I do that? GitHub allows you to create tokens for that, obviously. Let’s go through it: Settings, click on Linux, and there you go. This is the token we need, right here. Notice all the other steps that GitHub publishes for users to follow; none of that is needed. All of it is already automated within the runner image we prebuilt before, so you can skip it. Actually, there’s even a way to avoid generating this token: if you use a personal access token, you can do pretty much anything, as long as you have the rights within that repo.

    So technically, you don’t even need to do this, but let’s just go with this example. You get this token, which is all you need. Now you invoke the CLI wrapper, which is nothing but a Docker command. Notice what we’re passing here: the name of the runner (we call it GitHub Actions Runner 1), the organization, which is the one I was working in before, the repo, which is this package here, and then the token. That’s pretty much everything we need to generate the runner. The runner is created. Let’s look at the logs associated with that runner; you can see the GitHub Actions runner coming up. We’ve already authenticated, and now we’re listening for jobs coming from GitHub. Let’s take a look at the GitHub side and make sure the runner already exists there. There you go, it’s ready. Now let’s try to run a workflow. This is a workflow I created that simply has a Docker job: basically, you build an image and you push it. Something that, as I said before, is a problem today in Dockerized environments.
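
    Roughly what that invocation looked like; the wrapper’s name and arguments here are illustrative, and under the hood it is just a `docker run --runtime=sysbox-runc ...` around the prebuilt runner image:

    ```console
    $ ./create-runner.sh gha-runner-1 my-org my-repo <REGISTRATION_TOKEN>
    $ docker logs -f gha-runner-1
    √ Connected to GitHub
    Listening for Jobs
    ```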

    Now let’s try to run this workflow. I made some changes in the dev branch where I have my dev workflow, so right away we’re going to get a job coming from GitHub. There you go: running job. This is the usual stuff; nothing here you haven’t seen before if you’ve worked with GitHub Actions. The workflow starts executing. It’s going to take a few seconds; I could have picked a simpler task. So we’re building the image right there, and we’ll be pushing it in a second.

    There we go: checkout, build, push, all complete. Okay, just to make sure we’re on the same page and I’m not tricking you, I’m going to show you that’s exactly what happened. We’re going to exec into the runner. I’ll show you that there are no containers running anymore (I could have shown you while it was running, but I didn’t), and here is the image that we were building in the first place. So this actually happened; I’m not lying to you.

    So this all makes sense: we created a runner within the dev environment. Now, what is the different thing I’m vouching for here? Well, you could create more than one runner. That’s the whole point, right? So let’s try to do that. We’ll leave that first runner right there, and then we’ll call exactly the same script we ran before, in a for loop, copying and pasting the token we used before. All of a sudden, in literally two or three seconds, you have four runners running on your machine. It’s that simple. And as I said, they’re isolated in the sense that they each have their own Docker engine, and the processes within that jailed environment are far better contained than if you had a privileged container. And still, you’re not paying the cost of a virtualized environment. These are runners; you have four runners right there, all ready to execute jobs in parallel. So yeah, that’s pretty much it. That’s what I had in mind for today.
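
    The for loop from the demo, approximately (same illustrative wrapper and token as above):

    ```console
    $ for i in 2 3 4; do
    >   ./create-runner.sh "gha-runner-$i" my-org my-repo <REGISTRATION_TOKEN>
    > done
    $ docker ps --format '{{.Names}}'
    gha-runner-4
    gha-runner-3
    gha-runner-2
    gha-runner-1
    ```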

    Q&A

    Are there any questions? We still have some time. Okay. Could this be adapted for self-hosted GitLab, the shell script? So GitLab has its own version of runners; can Sysbox be adapted for that as well?

    Yeah, absolutely. I don’t see why not. Actually, my understanding is that a lot of people are already using Sysbox in GitLab environments. So yeah, you could definitely have a GitLab runner powered by Sysbox. That was actually our first use case, before GitHub runners came up. So yeah, absolutely. Thank you.

    When you have all these layers nested together, is there a kind of lag, maybe for like file access on the host machine that causes a noticeable difference?

    You know, when it comes down to I/O and performance in general, I would say this is as efficient as it can be. You’re running on the same OS; you’re using the same kernel. The alternative is to have a virtual machine and pay a cost in performance.

    So when it comes down to throughput, it’s definitely going to be more efficient than if you had virtual machines in a nested fashion. Is that what you’re trying to get at? Ah, whether Sysbox itself introduces additional overhead? No. Sorry, I get your question now. No: Sysbox is just a container solution, just another runc. Everything runs on the same kernel. You’re not adding any layers, just more interfaces. So no.

    If you have more questions, please come join us at the Innovation Lounge. I’m going to be hanging around answering all the questions that you might have. Thank you for being here.

    Learn more

    This article contains the YouTube transcript of a presentation from DockerCon 2023. “Docker-in-Docker: Containerized CI Workflows” was presented by Chris Maki, Principal Software Engineer, Docker and Rodny Molina, Principal Software Engineer, Docker.
