DockerCon

Demystify Secure Supply Chain Metadata

Christian Dupuis

Senior Principal Software Engineer
Docker

Recorded on November 5th, 2023
Learn how you can create container images with signed SBOMs and provenance attestations using Docker tools adhering to the highest supply chain standards. We cover various types of metadata and principles that underpin a secure software supply chain. In this talk, we use Docker BuildKit, Docker Scout, and GitHub Actions.

Transcript

I’m here to talk to you today about supply chain metadata, what it really means, what it is, how you can use it today. I’ve been mentioning attestations an awful lot throughout the keynote and earlier presentation. And I want to use this opportunity to really go a little bit more in depth and show you what these things are, how you can produce them, and how we can start using them with Docker Scout.

I’ve divided the talk into three sections. First, I’ll talk a little bit about where this is coming from, what is the supply chain. The second section is around showing you a few examples: what are attestations, how do they look? And then, I’ll do a demo at the very end.

Table of Contents

    Supply chain example

    I’ll start with a little example. I’m an engineer; I love coffee. You probably do, too. There’s a whole supply chain involved here, and I want to use this analogy to drive home what we mean by supply chain when we talk about software. There are an awful lot of ingredients that need to go into your coffee to make it really good. And there are these coffee beans down here, and someone is putting a fair trade stamp on them, right? The way the water and beans move through the coffee machine and end up in your cup is kind of like a supply chain. And someone has put a stamp on the beans.

    You are not going to go down there wherever the coffee is grown to make sure that the coffee has actually been fairly traded. You’re trusting that someone has made this assertion for you. So there is a producer involved, and that supply chain down here at the bottom is something that you happily delegate to someone else to make sure that it is actually fairly traded coffee, right? That’s not you doing this.

    In a way, this is a lot of trust here. And you can apply the same principles to software building, right? The water becomes the source, the coffee maker becomes the build in the middle, package, and so on. So this is your process. And down there, there are dependencies that are ending up in your supply chain. It’s not like you’re going to go chase down each and every dependency and its transitive dependencies. Again, you’re trusting that someone has done that job for you and asserted their security.

    But where’s the risk coming from here? There are a lot of places in the supply chain where things can go wrong. Is this dependency really coming from the publisher that you think it’s coming from, or has someone taken over a GitHub repository, for example, and impersonated an artifact? Who is the producer? Is your build environment compromised? Is your source being tampered with? When you stop to think about it, all of these things can really be attacked across the supply chain.

    How do you make sure that this is all consistent and falls in with what you expect? How do we try to solve that problem? Well, there are a couple of things that we can do here. First, we can document what’s in your artifact — and, ideally, what’s in the artifacts that you consume. That’s where the SBOM, or software bill of materials, comes in. It’s a little bit like the ingredients on your cereal box, right? It lets you understand what is supposed to be in that artifact that you’re consuming, or that perhaps you’re building yourself. Then provenance is all about where a particular artifact is coming from. How has it been built, by whom, in what environment, when, and how can I potentially reproduce that build?

    Provenance and policy

    All of this is captured in provenance. All the materials that went into your build — your base image, your build environment itself — are captured, hopefully, inside that provenance attestation. And then, lastly, it’s all about establishing trust, right? It’s not enough that someone publishes these attestations with these artifacts. You also need to be in a situation where you can verify that these things are coming from an authority that you can trust. That’s where signatures come in, and I want to talk a little bit about that as well. Signing attestations is really key for software supply chain security. Signing them and verifying them — that’s really a key aspect.

    Once you have this data — once you have SBOMs and provenance, and ideally you have them signed — well, you have a bunch of data. But the question is: what can you really do with that data? That’s where policy comes in. And again, repeating what we said in the keynote, policy is about allowing you to create corporate standards for what you expect — in this particular case, from your artifacts. Are they signed? By whom are they signed? Where have they been built — potentially on GitHub? So that’s where policy comes in, and where we can start enforcing and evaluating it. Now, I’m a cyclist — a road cyclist. I live in Germany, and that’s the little map here.

    Following on with another analogy, policy is really, to me, what my GPS is when I cycle, right? There’s an awful lot of data that I need to navigate while being out in the wild. I could do that on my mobile phone while riding, but at the same time, I would get all sorts of notifications, all sorts of noise — and what I really want to focus on is cycling. I only want to navigate. With policy, that’s exactly what we’re trying to do, right?

    Trusted content

    Based on data, we want to help you focus on just what is really important to you. These attestations become input, we evaluate policies on top of that, and hopefully, down the line, we drive remediation. That’s really what we mean by: get the data into Docker Scout — into your supply chain — and then start evaluating policy on top of it. This is a little sandwich diagram of the three things that make Scout, Scout. We have trusted content at the bottom. And if you’re a Docker user, you’ve probably used a Docker Official Image. That’s part of what we call trusted content.

    We have three trusted content programs: Docker Official Images, Docker Verified Publishers, and Docker-Sponsored Open Source. Without a secure base, you can’t really have a secure software supply chain. So we — the Scout team and Docker — are heavily investing in that lower section of the sandwich diagram by adding signed attestations to those trusted content images. Take Node or Alpine, for example: in the near future, we expect to have attestations for these images. So that you know this is authentically coming from Docker, we assert that a particular package is in that image. We assert that this image has been built on GitHub, by this authority, by this workflow, against this Git commit SHA. It really allows you to make strong assertions about where these images are coming from. And then, in the background on top of that, is what we call our system of record.

    So all the data that comes in — in this particular case, relevant for this talk: SBOM package data, provenance data, and signatures — all of that data is stored, allowing you as a user of Scout to evaluate policies on top of it. Things like: Is my image signed? Do I have all the required attestations attached to my images? These are things that Scout can help you observe and, hopefully, remediate and improve in the future.

    Attestations

    Let’s jump into building images with attestations. I don’t know who has looked into this with BuildKit, but that’s effectively all you need to do to start building provenance and SBOM attestations: pass --sbom=1 and --provenance=1, which are shorthands for longer parameters. Don’t ask me why you need the 1 — that’s legacy in the Docker CLI. It doesn’t do bare flags; you always have to pass a value as well. It’s really weird. But, again, this gives you SBOMs and provenance. It’s not signed at that point, but it’s very easy to get started.
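    To make that concrete, here is a sketch of the full invocation — the image name is illustrative:

    ```shell
    # Build and push an image with SBOM and provenance attestations.
    # --sbom=true / --provenance=true are the long forms of the =1 shorthands;
    # the tag below is illustrative.
    docker buildx build \
      --sbom=true \
      --provenance=true \
      --tag myorg/myapp:latest \
      --push .
    ```

    Afterward, `docker buildx imagetools inspect myorg/myapp:latest` shows the resulting index, which is what the next section walks through.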

    You can do this on any CI system. You can do this out of the box on GitHub Actions with our build-and-push action — there’s a little flag on there. You can just enable it, and you’re good to go. What happens if you do that? Well, a couple of things. This is an example where I’m using the docker buildx imagetools inspect command to look at the image manifest.

    There is a lot of JSON here, but the thing that I want you to take away is that when you previously built an image and pushed that into the registry, you are now building an OCI index that first references an image up here. This is your stock standard Docker image or container image.

    There’s a new image down here with an unknown architecture that is effectively referencing the original image up here by digest. And that’s the attestation. So it has been attached to your image. It’s part of the OCI index, and therefore it’s backwards compatible with all OCI registries. We will be looking at supporting reference types when they become available in the future — a new part of the OCI spec will have reference types so that you can have an external artifact attached to something else. But given that it’s not widespread — it hasn’t been released as a final version yet, and it’s not available in all registries — BuildKit produces these OCI indexes, which work across all registries at this point.
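    Abridged, the index looks roughly like this — digests are shortened, and while the annotation names match what BuildKit emits today, treat the exact values as illustrative:

    ```shell
    # Sketch of the OCI index BuildKit pushes: the second manifest carries an
    # unknown/unknown platform and points back at the real image by digest.
    cat > index.json <<'EOF'
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "manifests": [
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:aaaa...",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:bbbb...",
         "platform": {"architecture": "unknown", "os": "unknown"},
         "annotations": {
           "vnd.docker.reference.type": "attestation-manifest",
           "vnd.docker.reference.digest": "sha256:aaaa..."
         }}
      ]
    }
    EOF
    grep -q 'attestation-manifest' index.json && echo "index ok"
    ```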

    Let’s dig in a little bit deeper. Let’s look at this attestation image. This is what you get when you’re looking at one of those attestation images: two layers. Normally when you look at a Docker image, you get one layer per image layer, and that’s your binary content — a tarball of your filesystem, right? An overlay filesystem. In this particular case, it’s not really an image. It’s one layer per attestation. And which attestation it is is indicated by this predicate type. You can see this is an SPDX document, which is one of the standard formats for SBOMs. So in this case, that layer represents an SBOM, and this layer down here, the SLSA provenance, is a provenance attestation.
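    A sketch of such an attestation manifest (heavily abridged, digests omitted) — the predicate-type annotation is what tells the two layers apart:

    ```shell
    # Sketch of an attestation image manifest: one layer per attestation,
    # identified by its in-toto predicate-type annotation.
    cat > attestation-manifest.json <<'EOF'
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "layers": [
        {"mediaType": "application/vnd.in-toto+json",
         "annotations": {"in-toto.io/predicate-type": "https://spdx.dev/Document"}},
        {"mediaType": "application/vnd.in-toto+json",
         "annotations": {"in-toto.io/predicate-type": "https://slsa.dev/provenance/v0.2"}}
      ]
    }
    EOF
    grep -q 'slsa.dev/provenance' attestation-manifest.json && echo "manifest ok"
    ```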

    Let’s drill into one of these attestation layers. Now I see that this is really an in-toto statement. An in-toto statement is a way to bind a subject — the thing you want to make a claim about — to a particular predicate. In this case, we are effectively addressing two tags, and you can see this down here. This is the latest tag, and there is a different tag, a commit SHA, here. So these are the subjects that I want to attest some content to, and here I have just removed everything.

    This is where the SBOM would go. So this is where your standard SPDX document would go, for example. This gives you an OCI index with an attestation image with provenance and SBOM attestations. If you’re building multi-arch images, of course, this is just going to grow. So you have one image for like AMD64, and then you have an attestation image, and then you can have an Arm64 and an attestation image. So they always go in pairs. And you can copy them around, right? They move, like, when you pull these images, they’re getting pulled down. When you copy them into your own on-prem registry, they’re available there as well.
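    Stripped down to its skeleton, such an in-toto statement looks like this — the image names and digest are illustrative:

    ```shell
    # Minimal in-toto statement: subjects (image references plus digests) bound
    # to a predicate, whose type says what kind of attestation this is.
    cat > statement.json <<'EOF'
    {
      "_type": "https://in-toto.io/Statement/v0.1",
      "subject": [
        {"name": "docker.io/myorg/myapp:latest",
         "digest": {"sha256": "aaaa..."}},
        {"name": "docker.io/myorg/myapp:0123abc",
         "digest": {"sha256": "aaaa..."}}
      ],
      "predicateType": "https://spdx.dev/Document",
      "predicate": {}
    }
    EOF
    grep -q '"predicateType"' statement.json && echo "statement ok"
    ```

    The emptied-out `predicate` is where the full SPDX document would sit.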

    OpenPubkey

    Now something really interesting has been pretty silently announced during today’s keynote. But Docker has finally started to sign these attestations. And we’re going to be using a technology called OpenPubkey. And that technology has been pioneered by a company called BastionZero, with whom we are partnering. And what’s really interesting for developers and hopefully in this audience is that this is a zero-configuration signing solution.

    So the goal of this exercise is that, at the end of the day, if you’re using GitHub Actions to build your Docker images — or any environment where an OIDC provider is available, which you can get on Amazon, which you can get on Google — you don’t have to do anything. Those attestations will just start to be signed by the fact that you’re running on GitHub, for example, or by the fact that you’re running on Google Cloud Build. And so we announced the partnership with BastionZero today.

    We also announced an initiative in the Linux Foundation. So we donated the whole source code into the Linux Foundation. So start looking into this. This is the GitHub repo and the GitHub organization where all this code lives. And what we did announce is not a final product. This is a start of a journey, which we want to involve the community, because this is going to be a lot of work to get trust established. You need security researchers to validate the proposal and so on and so forth.

    But I want to take the opportunity to show you what’s going to change when you start using OpenPubkey with BuildKit, for example. Remember, previously we had an in-toto statement that had a subject and a predicate. Now we are wrapping that statement in an envelope, part of the in-toto spec. That envelope has a payload — which, again, is your in-toto statement, base64 encoded — and then in the signatures array, you’ll find an OIDC OpenPubkey signature.
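    A sketch of what that envelope looks like — the payload here is shortened to a stub statement, and the signature value is a placeholder:

    ```shell
    # Sketch of the signed envelope: the in-toto statement is base64-encoded
    # into "payload", and signatures sit alongside it.
    payload_b64=$(printf '%s' '{"_type":"https://in-toto.io/Statement/v0.1"}' | base64 | tr -d '\n')
    cat > envelope.json <<EOF
    {
      "payloadType": "application/vnd.in-toto+json",
      "payload": "$payload_b64",
      "signatures": [{"keyid": "", "sig": "<openpubkey-signature>"}]
    }
    EOF
    # Recovering the original statement is just a base64 decode:
    printf '%s' "$payload_b64" | base64 -d
    echo
    ```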

    And that signature, and I’ll show you this in a minute. You can do a lot of assertions. It’s more than just verifying that a thing has been signed by a certain identity. This is coming out of GitHub Actions. There is a lot more in there.

    So, for example, we can verify tags. We can verify the name of the GitHub Actions workflow that produced this — is that what you expected? We can verify the repository owner’s ID here. GitHub keeps a unique ID for every organization, so even if you delete your org, no one else can name-squat it. So it’s a very safe assertion to say: this is the Docker org; I know exactly that this is coming from Docker at this point. And GitHub is standing in here and making sure that, okay, this can be trusted.

    The provenance attestation contains the Git SHA, as mentioned earlier. And the signed OIDC token contains the same information. So we can compare that these two things actually match — that the thing in my image was actually the thing that GitHub cloned for us, right? Making sure that there is a tight connection here, and the thing isn’t dirty. That’s really important. It allows you to make a lot of strong assertions on this.
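    The claim names below follow GitHub’s OIDC token format; the values are made up for illustration:

    ```shell
    # Illustrative GitHub Actions OIDC token claims (decoded): the things a
    # verifier can check — workflow name, repository, commit SHA, and the
    # stable numeric owner id that survives org deletion and re-creation.
    cat > claims.json <<'EOF'
    {
      "iss": "https://token.actions.githubusercontent.com",
      "repository": "myorg/myapp",
      "repository_owner": "myorg",
      "repository_owner_id": "1234567",
      "workflow": "build-and-push",
      "ref": "refs/heads/main",
      "sha": "0123abcd..."
    }
    EOF
    grep -q '"repository_owner_id"' claims.json && echo "claims ok"
    ```

    Verification then boils down to comparing claims like `sha` against what the provenance attestation recorded.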

    The thing I want to highlight here: a lot of current container signing solutions make verification somewhat of an opt-in, right? So you can always bypass it. You end up with a signed image, signed attestations, but you’re not really — I wouldn’t say forced. But opting in is not a very strong security posture, right? You want to make sure that for your developers — every time they do a docker pull, a docker run — you and your organization can decide: okay, I need these images to be signed by Docker. I need the provenance SHA to match. I need to know that this is the workflow these are supposed to be coming from.

    Docker CLI

    We are going to make that part of these Docker CLI operations, and we’re going to start shipping policies with the CLI. Of course, you can override that. It will at least verify these DOI images — Docker Official Images — so that you can’t end up with something that isn’t authentically Docker. And that’s the same for when you type docker run, and, of course, docker pull, as I mentioned. There are other runtime integrations that we’re exploring as well. Now, on to another piece of metadata that can also be signed.

    I think it’s worth mentioning the whole notion of: if you’re the producer of software, and a vulnerability scanner comes in and tells your consumers, hey, your container is vulnerable to something, or your image is vulnerable to something — you might want to say, you know, I’m actually not affected because of X; I have a mitigation in the container, or this is not executed at runtime. Or you want to say, yes, I am affected, and here is what you need to do: upgrade to the next version of my image where I fix this.

    VEX

    And that’s where the Vulnerability Exploitability eXchange (VEX) specification comes in. It’s one of the things that CISA is standardizing on — what they expect government suppliers to ship, and so on. So it’s similar to what they’re doing with SBOMs: a new specification. I think it’s really important to know that VEX, when you hear it, doesn’t just mean vulnerability exclusions. It can also mean that a producer of a piece of software wants to tell you: I am affected. So it’s really a way of communicating the status of a particular CVE — a particular vulnerability — for a particular product.

    And the other thing to keep in mind here is that you don’t have to trust that statement, right? Again, it can be signed — it should be signed — and it should ultimately be up to the consumer to say: well, I trust Docker to have VEX statements for Docker Official Images saying that this is not exploitable. I trust, I don’t know, Microsoft that this is not exploitable for Windows containers, and so on. But I do not trust some random guy on Docker Hub saying that this is not exploitable. Right? So there is a level of trust here, again, with signatures. With these things being signed, we can start enforcing that.

    Here is a very basic VEX document. You can see there’s a bit of metadata here — who is the author? — and then we get down into the statements. There can be multiple statements in there. I do realize the green is not very easy to read, but hopefully I can go through this in a little bit more detail in a second. There’s the CVE ID here, and then, within a particular Docker image, I want to mark a particular npm package as not affected. And when I say not affected, I have to specify a justification. There’s no way around it; otherwise it wouldn’t be a valid VEX statement. In this case, I say an inline mitigation already exists — this is already fixed in the container. All right. Let’s try that stuff and see if it works.
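    As a sketch, a minimal document in the OpenVEX format might look like this — the author, image, and package are illustrative, and the CVE is just an example:

    ```shell
    # Minimal OpenVEX document: one statement marking a CVE "not_affected"
    # for an npm package inside a specific image. A "not_affected" status is
    # only valid when paired with a justification.
    cat > vex.json <<'EOF'
    {
      "@context": "https://openvex.dev/ns",
      "author": "Example Author <author@example.com>",
      "statements": [
        {
          "vulnerability": "CVE-2022-24999",
          "products": ["pkg:docker/myorg/myapp@latest"],
          "subcomponents": ["pkg:npm/express@4.17.1"],
          "status": "not_affected",
          "justification": "inline_mitigations_already_exist"
        }
      ]
    }
    EOF
    grep -q '"justification"' vex.json && echo "vex ok"
    ```

    Dropping the justification would make the statement invalid — which is exactly the constraint described above.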

    Demo

    Let’s go into the demo. Let me just bring up the source code here real quick. This is a very simple Node application. It uses Alpine, and it uses a pinned digest to step back in time, so to speak, so that when I run a docker build I get an image with vulnerabilities. So let’s do this quickly and run a docker build here. I want to build this image with provenance and an SBOM, and I want to push it to Docker Hub. And you can see, when you do this, there’s this BuildKit scanner coming in. This has been added into BuildKit so that you get an SBOM created.

    Another thing you can see down here: instead of just pushing a single manifest for a single-arch image, I’m actually pushing an attestation manifest and a manifest list — all the artifacts described earlier. All right, let’s grab this tag here and make this a bit more visual. This is a very simple application — an online service. If you want to introspect some metadata about Docker images, this is a great way of doing that.

    Looking at this, this is kind of what I described earlier. You have your OCI index now. This is the image that we just built. And down here we have the attestation image. I can now click in here and start drilling in, right? That’s why this thing is really useful. You see two attestations. We’ve built an SBOM, and we built provenance attestation.

    If we drill into the provenance here for a second, you see an awful lot of information. But, first, we see the subject. So we created a tag, cd, and it was for a particular platform. It was also for a particular digest. So this is what we are attesting to. And then here’s all the information that BuildKit has captured for us. You can see the base image being added, with tag and digest, so there’s no guessing anymore: Am I using the correct version of a base image? Is this outdated? It’s all captured right there. You get the source maps, so you get all the individual build steps, their inputs and environments. In the early versions of BuildKit, you would be seeing secrets here, too — but they fixed that in the meantime.

    And if we go all the way down, these are all the build steps. And then we see labels. We see the materials going in. We see which BuildKit scanner was used, which version of BuildKit. You start seeing how all the things that go into your build are being documented. You can see how the build was invoked, and then which layers were produced, all the way down.

    This area is really interesting. Have you ever tried to reconstruct a Dockerfile from an image that you found somewhere? I did this an awful lot in the early days. This is the Dockerfile that was used, encoded as a base64 string here. It’s included because there are source maps for everything — each layer of an image can be associated with particular instructions in your Dockerfile. That’s all encapsulated in the provenance attestation. Down here we have the Git metadata: what was being built? I did the build out of a Git repository, and that is the Git remote URL and the commit SHA.

    Let’s go back a step and take a quick look at the SBOM. Again, subject and predicate. In this case, it’s an SPDX document. SPDX documents are structured with three top-level elements: files, packages, and the relationships between those two things. So first, we see all the files that have been identified. If I scroll for a very long time, you’ll see packages. Then we see the relationships — which package references which file. Let me just find the package for you real quick. A lot of data. There we go.
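    A heavily abridged sketch of that three-part SPDX structure — names, paths, and versions are illustrative:

    ```shell
    # SPDX SBOM skeleton: files, packages, and the relationships that tie a
    # package to the file it was discovered in.
    cat > sbom.spdx.json <<'EOF'
    {
      "spdxVersion": "SPDX-2.3",
      "name": "myorg/myapp",
      "files": [
        {"SPDXID": "SPDXRef-File-package-json",
         "fileName": "/app/node_modules/express/package.json"}
      ],
      "packages": [
        {"SPDXID": "SPDXRef-Package-express",
         "name": "express",
         "versionInfo": "4.17.1",
         "externalRefs": [
           {"referenceCategory": "PACKAGE-MANAGER",
            "referenceType": "purl",
            "referenceLocator": "pkg:npm/express@4.17.1"}
         ]}
      ],
      "relationships": [
        {"spdxElementId": "SPDXRef-Package-express",
         "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-File-package-json"}
      ]
    }
    EOF
    grep -q '"relationshipType"' sbom.spdx.json && echo "sbom ok"
    ```

    The relationship entry is what lets tooling answer “which file did this package come from?” — and, combined with provenance, which layer added it.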

    This is how packages are represented. They have a name, some metadata, some external references. We have a purl here, which is really important, because that’s what we use in Docker Scout to match against CVEs. Again, more metadata. And if you then establish the relationship to files, you immediately know where this package was added into your container. Let me do this real quick by typing docker scout. I want to bring in this image real quick. And the thing that you can see here now is that we actually didn’t have to pull down the image — we got all the information from these two attestations, and now we can tell you exactly what the base image was.

    Again, you saw that being captured in the provenance. We can give you the exact location of where this image is coming from — which Git repository. There’s even a link here that will hopefully take you to GitHub. And, very importantly, there are these lines here. This libssl — part of the OpenSSL package — in our particular case is coming out of the base image. So it’s coming out of layer zero, and this is exactly what this piece tells us here. Again, using the provenance attestation, we were able to match the information that a particular package — by a file relationship — was added by a particular layer in your Dockerfile. You can actually read the text here, and, of course, clicking this would take you straight to the line on GitHub. We have other examples here like this, including line breaks and everything. This is exactly how it looks in your Dockerfile.

    Next, I created a simple VEX statement, and I’m going to show you what happens when you start applying it. There are various ways you can attach this to the image; you can also store it in Scout. In this particular case, I just pass a VEX location — a directory or file path. And we immediately see that we’ve effectively vexed out a vulnerability on Express.

    And, in this case, I trust this. There are various other CLI options where you can say, I don’t trust Christian. I only trust this guy. So it’s really up to you as a consumer to say, okay, I am going to trust this. But, it is a great way to present additional information. If you attended the keynote earlier, we were using the same technology for conveying what packages are being used at runtime when you deploy images into production — using like Sysdig or potentially other integrations in the future. So these are the packages loaded that you should look at. Right.

    There is a sample repository on the OpenPubkey org, and I want to quickly show you what’s changing. If you look at one of those attestation images, here it looks exactly the same. What’s going to change is when you start drilling into… Let’s click into the provenance, and here instead of seeing an in-toto statement with a subject and the predicate, you get the payload. This is again the statement, and you get the signature down here. And then it’s just a simple matter of running a new plugin that we just released into the OpenPubkey organization that you can start using today and explore the possibilities.

    You can see we’re expecting two attestations. Both of those attestations are verified against certain criteria. For example: are we referring to the correct tags, so no one has tampered with it? Is the token signed by the correct authority? Is it the correct GitHub Actions run, and does the repository owner actually match? In this case, it’s OpenPubkey, not Docker. And the same down here for the provenance attestation, with the added assertion around the Git commit.

    I just want to show you one last thing. There’s a demo repository here for when you’re building images on GitHub — a quick demo of how you can set up your build to start signing these images using that technology. If you’re familiar with the build-and-push action, this actually has not changed at all. There’s no configuration needed for this particular piece. The only manual thing right now, for this very moment, are these three driver opts you need to add — although we’re going to change that in the future by getting this into Buildx, really embedding it in the tool. You effectively need to use a different BuildKit image and just pass in some environment variables that GitHub already provides. With that, you’re able to start signing your images and exploring the new signing technology.

    Q&A

    We have about 10 minutes to go. Do you have any questions for me today? Okay.

    Just a quick scenario to verify that I understand. So, you have a website that’s built on Django, and you have a container locally. You do a build of that image and you push it up to Artifactory. How do you tell it that you want it pushed up with an SBOM, so that a co-worker can pull it down from Artifactory and know that everything’s good?

    Right. This command works exactly the same if you were to push this onto Artifactory. That does not change at all. If you replace the tag down here with your reference to Artifactory, you would effectively be starting to build images with provenance and SBOMs and send them off to Artifactory. Yes. That’s literally all you need to do. Thank you.

    Any more questions? When you added the signing in, you ended up with this base64-encoded provenance, for example. How do you recover the provenance from that? Because it didn’t appear to be long enough to encompass the whole provenance. Right. It is actually the full payload. There is no magic going on. Let’s go in and see. It is actually pretty long.

    Okay. I think that’s it. Thank you. Have a wonderful day.

    Learn more

    This article contains the YouTube transcript of a presentation from DockerCon 2023. “Demystify Secure Supply Chain Metadata” was presented by Christian Dupuis, Sr. Principal Engineer, Docker.
