Docker image history modification - why you can't trust `docker history`

I recently came across Staying safe with .NET containers on Hacker News. It’s an excellent post that goes into how the .NET Docker image publishing team thinks about:

  • The pedigree and provenance of the .NET Docker images they publish
  • The process by which they build and publish the images
  • Vulnerabilities (read: CVEs) in the base images they build from, and in the images they publish

With the devil often being in the details, I want to challenge a particular assumption in the article regarding Docker image composition. The article suggests that the “history” recorded while building a Docker image is a trustworthy record. In this post I will demonstrate that it is not: it is simple metadata that can be tampered with.

I reached out to Richard, the author of the Microsoft article, before putting this post together. He encouraged me to dig deeper into my thoughts and feelings and to share my reasoning for the benefit of all.

I’m certainly not having a dig at Microsoft or Richard. The article is a fantastic and thoughtful read, and if you haven’t read it I highly encourage you to do so.

Update 2021-02-16: Richard has updated the article. A section titled “Critical note” now acknowledges the fact that Docker image history is not necessarily trustworthy and that the guidance regarding container-diff shouldn’t be used to draw security-critical conclusions. Thanks Richard, it was a pleasure chatting to you about this one!

tl;dr

You should not trust the “history” of a potentially untrustworthy Docker image. The history is not a trustworthy record that is guaranteed to reflect the contents of the image. The history is metadata and it can be edited. If you are using the image history as a proxy for its authenticity, you are vulnerable to a certain type of supply chain attack.

This is not a vulnerability in Docker; it’s simply a misunderstanding and a misuse of one of its features.

If you need to be able to assert the authenticity of Docker images you are building or consuming, you may be able to use Docker Content Trust (i.e. image signing) to achieve your goals.

The article

My reading of the section titled “Dockerfiles” gave me the following impression:

  • The team puts a lot of thought and care into how they craft their Dockerfiles
  • The team publishes their Dockerfiles. For example, here is the Dockerfile for amd64 .NET 5.0 built atop Alpine 3.13 (In case you missed it, “.NET” is the new name for the old “.NET Core”)
  • The team builds the published .NET images from the very same set of published Dockerfiles
  • Thus, customers can safely build the images themselves using the published Dockerfiles, or they can pull the published images from Microsoft’s registry as a convenience mechanism, and can rest assured that they should get the same result either way
  • Furthermore, customers can use Google’s container-diff tool with the --type=history flag to compare a locally-built image with an image pulled from Microsoft’s registry to verify that the image on Microsoft’s registry was built using the same Dockerfile. The article notes that the image digests will not match. This is due to the fact that Docker images currently cannot be built deterministically.

It’s this last point that jumped out at me. In this post, I’ll dive into the impression that one might get from reading the article, and why container-diff --type=history is not a suitable technique for validating that a Docker image was built from a particular Dockerfile.

An aside: Build determinism and “Reproducible Builds”

Before we get too far, a short discussion of build determinism and the idea of “Reproducible Builds” may be useful.

“Building” or “compiling” something is the process of taking a set of input artefacts (sometimes source code) and transforming them into a distributable, executable, usable artefact. In the context of Docker images, this is done using docker build, which transforms a Dockerfile into a Docker image. As I understand it, the docker build command does something similar to the following (assuming a simple Dockerfile with only one FROM command):

  1. Start with the base image (e.g. Debian, Ubuntu, Alpine) referenced by the FROM command in the Dockerfile
  2. Run through the Dockerfile top-to-bottom, executing each command and capturing the “delta” effect that each command has on the image being built
    • In this context, “command” means a command specified by the Dockerfile Reference. For example – ADD adds a file from the local filesystem or a URL to the image being built, while RUN runs a shell command within the image being built.
  3. Arrange the pile of “delta” images into a stack and decorate the pile with some metadata

The resulting stack of delta images is the output artefact (the “image”) and can be executed on the local machine using docker run or can be pushed to a Docker image registry (docker push) to be pulled by others (docker pull).
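As a rough mental model of the "stack of deltas" idea, here is a minimal Python sketch. It is not how Docker actually stores or applies layers; the `Filesystem` and `Delta` types, the `apply_layers` helper and the sample file contents are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical model: a "filesystem" maps paths to file contents, and a
# "delta" maps paths to new contents (or None to mark a deletion).
Filesystem = Dict[str, bytes]
Delta = Dict[str, Optional[bytes]]


def apply_layers(base: Filesystem, deltas: List[Delta]) -> Filesystem:
    """Apply each delta, in order, on top of a copy of the base filesystem."""
    result = dict(base)
    for delta in deltas:
        for path, content in delta.items():
            if content is None:
                result.pop(path, None)   # this layer deleted the path
            else:
                result[path] = content   # this layer added or replaced the path
    return result


# Placeholder base image plus three deltas, one per build step.
base = {"/bin/sh": b"<shell binary>"}
deltas = [
    {"/foo": b""},
    {"/bar": b""},
    {"/foobar": b"uid=0(root)\n"},
]

image_fs = apply_layers(base, deltas)
print(sorted(image_fs))
```

The useful property of this model is that the final filesystem depends only on the deltas and their order, not on any description of how each delta was produced.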

A “deterministic” build would mean that if two people were to perform the build process at two different times on two different hosts, the output would be byte-for-byte identical. Deterministic builds are very attractive, as they enable Reproducible Builds.

Purely from my own experience and reasoning, I think it’s not easy or even practical to build Docker images in a reproducible way. For example:

  • If the Dockerfile specifies that packages should be updated or installed from an OS repository (e.g. via apt for Debian/Ubuntu or via apk for Alpine) then the build will vary if the OS publishes updated packages between two different docker build executions
  • If the installation of a package causes varying install-time data to be written (e.g. an installation ID or an SSH host key) then the build will vary each time docker build is run
  • If docker build is run at two different points in time, any command that changes the filesystem will cause different timestamp data to be captured in the layer diffs. This cascades through to what would otherwise be a great point of comparison – the layer hashes

Note that these are just a few reasons why I think that Docker images are difficult or perhaps even impossible to build deterministically. There may well be ways to mitigate some or all of these, and there are likely other ways that non-determinism can “leak” into the building of a Docker image. The Microsoft post notes that people are working on deterministic Docker image building, so it may one day become practical to deterministically build Docker images.
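The timestamp point can be illustrated with a small Python sketch. It does not use Docker at all: it simply packs the same empty file into an in-memory tar archive twice, changing only the modification time, and hashes the result. The `layer_digest` helper and the timestamp values are hypothetical.

```python
import hashlib
import io
import tarfile


def layer_digest(path: str, content: bytes, mtime: int) -> str:
    """Pack one file into an in-memory tar archive and return its SHA-256."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=path)
        info.size = len(content)
        info.mtime = mtime  # the only value that varies between "builds"
        archive.addfile(info, io.BytesIO(content))
    return hashlib.sha256(buffer.getvalue()).hexdigest()


first = layer_digest("foo", b"", mtime=1613300000)
second = layer_digest("foo", b"", mtime=1613300060)  # "built" a minute later
pinned = layer_digest("foo", b"", mtime=1613300000)  # timestamp held constant

print(first == second)  # False: same content, different timestamp
print(first == pinned)  # True: identical inputs give identical bytes
```

Pinning every such input is, roughly speaking, what a deterministic build has to do for each source of variation.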

With that out of the way: deterministic builds are attractive because they enable Reproducible Builds. Once something can be built deterministically, and the builder commits to taking whatever care is needed to ensure that it actually is, we gain something called Build Reproducibility. This allows anyone in the world to do the following:

  1. Obtain the source artefacts for the buildable thing
  2. Obtain a copy of the published “built thing”
  3. Independently build the buildable thing
  4. Verify that the obtained “built thing” (From 2) is a byte-for-byte copy of the independently built “buildable thing” (From 3)

Of course, at this stage, the person doing the experiment could simply use their locally-built identical copy of the buildable thing anyway. Even so, as Debian notes, the simple fact that someone can do this experiment gives us some interesting security benefits:

  • An attacker who is in a position to compromise the supply chain by tampering with published “built things” is now disincentivized to do so since they know their tampering can be detected by anyone
  • If an attacker is still bold enough to attempt their tampering and is detected by someone doing the “build locally and compare” experiment, the compromise can be identified and remediated
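The “build locally and compare” experiment boils down to comparing digests. Here is a minimal Python sketch, assuming a hypothetical deterministic `build` function standing in for the real build process; the artefact bytes are made up for illustration.

```python
import hashlib


def build(source: bytes) -> bytes:
    """Hypothetical deterministic build: the same source always yields the same bytes."""
    return b"BUILT:" + source.upper()


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def matches_local_build(published: bytes, source: bytes) -> bool:
    """Rebuild from source and check the published artefact is byte-for-byte identical."""
    return sha256_hex(published) == sha256_hex(build(source))


source = b"print('hello')"
genuine = build(source)                   # what an honest publisher ships
tampered = genuine + b"<extra payload>"   # what a supply chain attacker ships

print(matches_local_build(genuine, source))   # True
print(matches_local_build(tampered, source))  # False
```

The check is only meaningful because `build` is deterministic. With a non-deterministic build, an honest artefact would fail the comparison too.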

If you’d like to generally know more about build determinism and Reproducible Builds, the types of challenges that need to be solved to make builds deterministic, and the security and non-security benefits of Reproducible Builds, check out the Reproducible Builds project as well as Debian’s ReproducibleBuilds project.

Finally, I know enough about Docker and the building of Docker images to be dangerous, but I’m certainly not a Docker expert. Without question there are better resources out there regarding Docker image build determinism, and if you’re interested I encourage you to seek them out. For now, my “thoughts and feelings” should be enough to explain why I think the Microsoft post said something that it did, and why its advice on the matter is subtly misleading.

Using Docker image history as a proxy for build authenticity

Putting our attention back on Staying safe with .NET containers, the author explains that users of the published .NET Docker images are encouraged to consider whether an image obtained from the Microsoft registry is genuine.

Imagine you pull and inspect a .NET image (from our registry) and then rebuild the same image from the Dockerfile we’ve shared as its apparent source, on your own machine. You get the same result (I’ll define that shortly). That’s comforting. It means that using official .NET images is just a convenience as you can build them yourself. But what happens if you get a different result. That’s concerning, particularly if no explanation is provided. What are you to think? Your mind races. The difference could be the result of something nefarious or accidental. Only an investigation could help answer that question, and who has time for that?

I am confident that Microsoft takes the security of their .NET image build pipeline and Docker image registry extremely seriously, but it’s great to see them suggesting that users should still ask the question – how can I verify that a .NET image from the Microsoft registry is a genuine, safe artefact?

The author goes on to say:

The following workflow demonstrates how to compare a registry image with a locally built one (both from the same source Dockerfile) using the Google container-diff tool:

C:>curl https://storage.googleapis.com/container-diff/latest/container-diff-windows-amd64.exe -o container-diff.exe
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14.6M  100 14.6M    0     0  14.6M      0  0:00:01 --:--:--  0:00:01 20.3M

C:>git clone https://github.com/dotnet/dotnet-docker
Cloning into 'dotnet-docker'...

C:>cd dotnet-docker\src\sdk\5.0\alpine3.12\amd64

C:\dotnet-docker\src\sdk\5.0\alpine3.12\amd64>git pull
Already up to date.

C:\dotnet-docker\src\sdk\5.0\alpine3.12\amd64>docker pull mcr.microsoft.com/dotnet/sdk:5.0-alpine
5.0-alpine: Pulling from dotnet/sdk
Digest: sha256:fb1a43b50c7047e5f28e309268a8f5425abc9cb852124f6828dcb0e4f859a4a1
Status: Image is up to date for mcr.microsoft.com/dotnet/sdk:5.0-alpine
mcr.microsoft.com/dotnet/sdk:5.0-alpine

C:\dotnet-docker\src\sdk\5.0\alpine3.12\amd64>docker build --pull -t dotnet-sdk:5.0-alpine .
Sending build context to Docker daemon  4.096kB

<snip/>

C:\dotnet-docker\src\sdk\5.0\alpine3.12\amd64>container-diff.exe diff mcr.microsoft.com/dotnet/sdk:5.0-alpine daemon://dotnet-sdk:5.0-alpine --type=history

<snip/>

-----History-----

Docker history lines found only in mcr.microsoft.com/dotnet/sdk:5.0-alpine: None

Docker history lines found only in dotnet-sdk:5.0-alpine: None

This result tells you that the Dockerfiles used to build the two images are the same. That’s — as far as I’m aware — the most straightforward test to validate the fidelity of a registry image with its apparent source. The test passed. Comparing image digests won’t work; they will not match with normal practices.

It’s the first part of this statement that concerns me.

To put it in other words, the author is suggesting:

  • The user should obtain a published .NET Docker image
  • The user should obtain the Dockerfile from which the .NET Docker image publishing team builds the published Docker images
  • The user should build their own copy of the .NET Docker image
  • The user should use container-diff tool with the --type=history option to “compare the images”
  • If the container-diff tool reports that no Docker history lines differ between the published image and the locally-built image, the user should rest assured that the images were built with the same Dockerfile

Since Docker image builds are not deterministic, the author is suggesting that the Docker image history be used as a proxy for the authenticity of an otherwise untrustworthy image.

We’ll get to why I don’t think this thinking is correct. But first, some more background.

Docker image history

As explained previously, the process of building a Docker image runs through a Dockerfile top-to-bottom and runs each command, capturing the deltas at each stage into a stack of diffs. The build process also decorates each diff in the stack with some metadata, including the command used to generate that layer.

For example, given the following trivial Dockerfile:

justin@marlene:~/foobar$ cat Dockerfile
FROM debian:stable

RUN touch /foo
RUN touch /bar
RUN id > /foobar

We can build the image:

justin@marlene:~/foobar$ sudo docker build -t justinsteven/foobar .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM debian:stable
 ---> ee4553a2f015
Step 2/4 : RUN touch /foo
 ---> Running in a0a24ad31064
Removing intermediate container a0a24ad31064
 ---> 5d54b6987ebd
Step 3/4 : RUN touch /bar
 ---> Running in 8cb0f9364abe
Removing intermediate container 8cb0f9364abe
 ---> 91d51ef89c7e
Step 4/4 : RUN id > /foobar
 ---> Running in 62636d3601e1
Removing intermediate container 62636d3601e1
 ---> 90c38bb0a4af
Successfully built 90c38bb0a4af
Successfully tagged justinsteven/foobar:latest

And inspect its history:

justin@marlene:~/foobar$ sudo docker history justinsteven/foobar
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
90c38bb0a4af   51 seconds ago   /bin/sh -c id > /foobar                         39B
91d51ef89c7e   53 seconds ago   /bin/sh -c touch /bar                           0B
5d54b6987ebd   54 seconds ago   /bin/sh -c touch /foo                           0B
ee4553a2f015   5 days ago       /bin/sh -c #(nop)  CMD ["bash"]                 0B
<missing>      5 days ago       /bin/sh -c #(nop) ADD file:553358c6a785658d6…   114MB

We can see in the output of docker history that the command executed at each stage of the docker build process is recorded against each layer in the stack.
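To show roughly where this information lives, here is a minimal Python sketch of an image configuration document. It assumes the usual image config layout, with a `history` array (`created_by`, optional `empty_layer`) stored alongside a separate `rootfs.diff_ids` list. The `created_by` strings are copied from the output above (including the truncated one), while the digests are hypothetical placeholders.

```python
# Hypothetical, hand-written image config for the foobar image above.
config = {
    "rootfs": {
        "type": "layers",
        # Placeholder digests: one per layer that actually changed the filesystem.
        "diff_ids": [
            "sha256:<base layer>",
            "sha256:<touch /foo>",
            "sha256:<touch /bar>",
            "sha256:<id > /foobar>",
        ],
    },
    # Stored oldest first; each entry is a free-text description of a build step.
    "history": [
        {"created_by": "/bin/sh -c #(nop) ADD file:553358c6a785658d6…"},
        {"created_by": '/bin/sh -c #(nop)  CMD ["bash"]', "empty_layer": True},
        {"created_by": "/bin/sh -c touch /foo"},
        {"created_by": "/bin/sh -c touch /bar"},
        {"created_by": "/bin/sh -c id > /foobar"},
    ],
}


def history_lines(config: dict) -> list:
    """Return the created_by strings newest first, the order docker history uses."""
    return [entry.get("created_by", "") for entry in reversed(config["history"])]


for line in history_lines(config):
    print(line)

# Entries flagged empty_layer (such as CMD) have no corresponding diff.
layer_steps = [h for h in config["history"] if not h.get("empty_layer", False)]
print(len(layer_steps) == len(config["rootfs"]["diff_ids"]))  # True
```

Note that the history entries are plain strings sitting next to the layer digests. Nothing in this structure derives one from the other.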

container-diff --type=history

The container-diff tool isn’t terribly descriptive about how the --type=history differ works. The README.md says:

The history analyzer outputs a list of strings representing descriptions of how an image layer was created.

The code says:

func getHistoryList(historyItems []v1.History) []string {
	strhistory := make([]string, len(historyItems))
	for i, layer := range historyItems {
		strhistory[i] = strings.TrimSpace(layer.CreatedBy)
	}
	return strhistory
}

(Note that getHistoryList() seems to be digging out some field called layer.CreatedBy. Note also that the output of docker history justinsteven/foobar above has a column named “CREATED BY” with the list of commands that created each layer)

Based on these descriptions, and based on my usage of the tool, container-diff with the flag --type=history seems to compare the command-by-command layer-by-layer history of Docker images.
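My understanding of that comparison can be sketched in Python as follows. The `get_history_list` function mirrors the Go snippet above. The `history_diff` function is an assumption on my part: it implements a simple "lines found only in one image" difference, which is what the tool's output wording suggests, and is not taken from container-diff's source. The sample configs are hypothetical.

```python
def get_history_list(history_items: list) -> list:
    """Python rendering of getHistoryList(): keep only the trimmed created_by strings."""
    return [item.get("created_by", "").strip() for item in history_items]


def history_diff(config_a: dict, config_b: dict) -> tuple:
    """Assumed comparison: return (lines only in A, lines only in B)."""
    lines_a = get_history_list(config_a["history"])
    lines_b = get_history_list(config_b["history"])
    only_a = [line for line in lines_a if line not in lines_b]
    only_b = [line for line in lines_b if line not in lines_a]
    return only_a, only_b


# Hypothetical configs: only the history strings take part in the comparison.
registry_image = {"history": [
    {"created_by": "/bin/sh -c touch /foo"},
    {"created_by": "/bin/sh -c touch /bar"},
]}
local_image = {"history": [
    {"created_by": "/bin/sh -c touch /foo "},  # trailing space is trimmed away
    {"created_by": "/bin/sh -c touch /bar"},
]}
other_image = {"history": [
    {"created_by": "/bin/sh -c touch /foo"},
    {"created_by": "/bin/sh -c touch /baz"},
]}

print(history_diff(registry_image, local_image))  # ([], [])
print(history_diff(registry_image, other_image))
```

Under this reading, the comparison never looks at layer contents. It only compares the strings that each image carries about itself.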

Modifying Docker image histories

I will show that Docker image histories (which, once more for those in the back, are simply metadata) can be trivially modified. I’ll do so by creating a justinsteven/baddotnet-sdk:5.0-alpine Docker image from a modified Dockerfile, and will then edit the history so that it matches the history of the genuine mcr.microsoft.com/dotnet/sdk:5.0-alpine image. The image will appear at a glance to be built from the genuine Dockerfile and will even satisfy the comparison approach suggested by Microsoft – that is, the use of container-diff with the flag --type=history will report that there is no difference between the commands in each image’s history.

Environment

I will be using an Ubuntu 20.04 VM with Docker v20.10.3 installed as per https://docs.docker.com/engine/install/ubuntu/

justin@marlene:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

justin@marlene:~$ uname -a
Linux marlene 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

justin@marlene:~$ sudo docker version
Client: Docker Engine - Community
 Version:           20.10.3
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        48d30b5
 Built:             Fri Jan 29 14:33:21 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.3
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       46229ca
  Built:            Fri Jan 29 14:31:32 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Installing container-diff

I have installed the binary release of container-diff v0.15.0 as per https://github.com/GoogleContainerTools/container-diff#linux

justin@marlene:~$ curl -s -LO https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64 && chmod +x container-diff-linux-amd64 && sudo mv container-diff-linux-amd64 /usr/local/bin/container-diff

justin@marlene:~$ container-diff version
v0.15.0 built from git 2db6995

Note that I wouldn’t normally yolo-curl a binary to /usr/local/bin/ but this is a disposable VM :^)

Building a modified .NET image as justinsteven/baddotnet-sdk:5.0-alpine

Obtain the source Dockerfile:

justin@marlene:~$ git clone https://github.com/dotnet/dotnet-docker
Cloning into 'dotnet-docker'...
remote: Enumerating objects: 32734, done.
remote: Total 32734 (delta 0), reused 0 (delta 0), pack-reused 32734
Receiving objects: 100% (32734/32734), 6.33 MiB | 3.01 MiB/s, done.
Resolving deltas: 100% (15143/15143), done.

justin@marlene:~$ cd dotnet-docker/src/sdk/5.0/alpine3.13/amd64/

justin@marlene:~/dotnet-docker/src/sdk/5.0/alpine3.13/amd64$ git pull
Already up to date.

Modify the last line of the Dockerfile and “poison” the dotnet binary:

$ cp Dockerfile Dockerfile.orig

$ vim Dockerfile
[... SNIP ...]

$ diff Dockerfile Dockerfile.orig
46c46
<     && apk add --no-cache ncurses-terminfo-base && echo -e "#!/bin/sh\necho $'This isn\'t the dotnet you\'re looking for'" > /usr/bin/dotnet
---
>     && apk add --no-cache ncurses-terminfo-base

(This Dockerfile, when built, will generate a Docker image with a /usr/bin/dotnet executable file that will simply echo a fun message.)

Build the image as justinsteven/baddotnet-sdk:5.0-alpine:

$ sudo docker build --pull -t justinsteven/baddotnet-sdk:5.0-alpine .
Sending build context to Docker daemon   7.68kB
Step 1/6 : ARG REPO=mcr.microsoft.com/dotnet/aspnet
Step 2/6 : FROM $REPO:5.0-alpine3.13-amd64

[... SNIP ...]

Successfully built 1d26e9ddc4ab
Successfully tagged justinsteven/baddotnet-sdk:5.0-alpine

Run the image and execute the dotnet binary:

$ sudo docker run --rm justinsteven/baddotnet-sdk:5.0-alpine /usr/bin/dotnet
This isn't the dotnet you're looking for

Edit the Docker image history of justinsteven/baddotnet-sdk:5.0-alpine

To edit the Docker image history I will:

  1. Save the Docker image to a .tar file
  2. Unpack the .tar file
  3. Edit the metadata that specifies the image history
    • In particular, I will remove the part of the command that I tacked on to the last line of the Dockerfile
  4. Repack the .tar file
  5. Load the .tar file over the top of the existing Docker image in our local cache

Save the image:

$ sudo -g docker docker save justinsteven/baddotnet-sdk:5.0-alpine -o baddotnet.tar

Extract the saved image:

$ mkdir extracted

$ tar -C extracted -xf baddotnet.tar

$ ls -la extracted/
total 52
drwxrwxr-x 9 justin justin 4096 Feb 14 15:19 .
drwxrwxr-x 3 justin justin 4096 Feb 14 15:19 ..
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 110bffb0ec0b55916aa6b43a0b8c81b0a890c5d42dcb0124571999bc19a2e5d7
-rw-r--r-- 1 justin justin 7960 Feb 14 15:13 1d26e9ddc4abacf3f968001a77a8abb2246f996c507ee8835e4ecb70f9a8d596.json
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 3908b1d9d15773255e32f27acc0a9567dbece4ead3ef31cafe4f93febb3c4e5d
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 55ff4378140fdc511af9c4e87735bf61a90c2cd46592402854103a801a18aa5b
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 72eca9dbd06c4f30908c02a3edb6fa4f8dcbf405897ca624ee7becbe05ac0421
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 7349d60c79b89ac8c46ba339f4e7bb1c864eac9886b1b8a016cbe8601bf1e346
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 7cf72a7ce4db929d904e939a14614213a5da8e760a0b557abe3a12fe75d2436f
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 a9a9a8a5606b844d86ced594a39bfe25b135c04984ec136d353856fc7f2ded43
-rw-r--r-- 1 justin justin  688 Jan  1  1970 manifest.json
-rw-r--r-- 1 justin justin  113 Jan  1  1970 repositories

Modify the <sha256>.json file in the root of the extracted image to rewrite the history of the image:
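A minimal Python sketch of this kind of edit follows. It assumes the config JSON holds a `history` array whose entries carry a `created_by` string. The sample config, its command strings and the `strip_from_last_history_entry` helper are hypothetical and heavily simplified; they are not the contents of the real file.

```python
import copy
import json


def strip_from_last_history_entry(config: dict, unwanted: str) -> dict:
    """Return a copy of the config with `unwanted` removed from the newest history entry."""
    edited = copy.deepcopy(config)
    last = edited["history"][-1]
    last["created_by"] = last["created_by"].replace(unwanted, "")
    return edited


# Hypothetical, simplified stand-in for the <sha256>.json image config.
config = json.loads("""
{
  "rootfs": {"type": "layers", "diff_ids": ["sha256:<layer 1>", "sha256:<layer 2>"]},
  "history": [
    {"created_by": "/bin/sh -c #(nop) ADD <base files> in /"},
    {"created_by": "/bin/sh -c apk add --no-cache ncurses-terminfo-base && echo poisoned > /usr/bin/dotnet"}
  ]
}
""")

edited = strip_from_last_history_entry(config, " && echo poisoned > /usr/bin/dotnet")

print(edited["history"][-1]["created_by"])
print(edited["rootfs"] == config["rootfs"])  # True: the layers are untouched
```

In practice the dictionary would be read from, and written back to, the extracted `.json` file with `json.load` and `json.dump`. The point of the sketch is that the edit touches only descriptive strings and leaves the layer list alone.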

Repack the .tar:

$ tar -C extracted -cf baddotnet_modified.tar .

Load the .tar. Doing so will modify the justinsteven/baddotnet-sdk:5.0-alpine image in our local cache.

$ sudo docker load -i baddotnet_modified.tar
The image justinsteven/baddotnet-sdk:5.0-alpine already exists, renaming the old one with ID sha256:1d26e9ddc4abacf3f968001a77a8abb2246f996c507ee8835e4ecb70f9a8d596 to empty string
Loaded image: justinsteven/baddotnet-sdk:5.0-alpine

Inspecting the history and running container-diff

We can now use docker history to inspect the history metadata of the local justinsteven/baddotnet-sdk:5.0-alpine image. Note that viewing the history of images with long commands requires the --no-trunc flag to show all of each line, and be warned that the output is a little unwieldy.

The column of Image IDs is now littered with the value <missing> but in my experience this is par for the course for a Docker image obtained from a remote registry or imported from a .tar file.

We’re now in a position to use container-diff with the --type=history flag to compare our justinsteven/baddotnet-sdk:5.0-alpine image with Microsoft’s mcr.microsoft.com/dotnet/sdk:5.0-alpine

$ sudo -g docker container-diff diff mcr.microsoft.com/dotnet/sdk:5.0-alpine daemon://justinsteven/baddotnet-sdk:5.0-alpine --type=history

-----History-----

Docker history lines found only in mcr.microsoft.com/dotnet/sdk:5.0-alpine: None

Docker history lines found only in justinsteven/baddotnet-sdk:5.0-alpine: None

container-diff tells us that no Docker history lines differ between the two images, and yet the justinsteven/baddotnet-sdk:5.0-alpine image is clearly a bit wonky:

$ sudo docker run justinsteven/baddotnet-sdk:5.0-alpine dotnet
This isn't the dotnet you're looking for

We can even push the image to Docker Hub, remove it from our local cache, pull it down and repeat the exercise.

$ sudo -g docker docker push justinsteven/baddotnet-sdk:5.0-alpine
The push refers to repository [docker.io/justinsteven/baddotnet-sdk]
90a9b1d4e00e: Pushed
6aec14639694: Pushed
1a921a44e125: Pushed
82484281586d: Pushed
9b8d02d7ddab: Pushed
30025ca5e025: Pushed
1119ff37d4a9: Pushed
5.0-alpine: digest: sha256:60e4ccbd03dc2fce6e5e13be82cec8c079955acb28fd0b6f9e0d250a281ef4b3 size: 1798

$ sudo docker rmi justinsteven/baddotnet-sdk:5.0-alpine
Untagged: justinsteven/baddotnet-sdk:5.0-alpine
Untagged: justinsteven/baddotnet-sdk@sha256:60e4ccbd03dc2fce6e5e13be82cec8c079955acb28fd0b6f9e0d250a281ef4b3
Deleted: sha256:c211f1df2ca796be658d2e1918a3b772ec14ef143fc8117dfc55bf7d9f3eaa30

$ sudo docker system prune -a
WARNING! This will remove:
  - all stopped containers
  - all networks not used by at least one container
  - all images without at least one container associated to them
  - all build cache

Are you sure you want to continue? [y/N] y

[... SNIP ...]

Total reclaimed space: 1.191GB

$ sudo docker pull justinsteven/baddotnet-sdk:5.0-alpine
5.0-alpine: Pulling from justinsteven/baddotnet-sdk
4c0d98bf9879: Pull complete
3c97837ed36a: Pull complete
fd1ceae44045: Pull complete
804ffefbc086: Pull complete
c057bb26faa8: Pull complete
b142bbac8154: Pull complete
0d78a2f3501a: Pull complete
Digest: sha256:60e4ccbd03dc2fce6e5e13be82cec8c079955acb28fd0b6f9e0d250a281ef4b3
Status: Downloaded newer image for justinsteven/baddotnet-sdk:5.0-alpine
docker.io/justinsteven/baddotnet-sdk:5.0-alpine

$ sudo docker run --rm justinsteven/baddotnet-sdk:5.0-alpine dotnet
This isn't the dotnet you're looking for

$ sudo -g docker container-diff diff mcr.microsoft.com/dotnet/sdk:5.0-alpine justinsteven/baddotnet-sdk:5.0-alpine --type=history

-----History-----

Docker history lines found only in mcr.microsoft.com/dotnet/sdk:5.0-alpine: None

Docker history lines found only in justinsteven/baddotnet-sdk:5.0-alpine: None

We can successfully pull justinsteven/baddotnet-sdk:5.0-alpine and execute the dotnet command within it to get our funny little message. container-diff still sees the history as being identical to the official mcr.microsoft.com/dotnet/sdk:5.0-alpine image.

This image is available on Docker Hub at the time of writing. I may remove it in the future, but if you get in quick you can pull it down and have a play.

So what?

The ability to modify Docker image histories means that histories should not be trusted, and should not be used as a proxy to determine the authenticity of a Docker image.

For this technique to be useful to an attacker, in the specific case of Microsoft’s .NET images, the attacker would need to:

  1. Be a malicious insider within Microsoft, or an external attacker, who can compromise the .NET image build pipeline or the Microsoft Docker image registry
  2. Publish malicious .NET images or replace published .NET images with malicious contents, using this technique to rewrite the image’s history and hide the tampering
  3. Evade any behind-the-scenes spot checks that Microsoft are doing on their published .NET images, and any checks that the wider community is doing on the published .NET images
  4. Trust that Microsoft’s users are only performing naive checking of the Docker image history to authenticate the image

To pull this off would be no small feat, and I’m not suggesting that Microsoft’s .NET images have been tampered with. All I am suggesting is that, if they had, the guidance given in Staying safe with .NET containers is insufficient to detect conscientious tampering.

Image signing via Docker Content Trust (DCT)

Only just now did I wonder – “Is there a content signing mechanism for Docker images?” It turns out there is something called Docker Content Trust (DCT) which claims to allow image builders to sign images with a private key, so that consumers can verify the image authenticity. I haven’t heard of DCT before today so I don’t know how usable it is or how widely it’s used, but it may be useful for some builders or consumers of Docker images.

Conclusion

My advice to Microsoft is – please reconsider recommending that people use container-diff to verify the authenticity of a Docker image using its history as a proxy. The image history is metadata only, and can be modified by an image author or by anyone who can get their hands on any Docker image.

Update 2021-02-16: The article has been updated. A section titled “Critical note” now acknowledges the fact that Docker image history is not necessarily trustworthy and that the guidance regarding container-diff shouldn’t be used to draw security-critical conclusions.

My advice to users of Docker images is – don’t trust the image history of a potentially untrustworthy image. It can be modified.

I don’t have any advice for Docker or for Google (the authors of container-diff). However, it might be a good idea to remind users at the appropriate time that Docker image histories are not immutable records of the building of an image – they can be modified, and hence should not be trusted.

Reproducible Builds allow anyone to independently verify that a given built artefact is identical to a locally built copy. They discourage and make visible some types of supply chain attacks. Reproducible Builds require build determinism, which Docker images lack, but some people are reportedly working on it.
