Docker image history modification - why you can't trust `docker history`
I recently came across Staying safe with .NET containers on Hacker News. It’s an excellent post that goes into how the .NET Docker image publishing team thinks about:
- The pedigree and provenance of the .NET Docker images they publish
- The process by which they build and publish the images
- Vulnerabilities (read: CVEs) in the base images they build from, and in the images they publish
With the devil often being in the details, I want to challenge a particular assumption in the article regarding Docker image composition. The article suggests that the “history” that is recorded while building a Docker image is a trustworthy record. In this post I will demonstrate that it is not: it is simple metadata that can be tampered with.
I reached out to Richard, the author of the Microsoft article, before putting this post together. He encouraged me to dig deeper into my thoughts and feelings and to share my reasoning for the benefit of all.
I’m certainly not having a dig at Microsoft or Richard. The article is a fantastic and thoughtful read, and if you haven’t read it I highly encourage you to do so.
Update 2021-02-16: Richard has updated the article. A section titled “Critical note” now acknowledges the fact that Docker image history is not necessarily trustworthy and that the guidance regarding container-diff shouldn’t be used to draw security-critical conclusions. Thanks Richard, it was a pleasure chatting to you about this one!
tl;dr
You should not trust the “history” of a potentially untrustworthy Docker image. The history is not a trustworthy record that is guaranteed to reflect the contents of the image. The history is metadata and it can be edited. If you are using the image history as a proxy for its authenticity, you are vulnerable to a certain type of supply chain attack.
This is not a vulnerability in Docker, it’s simply a misunderstanding and a misuse of one of its features.
If you need to be able to assert the authenticity of Docker images you are building or consuming, you may be able to use Docker Content Trust (i.e. image signing) to achieve your goals.
The article
My reading of the section titled “Dockerfiles” gave me the following impression:
- The team puts a lot of thought and care into how they craft their Dockerfiles
- The team publishes their Dockerfiles. For example, here is the Dockerfile for amd64 .NET 5.0 built atop Alpine 3.13 (In case you missed it, “.NET” is the new name for the old “.NET Core”)
- The team builds the published .NET images from the very same set of published Dockerfiles
- Thus, customers can safely build the images themselves using the published Dockerfiles, or they can pull the published images from Microsoft’s registry as a convenience mechanism, and can rest assured that they should get the same result either way
- Furthermore, customers can use Google’s container-diff tool with the --type=history flag to compare a locally-built image with an image pulled from Microsoft’s registry to verify that the image on Microsoft’s registry was built using the same Dockerfile. The article notes that the image digests will not match. This is due to the fact that Docker images currently cannot be built deterministically.
It’s this last point that jumped out at me. In this post, I’ll dive into the impression that one might get from reading the article, and why container-diff --type=history is not a suitable technique for validating that a Docker image was built from a particular Dockerfile.
An aside: Build determinism and “Reproducible Builds”
Before we get too far, a short discussion of build determinism and the idea of “Reproducible Builds” may be useful.
“Building” or “compiling” something is the process of taking a set of artefacts (sometimes source code) and transforming them into a distributable, executable, usable artefact. In the context of Docker images, this is done using docker build, which transforms a Dockerfile into a Docker image. By my own understanding, the docker build command does something similar to the following (assuming a simple Dockerfile with only one FROM command):
- Start with the base image (e.g. Debian, Ubuntu, Alpine) referenced by the FROM command in the Dockerfile
- Run through the Dockerfile top-to-bottom, executing each command and capturing the “delta” effect that each command has on the image being built
  - In this context, “command” means a command specified by the Dockerfile Reference. For example, ADD adds a file from the local filesystem or a URL to the image being built, while RUN runs a shell command within the image being built.
- Arrange the pile of “delta” images into a stack and decorate the pile with some metadata
The resulting stack of delta images is the output artefact (the “image”) and can be executed on the local machine using docker run or can be pushed to a Docker image registry (docker push) to be pulled by others (docker pull).
A “deterministic” build would mean that if two people were to perform the build process at two different times on two different hosts, the output would be byte-for-byte identical. Deterministic builds are very attractive, as they enable Reproducible Builds.
Purely from my own experience and reasoning, I think it’s not easy or even practical to build Docker images in a reproducible way. For example:
- If the Dockerfile specifies that packages should be updated or installed from an OS repository (e.g. via apt for Debian/Ubuntu or via apk for Alpine) then the build will vary if the OS publishes updated packages between two different docker build executions
- If the installation of a package causes varying install-time data to be written (e.g. an installation ID or an SSH host key) then the build will vary each time docker build is run
- If docker build is done at two different points in time, then any command that causes a state change to the filesystem will result in different filesystem timestamp data being captured in the layer diffs, which will cascade through to what would otherwise be a great point of comparison – the layer hashes
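The timestamp point is easy to see in isolation. The following is a minimal sketch, not how Docker itself packs layers: it uses Python's tarfile module to build a one-file archive three times with the same content, changing only the modification time on the last run. The file name, contents and timestamps are made up for illustration.

```python
import hashlib
import io
import tarfile


def layer_digest(content: bytes, mtime: int) -> str:
    """Build a one-file tar archive in memory and return its SHA-256 digest."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="foo")
        info.size = len(content)
        info.mtime = mtime  # the only input that varies between the "builds"
        tar.addfile(info, io.BytesIO(content))
    return hashlib.sha256(buf.getvalue()).hexdigest()


same_time_a = layer_digest(b"hello\n", mtime=1613260800)
same_time_b = layer_digest(b"hello\n", mtime=1613260800)
later = layer_digest(b"hello\n", mtime=1613260801)

print(same_time_a == same_time_b)  # same content and timestamp: digests agree
print(same_time_a == later)        # one second later: digests no longer agree
```

The file contents never change, yet the archive digest does, which is the same effect that makes layer hashes a poor point of comparison across two separate builds.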
Note that these are just a few reasons why I think that Docker images are difficult or perhaps even impossible to build deterministically. There may well be ways to mitigate some or all of these, and there are likely other ways that non-determinism can “leak” into the building of a Docker image. The Microsoft post notes that people are working on deterministic Docker image building, so it may one day become practical to deterministically build Docker images.
With that out of the way, back to why deterministic builds are attractive. Once something is deterministically buildable, and the builder commits to taking whatever care is needed to ensure that it is built deterministically, we gain something called Build Reproducibility. This allows anyone in the world to do the following:
1. Obtain the source artefacts for the buildable thing
2. Obtain a copy of the published “built thing”
3. Independently build the buildable thing
4. Verify that the obtained “built thing” (from 2) is a byte-for-byte copy of the independently built “buildable thing” (from 3)
Of course, at this stage, the person doing the experiment could simply use their locally-built identical copy of the buildable thing anyway. As Debian notes, the simple fact that someone can do this experiment gives us some interesting security benefits:
- An attacker who is in a position to compromise the supply chain by tampering with published “built things” is now disincentivized to do so since they know their tampering can be detected by anyone
- If an attacker is still bold enough to attempt their tampering, if they are detected by someone doing the “build locally and compare” experiment, the compromise can be identified and remediated
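The final verification step of the “build locally and compare” experiment comes down to comparing digests of two files. A minimal sketch follows, assuming the artefacts are ordinary files on disk; the file names and contents are stand-ins created in a temporary directory, and the function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artefacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests_match(published: Path, rebuilt: Path) -> bool:
    """True when the published artefact and the local rebuild hash identically."""
    return sha256_of(published) == sha256_of(rebuilt)


# Stand-in artefacts; in practice these would be the downloaded and rebuilt files.
with tempfile.TemporaryDirectory() as workdir:
    published = Path(workdir, "published.bin")
    rebuilt = Path(workdir, "rebuilt.bin")
    tampered = Path(workdir, "tampered.bin")
    published.write_bytes(b"built thing")
    rebuilt.write_bytes(b"built thing")
    tampered.write_bytes(b"built thing with a backdoor")

    genuine_matches = digests_match(published, rebuilt)
    tampered_matches = digests_match(tampered, rebuilt)

print(genuine_matches, tampered_matches)
```

This check is only meaningful when the build is deterministic, which is exactly what Docker image builds currently lack.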
If you’d like to generally know more about build determinism and Reproducible Builds, the types of challenges that need to be solved to make builds deterministic, and the security and non-security benefits of Reproducible Builds, check out the Reproducible Builds project as well as Debian’s ReproducibleBuilds project.
Finally, I know enough about Docker and the building of Docker images to be dangerous, but I’m certainly not a Docker expert. Without question there are better resources out there regarding Docker image build determinism, and if you’re interested I encourage you to seek them out. For now, my “thoughts and feelings” should be enough to explain why I think the Microsoft post said something that it did, and why its advice on the matter is subtly misleading.
Using Docker image history as a proxy for build authenticity
Putting our attention back on Staying safe with .NET containers, the author explains that users of the published .NET Docker images are encouraged to consider whether an image obtained from the Microsoft registry is genuine.
Imagine you pull and inspect a .NET image (from our registry) and then rebuild the same image from the Dockerfile we’ve shared as its apparent source, on your own machine. You get the same result (I’ll define that shortly). That’s comforting. It means that using official .NET images is just a convenience as you can build them yourself. But what happens if you get a different result. That’s concerning, particularly if no explanation is provided. What are you to think? Your mind races. The difference could be the result of something nefarious or accidental. Only an investigation could help answer that question, and who has time for that?
I am confident that Microsoft takes the security of their .NET image build pipeline and Docker image registry extremely seriously, but it’s great to see them suggesting that users should still ask the question – how can I verify that a .NET image from the Microsoft registry is a genuine, safe artefact?
The author goes on to say:
The following workflow demonstrates how to compare a registry image with a locally built one (both from the same source Dockerfile) using the Google container-diff tool:
C:>curl https://storage.googleapis.com/container-diff/latest/container-diff-windows-amd64.exe -o container-diff.exe
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14.6M 100 14.6M 0 0 14.6M 0 0:00:01 --:--:-- 0:00:01 20.3M
C:>git clone https://github.com/dotnet/dotnet-docker
Cloning into 'dotnet-docker'...
C:>cd dotnet-docker\src\sdk\5.0\alpine3.12\amd64
C:\dotnet-docker\src\sdk\5.0\alpine3.12\amd64>git pull
Already up to date.
C:\dotnet-docker\src\sdk\5.0\alpine3.12\amd64>docker pull mcr.microsoft.com/dotnet/sdk:5.0-alpine
5.0-alpine: Pulling from dotnet/sdk
Digest: sha256:fb1a43b50c7047e5f28e309268a8f5425abc9cb852124f6828dcb0e4f859a4a1
Status: Image is up to date for mcr.microsoft.com/dotnet/sdk:5.0-alpine
mcr.microsoft.com/dotnet/sdk:5.0-alpine
C:\dotnet-docker\src\sdk\5.0\alpine3.12\amd64>docker build --pull -t dotnet-sdk:5.0-alpine .
Sending build context to Docker daemon 4.096kB
<snip/>
C:\dotnet-docker\src\sdk\5.0\alpine3.12\amd64>container-diff.exe diff mcr.microsoft.com/dotnet/sdk:5.0-alpine daemon://dotnet-sdk:5.0-alpine --type=history
<snip/>
-----History-----
Docker history lines found only in mcr.microsoft.com/dotnet/sdk:5.0-alpine: None
Docker history lines found only in dotnet-sdk:5.0-alpine: None
This result tells you that the Dockerfiles used to build the two images are the same. That’s — as far as I’m aware — the most straightforward test to validate the fidelity of a registry image with its apparent source. The test passed. Comparing image digests won’t work; they will not match with normal practices.
It’s the first part of this statement that concerns me.
To put it in other words, the author is suggesting:
- The user should obtain a published .NET Docker image
- The user should obtain the Dockerfile from which the .NET Docker image publishing team builds the published Docker images
- The user should build their own copy of the .NET Docker image
- The user should use the container-diff tool with the --type=history option to “compare the images”
- If the container-diff tool reports that no Docker history lines differ between the published image and the locally-built image, the user should rest assured that the images were built with the same Dockerfile
Since Docker image builds are not deterministic, the author is suggesting that the Docker image history be used as a proxy for the authenticity of an otherwise untrustworthy image.
We’ll get to why I don’t think this thinking is correct. But first, some more background.
Docker image history
As explained previously, the process of building a Docker image runs through a Dockerfile top-to-bottom and runs each command, capturing the deltas at each stage into a stack of diffs. The build process also decorates each diff in the stack with some metadata, including the command used to generate that layer.
For example, given the following trivial Dockerfile:
justin@marlene:~/foobar$ cat Dockerfile
FROM debian:stable
RUN touch /foo
RUN touch /bar
RUN id > /foobar
We can build the image:
justin@marlene:~/foobar$ sudo docker build -t justinsteven/foobar .
Sending build context to Docker daemon 2.048kB
Step 1/4 : FROM debian:stable
---> ee4553a2f015
Step 2/4 : RUN touch /foo
---> Running in a0a24ad31064
Removing intermediate container a0a24ad31064
---> 5d54b6987ebd
Step 3/4 : RUN touch /bar
---> Running in 8cb0f9364abe
Removing intermediate container 8cb0f9364abe
---> 91d51ef89c7e
Step 4/4 : RUN id > /foobar
---> Running in 62636d3601e1
Removing intermediate container 62636d3601e1
---> 90c38bb0a4af
Successfully built 90c38bb0a4af
Successfully tagged justinsteven/foobar:latest
And inspect its history:
justin@marlene:~/foobar$ sudo docker history justinsteven/foobar
IMAGE CREATED CREATED BY SIZE COMMENT
90c38bb0a4af 51 seconds ago /bin/sh -c id > /foobar 39B
91d51ef89c7e 53 seconds ago /bin/sh -c touch /bar 0B
5d54b6987ebd 54 seconds ago /bin/sh -c touch /foo 0B
ee4553a2f015 5 days ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 5 days ago /bin/sh -c #(nop) ADD file:553358c6a785658d6… 114MB
We can see in the output of docker history that the command executed at each stage of the docker build process is recorded against each layer in the stack.
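It helps to see where that record lives. The following is a rough sketch, assuming the image config layout described by the OCI image specification: a history array whose entries may carry created_by and empty_layer fields, next to a rootfs.diff_ids list of layer digests. The config below is a hand-written stand-in modelled on the foobar image; the digests are placeholders, not real values.

```python
from typing import List, Optional, Tuple


def pair_history_with_layers(config: dict) -> List[Tuple[str, Optional[str]]]:
    """Line each history entry up with the layer it claims to describe.

    Entries flagged empty_layer (metadata-only steps such as CMD) consume no
    layer; every other entry takes the next digest from rootfs.diff_ids.
    """
    diff_ids = iter(config["rootfs"]["diff_ids"])
    pairs = []
    for entry in config["history"]:
        layer = None if entry.get("empty_layer", False) else next(diff_ids)
        pairs.append((entry.get("created_by", ""), layer))
    return pairs


# Hypothetical config: placeholder digests, commands as shown by docker history.
example_config = {
    "rootfs": {
        "type": "layers",
        "diff_ids": ["sha256:base", "sha256:foo", "sha256:bar", "sha256:foobar"],
    },
    "history": [
        {"created_by": "/bin/sh -c #(nop) ADD file:553358c6a785658d6…"},
        {"created_by": '/bin/sh -c #(nop) CMD ["bash"]', "empty_layer": True},
        {"created_by": "/bin/sh -c touch /foo"},
        {"created_by": "/bin/sh -c touch /bar"},
        {"created_by": "/bin/sh -c id > /foobar"},
    ],
}

pairs = pair_history_with_layers(example_config)
for created_by, layer in pairs:
    print(layer or "(no layer)", "<-", created_by)
```

In this layout the pairing is purely positional. Nothing ties a created_by string to the bytes of the layer it sits beside.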
container-diff --type=history
The container-diff tool isn’t terribly descriptive about how the --type=history differ works. The README.md says:
The history analyzer outputs a list of strings representing descriptions of how an image layer was created.
The code says:
func getHistoryList(historyItems []v1.History) []string {
strhistory := make([]string, len(historyItems))
for i, layer := range historyItems {
strhistory[i] = strings.TrimSpace(layer.CreatedBy)
}
return strhistory
}
(Note that getHistoryList() seems to be digging out some field called layer.CreatedBy. Note also that the output of docker history justinsteven/foobar above has a column named “CREATED BY” with the list of commands that created each layer)
Based on these descriptions, and based on my usage of the tool, container-diff with the flag --type=history seems to compare the command-by-command layer-by-layer history of Docker images.
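To make that concrete, here is a rough Python equivalent of what such a comparison amounts to. It is a sketch under assumptions, not container-diff's implementation: history_commands mirrors the getHistoryList() excerpt, while history_only_in is a guess at how a “lines found only in” comparison could work. The two image configs are hypothetical.

```python
from typing import List


def history_commands(config: dict) -> List[str]:
    """Like getHistoryList(): keep only the whitespace-trimmed created_by strings."""
    return [entry.get("created_by", "").strip() for entry in config.get("history", [])]


def history_only_in(first: dict, second: dict) -> List[str]:
    """History lines recorded for `first` that do not appear for `second`."""
    other = set(history_commands(second))
    return [line for line in history_commands(first) if line not in other]


# Two hypothetical image configs: same recorded commands, different layer contents.
genuine = {
    "history": [{"created_by": "/bin/sh -c touch /foo"}],
    "rootfs": {"diff_ids": ["sha256:genuine-layer"]},
}
lookalike = {
    "history": [{"created_by": "/bin/sh -c touch /foo "}],
    "rootfs": {"diff_ids": ["sha256:something-else-entirely"]},
}

print(history_only_in(genuine, lookalike))  # no lines unique to the first image
print(history_only_in(lookalike, genuine))  # no lines unique to the second image
print(genuine["rootfs"] == lookalike["rootfs"])  # the layers differ all the same
```

A comparison of this shape only ever reads the recorded command strings, so it reports no difference whenever those strings agree, whatever the layers contain.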
Modifying Docker image histories
I will show that Docker image histories (which, once more for those in the back, are simply metadata) can be trivially modified. I’ll do so by creating a justinsteven/baddotnet-sdk:5.0-alpine Docker image from a modified Dockerfile, and will then edit the history so that it matches the history of the genuine mcr.microsoft.com/dotnet/sdk:5.0-alpine image. The image will appear at a glance to be built from the genuine Dockerfile and will even satisfy the comparison approach suggested by Microsoft – that is, the use of container-diff with the flag --type=history will report that there is no difference between the commands in each image’s history.
Environment
I will be using an Ubuntu 20.04 VM with Docker v20.10.3 installed as per https://docs.docker.com/engine/install/ubuntu/
justin@marlene:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
justin@marlene:~$ uname -a
Linux marlene 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
justin@marlene:~$ sudo docker version
Client: Docker Engine - Community
Version: 20.10.3
API version: 1.41
Go version: go1.13.15
Git commit: 48d30b5
Built: Fri Jan 29 14:33:21 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.3
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 46229ca
Built: Fri Jan 29 14:31:32 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Installing container-diff
I have installed the binary release of container-diff v0.15.0 as per https://github.com/GoogleContainerTools/container-diff#linux
justin@marlene:~$ curl -s -LO https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64 && chmod +x container-diff-linux-amd64 && sudo mv container-diff-linux-amd64 /usr/local/bin/container-diff
justin@marlene:~$ container-diff version
v0.15.0 built from git 2db6995
Note that I wouldn’t normally yolo-curl a binary to /usr/local/bin/ but this is a disposable VM :^)
Building a modified .NET image as justinsteven/baddotnet-sdk:5.0-alpine
Obtain the source Dockerfile:
justin@marlene:~$ git clone https://github.com/dotnet/dotnet-docker
Cloning into 'dotnet-docker'...
remote: Enumerating objects: 32734, done.
remote: Total 32734 (delta 0), reused 0 (delta 0), pack-reused 32734
Receiving objects: 100% (32734/32734), 6.33 MiB | 3.01 MiB/s, done.
Resolving deltas: 100% (15143/15143), done.
justin@marlene:~$ cd dotnet-docker/src/sdk/5.0/alpine3.13/amd64/
justin@marlene:~/dotnet-docker/src/sdk/5.0/alpine3.13/amd64$ git pull
Already up to date.
Modify the last line of the Dockerfile and “poison” the dotnet binary:
$ cp Dockerfile Dockerfile.orig
$ vim Dockerfile
[... SNIP ...]
$ diff Dockerfile Dockerfile.orig
46c46
< && apk add --no-cache ncurses-terminfo-base && echo -e "#!/bin/sh\necho $'This isn\'t the dotnet you\'re looking for'" > /usr/bin/dotnet
---
> && apk add --no-cache ncurses-terminfo-base
(This Dockerfile, when built, will generate a Docker image with a /usr/bin/dotnet executable file that will simply echo a fun message.)
Build the image as justinsteven/baddotnet-sdk:5.0-alpine:
$ sudo docker build --pull -t justinsteven/baddotnet-sdk:5.0-alpine .
Sending build context to Docker daemon 7.68kB
Step 1/6 : ARG REPO=mcr.microsoft.com/dotnet/aspnet
Step 2/6 : FROM $REPO:5.0-alpine3.13-amd64
[... SNIP ...]
Successfully built 1d26e9ddc4ab
Successfully tagged justinsteven/baddotnet-sdk:5.0-alpine
Run the image and execute the dotnet binary:
$ sudo docker run --rm justinsteven/baddotnet-sdk:5.0-alpine /usr/bin/dotnet
This isn't the dotnet you're looking for
Edit the Docker image history of justinsteven/baddotnet-sdk:5.0-alpine
To edit the Docker image history I will:
- Save the Docker image to a .tar file
- Unpack the .tar file
- Edit the metadata that specifies the image history
  - In particular, I will remove the part of the command that I tacked on to the last line of the Dockerfile
- Repack the .tar file
- Load the .tar file over the top of the existing Docker image in our local cache
Save the image:
$ sudo -g docker docker save justinsteven/baddotnet-sdk:5.0-alpine -o baddotnet.tar
Extract the saved image:
$ mkdir extracted
$ tar -C extracted -xf baddotnet.tar
$ ls -la extracted/
total 52
drwxrwxr-x 9 justin justin 4096 Feb 14 15:19 .
drwxrwxr-x 3 justin justin 4096 Feb 14 15:19 ..
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 110bffb0ec0b55916aa6b43a0b8c81b0a890c5d42dcb0124571999bc19a2e5d7
-rw-r--r-- 1 justin justin 7960 Feb 14 15:13 1d26e9ddc4abacf3f968001a77a8abb2246f996c507ee8835e4ecb70f9a8d596.json
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 3908b1d9d15773255e32f27acc0a9567dbece4ead3ef31cafe4f93febb3c4e5d
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 55ff4378140fdc511af9c4e87735bf61a90c2cd46592402854103a801a18aa5b
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 72eca9dbd06c4f30908c02a3edb6fa4f8dcbf405897ca624ee7becbe05ac0421
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 7349d60c79b89ac8c46ba339f4e7bb1c864eac9886b1b8a016cbe8601bf1e346
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 7cf72a7ce4db929d904e939a14614213a5da8e760a0b557abe3a12fe75d2436f
drwxr-xr-x 2 justin justin 4096 Feb 14 15:13 a9a9a8a5606b844d86ced594a39bfe25b135c04984ec136d353856fc7f2ded43
-rw-r--r-- 1 justin justin 688 Jan 1 1970 manifest.json
-rw-r--r-- 1 justin justin 113 Jan 1 1970 repositories
Modify the <sha256>.json file in the root of the extracted image to rewrite the history of the image.
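As a minimal sketch of the idea, assuming the layout that docker save produces (a manifest.json whose first entry names the config file under a Config key, and a config whose history entries carry created_by strings), the appended command can be stripped like this. The helper names, the file name config.json and the sample strings are illustrative, not the real contents of the image.

```python
import json
import tempfile
from pathlib import Path


def strip_from_history(config: dict, injected: str) -> int:
    """Delete `injected` from every created_by string; return how many entries changed."""
    changed = 0
    for entry in config.get("history", []):
        created_by = entry.get("created_by", "")
        if injected in created_by:
            entry["created_by"] = created_by.replace(injected, "")
            changed += 1
    return changed


def rewrite_saved_image(extracted_dir: Path, injected: str) -> int:
    """Apply strip_from_history() to the config of an unpacked docker save archive."""
    manifest = json.loads((extracted_dir / "manifest.json").read_text())
    config_path = extracted_dir / manifest[0]["Config"]
    config = json.loads(config_path.read_text())
    changed = strip_from_history(config, injected)
    config_path.write_text(json.dumps(config))
    return changed


# Illustrative stand-in for an unpacked image; only the metadata is modelled.
injected = " && echo poisoned > /usr/bin/dotnet"
sample = {
    "history": [
        {"created_by": "/bin/sh -c apk add --no-cache ncurses-terminfo-base" + injected}
    ]
}
with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    (root / "manifest.json").write_text(json.dumps([{"Config": "config.json"}]))
    (root / "config.json").write_text(json.dumps(sample))
    changed = rewrite_saved_image(root, injected)
    rewritten = json.loads((root / "config.json").read_text())

print(changed, rewritten["history"][0]["created_by"])
```

Against the directory unpacked earlier, the call would be rewrite_saved_image(Path("extracted"), ...) with the appended text exactly as it appears in created_by. Only the metadata changes: the layer containing the poisoned binary is left untouched.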
Repack the .tar:
$ tar -C extracted -cf baddotnet_modified.tar .
Load the .tar. Doing so will modify the justinsteven/baddotnet-sdk:5.0-alpine image in our local cache.
$ sudo docker load -i baddotnet_modified.tar
The image justinsteven/baddotnet-sdk:5.0-alpine already exists, renaming the old one with ID sha256:1d26e9ddc4abacf3f968001a77a8abb2246f996c507ee8835e4ecb70f9a8d596 to empty string
Loaded image: justinsteven/baddotnet-sdk:5.0-alpine
Inspecting the history and running container-diff
We can now use docker history to inspect the history metadata of the local justinsteven/baddotnet-sdk:5.0-alpine image. Note that viewing the history of images with long commands requires the --no-trunc flag to show all of each line, and be warned that the output is a little unwieldy.
The column of Image IDs is now littered with the value <missing> but in my experience this is par for the course for a Docker image obtained from a remote registry or imported from a .tar file.
We’re now in a position to use container-diff with the --type=history flag to compare our justinsteven/baddotnet-sdk:5.0-alpine image with Microsoft’s mcr.microsoft.com/dotnet/sdk:5.0-alpine image:
$ sudo -g docker container-diff diff mcr.microsoft.com/dotnet/sdk:5.0-alpine daemon://justinsteven/baddotnet-sdk:5.0-alpine --type=history
-----History-----
Docker history lines found only in mcr.microsoft.com/dotnet/sdk:5.0-alpine: None
Docker history lines found only in justinsteven/baddotnet-sdk:5.0-alpine: None
container-diff tells us that no Docker history lines differ between the two images, and yet the justinsteven/baddotnet-sdk:5.0-alpine image is clearly a bit wonky:
$ sudo docker run justinsteven/baddotnet-sdk:5.0-alpine dotnet
This isn't the dotnet you're looking for
We can even push the image to Docker Hub, remove it from our local cache, pull it down and repeat the exercise.
$ sudo -g docker docker push justinsteven/baddotnet-sdk:5.0-alpine
The push refers to repository [docker.io/justinsteven/baddotnet-sdk]
90a9b1d4e00e: Pushed
6aec14639694: Pushed
1a921a44e125: Pushed
82484281586d: Pushed
9b8d02d7ddab: Pushed
30025ca5e025: Pushed
1119ff37d4a9: Pushed
5.0-alpine: digest: sha256:60e4ccbd03dc2fce6e5e13be82cec8c079955acb28fd0b6f9e0d250a281ef4b3 size: 1798
$ sudo docker rmi justinsteven/baddotnet-sdk:5.0-alpine
Untagged: justinsteven/baddotnet-sdk:5.0-alpine
Untagged: justinsteven/baddotnet-sdk@sha256:60e4ccbd03dc2fce6e5e13be82cec8c079955acb28fd0b6f9e0d250a281ef4b3
Deleted: sha256:c211f1df2ca796be658d2e1918a3b772ec14ef143fc8117dfc55bf7d9f3eaa30
$ sudo docker system prune -a
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all images without at least one container associated to them
- all build cache
Are you sure you want to continue? [y/N] y
[... SNIP ...]
Total reclaimed space: 1.191GB
$ sudo docker pull justinsteven/baddotnet-sdk:5.0-alpine
5.0-alpine: Pulling from justinsteven/baddotnet-sdk
4c0d98bf9879: Pull complete
3c97837ed36a: Pull complete
fd1ceae44045: Pull complete
804ffefbc086: Pull complete
c057bb26faa8: Pull complete
b142bbac8154: Pull complete
0d78a2f3501a: Pull complete
Digest: sha256:60e4ccbd03dc2fce6e5e13be82cec8c079955acb28fd0b6f9e0d250a281ef4b3
Status: Downloaded newer image for justinsteven/baddotnet-sdk:5.0-alpine
docker.io/justinsteven/baddotnet-sdk:5.0-alpine
$ sudo docker run --rm justinsteven/baddotnet-sdk:5.0-alpine dotnet
This isn't the dotnet you're looking for
$ sudo -g docker container-diff diff mcr.microsoft.com/dotnet/sdk:5.0-alpine justinsteven/baddotnet-sdk:5.0-alpine --type=history
-----History-----
Docker history lines found only in mcr.microsoft.com/dotnet/sdk:5.0-alpine: None
Docker history lines found only in justinsteven/baddotnet-sdk:5.0-alpine: None
We can successfully pull justinsteven/baddotnet-sdk:5.0-alpine and execute the dotnet command within it to get our funny little message. container-diff still sees the history as being identical to the official mcr.microsoft.com/dotnet/sdk:5.0-alpine image.
This image is available on Docker Hub at the time of writing. I may remove it in the future, but if you get in quick you can pull it down and have a play.
So what?
The ability to modify Docker image histories means that histories should not be trusted, and should not be used as a proxy to determine the authenticity of a Docker image.
For this technique to be useful to an attacker, in the specific case of Microsoft’s .NET images, the attacker would need to:
- Be a malicious insider within Microsoft, or an external attacker, who can compromise the .NET image build pipeline or the Microsoft Docker image registry
- Publish malicious .NET images or replace published .NET images with malicious contents, using this technique to rewrite the image’s history and hide the tampering.
- Evade any behind-the-scenes spot checks that Microsoft are doing on their published .NET images, and any checks that the wider community is doing on the published .NET images
- Trust that Microsoft’s users are only performing naive checking of the Docker image history to authenticate the image
To pull this off would be no small feat, and I’m not suggesting that Microsoft’s .NET images have been tampered with. All I am suggesting is that, if they had, the guidance given in Staying safe with .NET containers is insufficient to detect conscientious tampering.
Image signing via Docker Content Trust (DCT)
Only just now did I wonder – “Is there a content signing mechanism for Docker images?” It turns out there is something called Docker Content Trust (DCT) which claims to allow image builders to sign images with a private key, so that consumers can verify the image authenticity. I hadn’t heard of DCT before today so I don’t know how usable it is or how widely it’s used, but it may be useful for some builders or consumers of Docker images.
Conclusion
My advice to Microsoft is – please reconsider recommending that people use container-diff to verify the authenticity of a Docker image using its history as a proxy. The image history is metadata only, and can be modified by an image author or by anyone who can get their hands on any Docker image.
Update 2021-02-16: The article has been updated. A section titled “Critical note” now acknowledges the fact that Docker image history is not necessarily trustworthy and that the guidance regarding container-diff shouldn’t be used to draw security-critical conclusions.
My advice to users of Docker images is – don’t trust the image history of a potentially untrustworthy image. It can be modified.
I don’t have any advice for Docker or for Google (the authors of container-diff). However, it might be a good idea to remind users at the appropriate time that Docker image histories are not immutable records of the building of an image – they can be modified, and hence should not be trusted.
Reproducible Builds allow anyone to independently verify that a given built artefact is identical to a locally built copy. They discourage and make visible some types of supply chain attacks. Reproducible Builds require build determinism, which Docker images lack, but some people are reportedly working on it.