exporter: support resetting timestamp for determinism #1058

Closed
Tracked by #587
AkihiroSuda opened this issue Jun 25, 2019 · 9 comments

Comments

@AkihiroSuda
Member

No description provided.

@AkihiroSuda
Member Author

We also need to consider gzip determinism

For small images we might be able to just push the image without gzip and call it a day
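On the gzip point: one well-known source of nondeterminism is the gzip header, which carries a modification time and, optionally, a file name. A minimal Python sketch, assuming the uncompressed layer bytes are already identical; `gzip_layer` is a hypothetical helper for illustration, not BuildKit code:

```python
import gzip
import io


def gzip_layer(data: bytes, mtime: int = 0) -> bytes:
    """Compress data with a fixed header timestamp and no embedded file name."""
    buf = io.BytesIO()
    # filename="" keeps the FNAME field out of the header; mtime pins the MTIME field.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()


layer = b"hello from a layer\n" * 100
first = gzip_layer(layer)
second = gzip_layer(layer)
print(first == second)  # same input, same header fields
```

Pinning the header is necessary but may not be sufficient: the compressed stream can still differ between compressor implementations, versions, or compression levels.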

@tonistiigi
Member

If we only reset times in the differ, it is a little unsafe, because snapshots in the local build cache and the remote build cache will have different timestamps. But if we reset them in the snapshot, it will 1) be slow and 2) confuse the containerd naive differ.

I guess the first is fine if this is opt-in from the user and clearly marked as an exporter feature.

With a custom (e.g. FUSE-based) snapshotter+differ we could do this without the above limitations as well.

@ppiotrow

Description

The Reproducible Builds guidelines point out that build tools should make the build timestamp configurable: https://reproducible-builds.org/docs/source-date-epoch/

When building a Docker image with docker build, the image's "created" property is set to the Docker daemon's system time. This goes against the reproducibility guidelines, because even on the same host you cannot build the same image again. It is even harder to do so on different hosts, each with its own Docker daemon.
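To illustrate why a single timestamp field is enough to change the digest, here is a small Python sketch. The config dictionary is a toy stand-in (an assumption for illustration), not the real image config schema:

```python
import hashlib
import json


def config_digest(created: str) -> str:
    """Digest of a toy image config in which only "created" varies."""
    config = {
        "architecture": "amd64",
        "os": "linux",
        "created": created,
    }
    # Canonical serialization so that only the field values affect the digest.
    blob = json.dumps(config, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(blob).hexdigest()


build_a = config_digest("2019-06-25T10:00:00Z")
build_b = config_digest("2019-06-25T10:00:01Z")  # one second later
pinned_a = config_digest("1970-01-01T00:00:00Z")
pinned_b = config_digest("1970-01-01T00:00:00Z")
print(build_a == build_b, pinned_a == pinned_b)
```

Two builds one second apart hash differently, while two builds with a pinned value hash the same.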

There is this cool blog post describing how even the simplest Docker image ends up with different digests when built on two different hosts or without cache.

Workarounds
Jib builds images on its own, without using the docker build command.
It creates the *.json and *.tar files manually just to override the "Created" property of the image.
This is described in its FAQ.

Describe the results you expected:
I'd love to have an additional option in the build command, such as
docker build --sourceDateEpoch='1970-01-01 00:00:00.0' .
or just --sourceDateEpoch=0, with the default value being the current system time.

This would make it possible to reuse the layer cache when building Docker images from various build tool plugins, like this one: sbt/sbt-native-packager#1321, or the one described here
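A build tool could resolve such an option roughly as follows. This is a sketch under assumptions: `resolve_epoch` and `created_field` are hypothetical names, and the only external convention relied on is the `SOURCE_DATE_EPOCH` environment variable (integer seconds since the Unix epoch) from the Reproducible Builds page linked above.

```python
import os
import time
from datetime import datetime, timezone


def resolve_epoch(environ=None) -> int:
    """Return SOURCE_DATE_EPOCH if set, otherwise the current system time."""
    environ = os.environ if environ is None else environ
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None or raw == "":
        return int(time.time())
    return int(raw)


def created_field(epoch: int) -> str:
    """Format an epoch as a UTC timestamp string for a "created" property."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(created_field(resolve_epoch({"SOURCE_DATE_EPOCH": "0"})))
```

Falling back to the system time keeps today's behavior when the variable is unset, which matches the default proposed in the request.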

@tonistiigi
Member

I'm fine with making "created" configurable (it is already stable in buildkit if you get a cache hit for a layer), but that on its own doesn't really solve this issue. Files generated by run commands still carry timestamps, and gzip may not be deterministic. As explained before, resetting timestamps in snapshots is quite hard in the current implementations. Some files are created by runc, which is out of our control.
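For illustration, resetting timestamps at export time (rather than in the snapshot) can be sketched as a rewrite of the layer tarball. This is a minimal Python model, not BuildKit's differ; `make_layer` and `reset_timestamps` are hypothetical helpers, and the sketch only handles the mtime and entry order, not other sources of variation.

```python
import io
import tarfile


def make_layer(files: dict, mtime: int) -> bytes:
    """Build an uncompressed tar layer whose entries all carry the given mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def reset_timestamps(layer: bytes, epoch: int) -> bytes:
    """Rewrite a tar layer with sorted entries and every mtime set to epoch."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:") as src, \
         tarfile.open(fileobj=out, mode="w", format=tarfile.GNU_FORMAT) as dst:
        for member in sorted(src.getmembers(), key=lambda m: m.name):
            member.mtime = epoch
            member.pax_headers = {}  # drop any extended-header mtime
            payload = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, payload)
    return out.getvalue()


files = {"b.txt": b"b", "a.txt": b"a"}
morning = make_layer(files, 1000)
evening = make_layer(files, 2000)
print(morning == evening)  # differs only because of the mtimes
print(reset_timestamps(morning, 0) == reset_timestamps(evening, 0))
```

Because the rewrite happens only on the exported tarball, the snapshot on disk keeps its original timestamps, which is the mismatch between local and remote cache mentioned earlier in the thread.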

@Romain-Geissler-1A

Romain-Geissler-1A commented Nov 29, 2020

Hi,

Based on the previous answer, has a build flag been implemented that allows overriding just the "created" property rather than the timestamps of the layer files? That alone would already help people who already make their layer contents stable (i.e. not dependent on the current time).

Cheers,
Romain

@leonard84

This is a really important feature for reproducible images.
When we try to do it manually, the caching logic produces corrupt images:
the cache does not look at the contents of a file if its timestamp did not change.

Reproducer

cat <<EOF > Dockerfile
FROM alpine:3.5

COPY ./install /

CMD [ "/bin/sh", "hello.sh" ]
EOF

mkdir install
for i in $(seq 1 5); do
    echo "echo $i" > install/hello.sh
    touch -t 8001010000 install/hello.sh
    docker buildx build --progress plain --tag reproducer:latest .
    echo -------------------------------------------------- 
    echo expect $i
    docker run --rm reproducer:latest
    echo --------------------------------------------------
done

Output

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 102B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/alpine:3.5
#3 DONE 0.0s

#4 [1/2] FROM docker.io/library/alpine:3.5
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 64B done
#5 DONE 0.0s

#6 [2/2] COPY ./install /
#6 CACHED

#7 exporting to image
#7 exporting layers done
#7 writing image sha256:0edac37a1f051e2031eb1086c0a892fbedf9b32b0deef7f2fcdf7af7dc9d8cdc done
#7 naming to docker.io/library/reproducer:latest done
#7 DONE 0.0s
--------------------------------------------------
expect 1
1
--------------------------------------------------
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 31B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/alpine:3.5
#3 DONE 0.0s

#4 [1/2] FROM docker.io/library/alpine:3.5
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 64B done
#5 DONE 0.0s

#6 [2/2] COPY ./install /
#6 CACHED

#7 exporting to image
#7 exporting layers done
#7 writing image sha256:0edac37a1f051e2031eb1086c0a892fbedf9b32b0deef7f2fcdf7af7dc9d8cdc done
#7 naming to docker.io/library/reproducer:latest done
#7 DONE 0.0s
--------------------------------------------------
expect 2
1
--------------------------------------------------
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 31B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/alpine:3.5
#3 DONE 0.0s

#4 [1/2] FROM docker.io/library/alpine:3.5
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 64B done
#5 DONE 0.0s

#6 [2/2] COPY ./install /
#6 CACHED

#7 exporting to image
#7 exporting layers done
#7 writing image sha256:0edac37a1f051e2031eb1086c0a892fbedf9b32b0deef7f2fcdf7af7dc9d8cdc done
#7 naming to docker.io/library/reproducer:latest done
#7 DONE 0.0s
--------------------------------------------------
expect 3
1
--------------------------------------------------
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 31B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/alpine:3.5
#3 DONE 0.0s

#4 [1/2] FROM docker.io/library/alpine:3.5
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 64B done
#5 DONE 0.0s

#6 [2/2] COPY ./install /
#6 CACHED

#7 exporting to image
#7 exporting layers done
#7 writing image sha256:0edac37a1f051e2031eb1086c0a892fbedf9b32b0deef7f2fcdf7af7dc9d8cdc
#7 writing image sha256:0edac37a1f051e2031eb1086c0a892fbedf9b32b0deef7f2fcdf7af7dc9d8cdc done
#7 naming to docker.io/library/reproducer:latest done
#7 DONE 0.0s
--------------------------------------------------
expect 4
1
--------------------------------------------------
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 31B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/alpine:3.5
#3 DONE 0.0s

#4 [1/2] FROM docker.io/library/alpine:3.5
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 64B done
#5 DONE 0.0s

#6 [2/2] COPY ./install /
#6 CACHED

#7 exporting to image
#7 exporting layers done
#7 writing image sha256:0edac37a1f051e2031eb1086c0a892fbedf9b32b0deef7f2fcdf7af7dc9d8cdc done
#7 naming to docker.io/library/reproducer:latest done
#7 DONE 0.0s
--------------------------------------------------
expect 5
1
--------------------------------------------------
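One plausible reading of the output above, sketched in Python: if a change-detection key is derived only from file metadata, the rewritten hello.sh is indistinguishable from the first one, since "echo 1" and "echo 2" have the same size and the mtime is pinned by touch. This is a hypothetical model for illustration, not BuildKit's actual cache implementation.

```python
import hashlib


def metadata_key(path: str, data: bytes, mtime: int) -> str:
    """Key derived from path, size and mtime only (hypothetical model)."""
    return hashlib.sha256(f"{path}|{len(data)}|{mtime}".encode()).hexdigest()


def content_key(path: str, data: bytes) -> str:
    """Key derived from path and the file contents."""
    return hashlib.sha256(path.encode() + b"\0" + data).hexdigest()


pinned = 315532800  # 1980-01-01 00:00:00 UTC, standing in for the touch -t value
v1 = b"echo 1\n"
v2 = b"echo 2\n"

same_by_metadata = metadata_key("hello.sh", v1, pinned) == metadata_key("hello.sh", v2, pinned)
same_by_content = content_key("hello.sh", v1) == content_key("hello.sh", v2)
print(same_by_metadata, same_by_content)
```

Under this model the metadata key collides while the content key does not, which would explain why every iteration reports CACHED and prints 1.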

@tonistiigi
Member

At the LLB level we do support overwriting the timestamp on file operations (copy, mkdir, etc.). So for the last example we could add a flag to COPY. Or, if we add a global flag, it could be set automatically for COPY (which might be unexpected in some cases). This does not solve the issue for RUN commands, though.
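The effect of such a timestamp override on a copy can be sketched outside of LLB with plain file operations. This Python snippet only models the behavior; `copy_with_timestamp` is a hypothetical helper and does not correspond to any Dockerfile flag or LLB API.

```python
import os
import shutil
import tempfile


def copy_with_timestamp(src: str, dst: str, epoch: int) -> None:
    """Copy file contents, then overwrite atime and mtime with a fixed epoch."""
    shutil.copyfile(src, dst)
    os.utime(dst, (epoch, epoch))


epoch = 315532800  # 1980-01-01 00:00:00 UTC
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "hello.sh")
    dst = os.path.join(workdir, "hello-copy.sh")
    with open(src, "wb") as f:
        f.write(b"echo 1\n")
    copy_with_timestamp(src, dst, epoch)
    copied_mtime = int(os.stat(dst).st_mtime)
    with open(dst, "rb") as f:
        copied_data = f.read()

print(copied_mtime == epoch)
```

Files written by a RUN step are created by the process inside the container, so a copy-time override like this never sees them.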

@sudo-bmitch

I'd want the timestamp to be a "no-later-than" value. So if there are files in my base layers with earlier timestamps they should not be modified. That would also allow the cache to be reused when a cache step with an equal or earlier timestamp is seen. And if it finds a cache entry with a later timestamp, I'd find it acceptable to exclude that or recreate it with the earlier timestamp.

The workflow I imagine is setting the timestamp to the git commit time of the repo I'm building, to the timestamp set in a label on an image I'm reproducing, or to an effective zero value like Jan 1, 1970. In the first two cases, if I'm being reproducible, my base image would be pinned to something that existed before that git commit was created. And in the latter case, we may have a parallel cache for layers where the timestamps have been stripped.
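The "no-later-than" semantics described here amount to a clamp. A minimal Python sketch, with `clamp_mtime` and `cache_reusable` as hypothetical names for the two rules in the comment:

```python
def clamp_mtime(mtime: int, limit: int) -> int:
    """Leave earlier timestamps untouched; pull later ones back to the limit."""
    return min(mtime, limit)


def cache_reusable(cached_mtime: int, limit: int) -> bool:
    """A cached step is reusable if its timestamp is equal to or earlier than the limit."""
    return cached_mtime <= limit


limit = 1561420800  # 2019-06-25 00:00:00 UTC, e.g. a commit time
base_layer_file = 1500000000   # older than the limit: kept as-is
fresh_build_file = 1600000000  # newer than the limit: clamped

print(clamp_mtime(base_layer_file, limit), clamp_mtime(fresh_build_file, limit))
print(cache_reusable(base_layer_file, limit), cache_reusable(fresh_build_file, limit))
```

For the git-based workflow, the limit could come from `git log -1 --pretty=%ct`, which prints the committer date of the latest commit as a Unix timestamp.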

@AkihiroSuda
Member Author

Implemented in:
