make Docker build smarter, add Dockerfile.debian #6344
FROM docker.io/library/golang:1.19-bullseye AS builder

RUN apt update
RUN apt install -y build-essential git bash ca-certificates libstdc++6
Are you sure libstdc++6 is needed, and why version 6?
I copied the deps from the Alpine Dockerfile. The closest match in Debian for Alpine's libstdc++, in terms of compatibility, is libstdc++6 (glibc6). I'm not 100% sure it is needed at runtime, but I assumed it was in the Alpine image for a reason.
The other option in Debian is libstdc++5 (glibc3.3).
make all


FROM docker.io/library/golang:1.19-alpine3.16 AS tools-builder
Are you sure you want an Alpine image here?
I think it would be best for the tools to be built with the same lib as the image where the db is supposedly created.
I used the golang Alpine image so that I could easily use go mod vendor.
WORKDIR /app

ADD Makefile Makefile
ADD tools.go tools.go
I think this line is useless, because tools.go is only used to ensure that the binary deps are stored in go.mod.
Since I'm not copying the rest of the files, 'go mod vendor' will run 'go mod tidy' first, which causes those deps to be removed unless the tools.go file is there. It only worked for me when I copied the file.
I don't see a reason to call "go mod tidy" in Docker.
'go mod vendor', which is used by the db-tools build process, calls 'tidy'. I'm not sure whether there is a way to call vendor without tidy.
got it
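The tools stage discussed in this thread could be sketched roughly as follows. This is an illustrative sketch, not the PR's actual file: the apk package list and the `db-tools` make target are assumptions. Whatever `go mod vendor` does internally, it only vendors packages that are imported by Go source files in the module, so with only go.mod and go.sum present there would be nothing to vendor. Copying tools.go keeps the tool imports visible.

```dockerfile
# Hypothetical tools stage; package names and the make target are assumptions.
FROM docker.io/library/golang:1.19-alpine3.16 AS tools-builder
RUN apk --no-cache add build-base linux-headers git bash ca-certificates libstdc++

WORKDIR /app

# Copy only the files the tools build depends on, so this stage is
# rebuilt only when one of them changes.
ADD Makefile Makefile
ADD tools.go tools.go
ADD go.mod go.mod
ADD go.sum go.sum

# Assumed target name; it is expected to run 'go mod vendor' internally.
RUN make db-tools
```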
This PR makes many changes to the Dockerfile in the hope of making the image faster to build, download, and upload.
Instead of copying the entire repository at once, it first copies the go.mod and go.sum files, then runs go mod download. This lets the dependencies live in their own layer, avoiding the need for the build cache at that step.
The compilation of the db-tools is moved to a second image. Since these rarely change, not having to rebuild them every time makes local development a lot faster. It also reduces the amount that has to be uploaded when creating a new release, since the db-tools layer will be unchanged.
Each binary is copied individually into its own layer. This allows Docker to upload and download the binaries in parallel, and gives better recovery than when the download of the existing 500 MB layer fails, since the transfer is done in parts.
It also adds a second Dockerfile which builds erigon with a Debian image, as a start to addressing #6255.
While this Dockerfile has a greater total image size, the total size of the layers that differ between versions will be smaller, resulting in smaller effective upload and download sizes.
With all that said, I am not really sure how the existing erigon CI/release process works, so these changes may be incompatible with it.
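A minimal sketch of the layering described above, for illustration only. The package lists, the `/app/build/bin` output path, the install paths, and the choice of binaries shown are assumptions, not taken from the PR's diff.

```dockerfile
# Hypothetical builder stage: dependency manifests are copied before the source.
FROM docker.io/library/golang:1.19-alpine3.16 AS builder
RUN apk --no-cache add build-base linux-headers git bash ca-certificates libstdc++

WORKDIR /app

# This layer and the download below are reused until go.mod or go.sum change.
COPY go.mod go.sum ./
RUN go mod download

# Source edits invalidate only the layers from this point on.
COPY . .
RUN make all

# Hypothetical runtime stage.
FROM docker.io/library/alpine:3.16
RUN apk --no-cache add ca-certificates libstdc++

# One COPY per binary puts each binary in its own layer, so layers can be
# pushed and pulled independently (binary names and paths are assumed).
COPY --from=builder /app/build/bin/erigon /usr/local/bin/erigon
COPY --from=builder /app/build/bin/rpcdaemon /usr/local/bin/rpcdaemon
```

With this ordering, a change to a single source file reruns only `COPY . .`, `make all`, and the final `COPY` steps; the dependency layer stays cached.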
Comparison
Docker build speed
In both examples, I build erigon, then change core/blockchain.go (forcing recompilation) and build again.
These are the logs produced.
CURRENT DOCKERFILE
Since the downloaded dependencies are in the cache, rebuild time does not suffer, but notice that they do not go into their own layer.
More importantly, since the db-tools are rebuilt every time, an extra 10-20s is added to the docker build time.
NEW DOCKERFILE
Since the dependencies and db-tools versions didn't change, all those layers are cached and did not need to be rebuilt or redownloaded.
An additional advantage: build tools that can share cached layers (such as kaniko or GitLab Runner) share the dependency layers automatically between runs, whether sequential or concurrent. Cache mounts, by contrast, are an extra piece that needs to be configured, and they cannot be shared between concurrent builds.
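For contrast, a cache-mount build step of the kind referred to above might look like the sketch below. This is an assumption about how such a step is typically written with BuildKit, not a copy of the existing Dockerfile; the target paths are the default Go build and module cache locations in the official golang image.

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical cache-mount variant; requires BuildKit.
FROM docker.io/library/golang:1.19-alpine3.16 AS builder
RUN apk --no-cache add build-base linux-headers git bash ca-certificates libstdc++

WORKDIR /app
COPY . .

# The caches live on the builder host rather than in an image layer, so a
# tool that only shares layers cannot reuse them without extra configuration.
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    make all
```

Because the mounted directories are never committed to a layer, downloaded dependencies speed up rebuilds on the same builder but do not travel with the image or its layer cache.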
Docker push/pull speed
See this example of the image being pushed to a Docker repository.
CURRENT DOCKERFILE
The existing image can only be uploaded as a single layer, and it is very big. If the upload fails partway through, the entire upload is aborted and I must start again. The same applies to the download.
NEW DOCKERFILE
Since the image is broken up into many small layers, the upload can happen in parallel, which is faster. We can also resume after a failed upload, since only the smaller chunks that failed need to be sent again.