-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speculative fixes for docker image building hanging #1149
Speculative fixes for docker image building hanging #1149
Conversation
It's root on most systems. Seems unnecessary to do this. If it needs to be written to, it should be done as root.
In particular we want the root user in the docker container to be buildkite-agent:docker on the host, so use id -u and getent group to determine these dynamically.
--userns=host \ | ||
--rm \ | ||
"tonistiigi/binfmt:${QEMU_BINFMT_TAG}" \ | ||
--install all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, after this is done, we restart the docker daemon.
9eb9f8d
to
2ab8737
Compare
We've dogfooded this version for a bit, and we have a way forward. Let's merge it into v6 and tag v6.0.0-beta2 |
docker buildx build -f tests/Dockerfile --progress=plain -t buildkite-postgres:latest tests: | ||
exit-status: 0 | ||
timeout: 30000 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests building a problematic container image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding this as a comment in the file?
docker buildx build -f tests/Dockerfile --progress=plain -t buildkite-postgres:latest tests: | ||
exit-status: 0 | ||
timeout: 30000 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding this as a comment in the file?
@DrJosh9000 I'm going to beef up the tests based on what I've learnt since making this PR. Watch this space for another PR where I'll explain it better. |
A lot of the PR is cleanup, but there are some speculative fixes for the failure of building of a docker image that was revealed during dogfooding.
I've added the building of that Docker image to the tests, and it clearly passes, but there may be some very rare flake involved.
The changes that may affect this are:
Installing multiarch qemu (via a docker container) has been moved from after the docker daemon has been configured to before, then all trace of the image that was used to do this is removed.
The files (
/etc/subuid
,/etc/subgid
) needed for docker usernamespaceing are generated on first boot as I noticed that they seem to be modified dynamically after we copy them in. They should be dynamic as the values of thebuildkite-agent
user's UID and thedocker
group's GID are dynamic. It was a coincidence that the hard-coded values did not change between builds of the AMI before.