Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speculative fixes for docker image building hanging #1149

Conversation

triarius
Copy link
Contributor

@triarius triarius commented Jul 3, 2023

A lot of the PR is cleanup, but there are some speculative fixes for the failure of building of a docker image that was revealed during dogfooding.

I've added the building of that Docker image to the tests, and it clearly passes, but there may be some very rare flake involved.

The changes that may affect this are:

Installing multiarch qemu (via a docker container) has been moved from after the docker daemon has been configured to before, then all trace of the image that was used to do this is removed.

The files (/etc/subuid, /etc/subgid) needed for docker usernamespaceing are generated on first boot as I noticed that they seem to be modified dynamically after we copy them in. They should be dynamic as the values of the buildkite-agent user's UID and the docker group's GID are dynamic. It was a coincidence that the hard-coded values did not change between builds of the AMI before.

It's root on most systems. Seems unnecessary to do this. If it needs to
be written to, it should be done as root.
In particular we want the root user in the docker container to be
buildkite-agent:docker on the host, so use id -u and getent group to
determine these dynamically.
@triarius triarius changed the title Don't chown /etc/docker to ec2-user:docker Speculative fix for docker image building hanging Jul 3, 2023
--userns=host \
--rm \
"tonistiigi/binfmt:${QEMU_BINFMT_TAG}" \
--install all
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, after this is done, we restart the docker daemon.

@triarius triarius changed the title Speculative fix for docker image building hanging Speculative fixes for docker image building hanging Jul 3, 2023
@triarius triarius force-pushed the pdp-1276-fix-first-build-of-dockerfilepostgres-on-bkbk-is-failing-and branch 5 times, most recently from 9eb9f8d to 2ab8737 Compare July 5, 2023 13:08
@triarius triarius marked this pull request as ready for review July 11, 2023 01:56
@triarius triarius requested a review from a team July 11, 2023 01:57
@triarius
Copy link
Contributor Author

We've dogfooded this version for a bit, and we have a way forward. Let's merge it into v6 and tag v6.0.0-beta2

docker buildx build -f tests/Dockerfile --progress=plain -t buildkite-postgres:latest tests:
exit-status: 0
timeout: 30000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests building a problematic container image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding this as a comment in the file?

docker buildx build -f tests/Dockerfile --progress=plain -t buildkite-postgres:latest tests:
exit-status: 0
timeout: 30000

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding this as a comment in the file?

@triarius
Copy link
Contributor Author

@DrJosh9000 I'm going to beef up the tests based on what I've learnt since making this PR. Watch this space for another PR where I'll explain it better.

@triarius triarius merged commit 0d1b0a6 into v6 Jul 17, 2023
@triarius triarius deleted the pdp-1276-fix-first-build-of-dockerfilepostgres-on-bkbk-is-failing-and branch July 17, 2023 06:09
@triarius triarius mentioned this pull request Jul 25, 2023
Merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants