-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix docker image to actually work #742
Conversation
6b8fc68
to
61c9e35
Compare
61c9e35
to
e23f06e
Compare
The --mount option is more strict that the --volume option, and will error if the host source directory does not exist, whereas --volume will silently create a directory. We want the strict behaviour, so this changes to use --mount. The motivation for this change is to support job-runner running in docker, which requires that the paths it uses are correct from the host's perspective. Currently, a incorrect path will be sort of work, as the directory would be created inside the container, but only later would the issue arise. https://docs.docker.com/engine/storage/bind-mounts/#differences-between--v-and---mount-behavior
Was slow, complex, and unused
This simplifies the docker container by removing all the previous user/group management inside the container as it is not needed. Instead, we used the much simpler solution of adding an additional group to the running user that is the host's docker group id. This allows the container to have permission to access the host's bind-mounted /var/run/docker.sock docker, but the container itself can run as the local running user, which DTRT with respect to file permissions on bind mounts.
The job-runner container uses the host's docker service. This means that when it bind mounts directories with `docker run`, the source path must exist on the *hosts* file system. We've fixed this already for tests, with PYTEST_HOST_TMP, but we had not actually fixed it for actual using the docker image for real. Before the previous change to use --mount rather than --volume, a job would succeed, but the files would be in the wrong location. Now we have that change, the job fails. We use a simple if inelegant fix for this: we mount the STORAGE_BASE directories inside the container at the same path as on the host. This means that the same config value can be used for job-runner copying files itself, or for passing the host's docker to mount. As part of this, I changedthe way we supply these values to being in one time generated docker/.env file. This is more discoverable, usable outside of just, and more efficant than before. I chose to use `docker/medium` and `docker/high` as the volume mount locations, for simplicity, and also, so the the docker machinery is isolated from the non-docker venv set up. This include a basic `just docker/functional-test` command to spin up a job-runner container and run an actual job on it.
e23f06e
to
7d73d6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always nice to see things getting simpler and more powerful at the same time!
Was a nice, well-explained series of commits as well. Could maybe do with editing the auto-generated PR title and description just to give more context for these changes.
Arg, yes, sorry, have updated |
The job-runner docker image is in a half working state. A couple of attempts have previously been made to figure out how to get it to be able to use the host's docker and file system to run jobs, but they were messy and also didn't fully work.
However, in docker 4.26.0, released Dec '23, they added support for
--group-add
, which made the permissions much simpler.Now we are on 22.04, we have a more up to date docker, and docker-compose v2, so this is simpler change. Using this new features, this change cleans up some of the old approaches.
We use a simple trick to make bind mounts work - the bind mounts must use the same path inside the container as on the host. This avoids us needing to add new configuration options and logic, and seems a reasonable constraint.