
Propose an alternative process for docker image generation. #40

Open
wants to merge 3 commits into
base: master

Conversation

nuclearsandwich
Member

In the past we've built custom docker images to support running armhf, arm64, and i386 containers using the scripts in this repository. In order to support running these images on amd64 hosts, part of that image creation process has also involved bundling an amd64 qemu-user-static binary in the target image. This enables the transparent running of non-native executables when paired with a host that supports binfmt-misc and has qemu-user-static binaries properly registered.

In the years since that process was put in place, Docker has devised its own multi-platform scheme using image manifests, and the platforms that we've traditionally been creating cross-platform images for now provide their own. Additionally, the official ROS build farm has transitioned to relying on native ARM build hosts rather than using qemu on amd64 hosts, and i386 is no longer supported by any ROS distribution. However, I think we should try to maintain support for fully amd64 and qemu based builds as long as it is feasible to do so.

Empirically, it appears that the official images are either doing something to allow the host's qemu-user-static to pass through or are bundling a copy that is otherwise not discoverable in the image. Since I can't explain this behavior yet, I'm hesitant to rely on it.

The ros_buildfarm scripts and templates also assume that custom osrf images should be used for non-amd64 platforms. We can alter that assumption by changing one or two snippets but that would break anyone still relying on qemu for running builds.
As a long-term solution I think we should update all of our buildfarm docker invocations to specify the target platform. However, that requires many more modifications than the update of a single template and will require much more scrutiny.

It's possible, although somewhat convoluted, for us to re-publish the official docker images together with an injected qemu-user-static binary using the same naming convention that we've historically used for these multi-arch images created with debootstrap. By doing so, we dramatically reduce the difference between our images and the upstream ones, since we would be, in effect, just distributing them with a different name and with a qemu-user-static binary in overlay. We would also reduce the image size, as the official images are significantly more compact. Since they're already on the docker registry we also don't have to re-upload them ourselves.

Since this is still a proposal and the technique for getting this done is quite brittle I have not yet fully automated the process. But I've outlined the steps to be performed so that either directly or gradually we can automate more and more or find better ways of accomplishing the same thing.

My ultimate goal would be to retire this repository entirely and rely on docker's builtin multi-arch handling together with changes to ros_buildfarm to run docker in a platform aware fashion and overlay the qemu-user-static binary from the host when it is required. So ultimately a docker run on the buildfarm would look something like the below when running an arm64 target container on amd64.

docker run --platform=linux/aarch64 --volume=/usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static ...
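
As a rough illustration of how that invocation could be generated per platform, here is a minimal sketch. The function names `qemu_binary_for_platform` and `cross_run_args` are hypothetical and not part of ros_buildfarm; the binary paths assume the locations used by the Debian/Ubuntu `qemu-user-static` package.

```shell
#!/bin/sh
# Hypothetical helper (not part of ros_buildfarm): map a Docker platform
# string to the qemu-user-static binary path, assuming the install
# locations used by the Debian/Ubuntu qemu-user-static package.
qemu_binary_for_platform() {
  case "$1" in
    linux/arm64|linux/aarch64) echo "/usr/bin/qemu-aarch64-static" ;;
    linux/arm/v7|linux/armhf)  echo "/usr/bin/qemu-arm-static" ;;
    linux/386|linux/i386)      echo "/usr/bin/qemu-i386-static" ;;
    *) echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Print the extra `docker run` arguments for a cross-platform container:
# the platform selector plus a bind mount of the host's qemu binary.
cross_run_args() {
  qemu="$(qemu_binary_for_platform "$1")" || return 1
  echo "--platform=$1 --volume=${qemu}:${qemu}"
}
```

It could then be used as `docker run $(cross_run_args linux/aarch64) ...`, relying on word splitting of the unquoted substitution to produce two separate arguments.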

@nuclearsandwich
Member Author

@ruffsl I'd also be very interested in your thoughts here since you're quite the docker whisperer in my esteem.

Contributor

@tfoote tfoote left a comment


I completely support deprecating this repository now that docker actually supports the different architectures. This was mostly a workaround so that we could do the single architecture.

I don't think that we have to inject QEMU. Here's it running w/o that and I can build and run aarch64 executables inside the environment.

$ docker run --platform=linux/aarch64  -ti ubuntu:focal bash
root@64bf983dcffe:/# apt-get update -qqq && apt-get install -qqqy gcc file
# CLIPPED
root@64bf983dcffe:/# cat << EOF > /tmp/helloworld.c
> #include <stdio.h>
> 
> int main()
> {
> printf("Hello World!");
> return 0;
> }
> EOF
root@64bf983dcffe:/# 
root@64bf983dcffe:/# gcc /tmp/helloworld.c
root@64bf983dcffe:/# ./a.out 
Hello World!root@64bf983dcffe:/# 
root@64bf983dcffe:/# file a.out 
a.out: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=5535c724f2dcd20f98c6dcbe6bf5a94cab810960, for GNU/Linux 3.7.0, not stripped

I also am able to run the rclcpp examples w/o mounting in qemu-arm-static

@tfoote
Contributor

tfoote commented Feb 4, 2022

The official ros images can be run as-is by just declaring the platform. I ran into some issues but qemu is clearly being invoked appropriately.

@ruffsl
Member

ruffsl commented Feb 4, 2022

@ruffsl I'd also be very interested in your thoughts here since you're quite the docker whisperer in my esteem.

Sure thing, I'd be happy to help!

Empirically, it appears that the official images are either doing something to allow the host's qemu-user-static to pass through or are bundling a copy that is otherwise not discoverable in the image. Since I can't explain this behavior yet, I'm hesitant to rely on it.

You are partly correct, as the old practice was to copy a qemu-arm-static binary into the container before running any binaries. See this post by @computermouth (Ben Young) for more context:

Following the links referenced in computermouth's post leads to an upstream issue in debian (since fixed) that was previously preventing the docker container runtime from seamlessly using the system installed/registered qemu binfmt.

In fact, I recall this topic bubbling over onto ROS Discourse a few years ago as well:

With that resolved, using any modern debian/ubuntu distro, it's now relatively simple to use tools like buildx to set multiple target platforms for building:

https://docs.docker.com/engine/reference/commandline/buildx_build/#platform
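
As a small sketch of what that looks like on the command line: buildx takes the target platforms as one comma-separated `--platform` value. The helper name `buildx_platform_arg` below is hypothetical, and the image name in the usage note is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: join platform names into a single comma-separated
# --platform argument, the form `docker buildx build` accepts.
# The body runs in a subshell so the IFS change does not leak to the caller.
buildx_platform_arg() (
  IFS=","
  echo "--platform=$*"
)
```

For example, `docker buildx build $(buildx_platform_arg linux/amd64 linux/arm64) -t example/image:tag .` would request both targets in one build, assuming qemu is registered on the host for the non-native one.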

As an example using GitHub actions, check out the multi-platform documentation for build-push-action:

Note that the setup-qemu-action must be invoked for the host VM prior to invoking buildx build. I'd wager @tfoote is running a modern release of debian/ubuntu and has already installed qemu-user-static prior, thus the ros arm images just working out of the box.

Looks like the action installs the QEMU static binaries a little unconventionally via a privileged docker container using the tonistiigi/binfmt:latest DockerHub image.

https://github.com/docker/setup-qemu-action/blob/10348241d3ea2d30357b172897afc31824ea2e2e/dist/index.js#L203-L204

Or they can also be installed on the host OS simply via a package manager:

apt-get install qemu-user-static
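
Either way, the result is a set of binfmt_misc registrations on the host, which can be inspected directly. The sketch below checks one of them; the function name `binfmt_status` and its output labels are made up for illustration, and the entry name `qemu-<arch>` is an assumption based on how the Debian/Ubuntu package and tonistiigi/binfmt usually name their handlers. My understanding is that the `F` ("fix binary") flag is what lets the kernel use the host's interpreter inside a container that does not contain the binary itself, but treat that as an assumption to verify on your own hosts.

```shell
#!/bin/sh
# Report the state of a qemu binfmt_misc handler.
# $1: qemu architecture name, e.g. aarch64
# $2: binfmt_misc directory (defaults to the usual kernel mount point;
#     overridable so the function can be exercised against a fake tree)
binfmt_status() {
  entry="${2:-/proc/sys/fs/binfmt_misc}/qemu-$1"
  if [ ! -r "$entry" ]; then
    echo "missing"
    return 1
  fi
  # The first line of a binfmt_misc entry is "enabled" or "disabled".
  if ! head -n 1 "$entry" | grep -q '^enabled$'; then
    echo "disabled"
    return 1
  fi
  # A "flags:" line containing F indicates the interpreter was opened
  # when the handler was registered (assumed relevant for containers).
  if grep -q '^flags:.*F' "$entry"; then
    echo "enabled-fixed"
  else
    echo "enabled"
  fi
}
```

Running `binfmt_status aarch64` on a build host gives a quick answer as to whether `docker run --platform=linux/aarch64 ...` has a chance of working without mounting anything into the container.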

I completely support deprecating this repository now that docker actually supports the different architectures. This was mostly a workaround so that we could do the single architecture.

I agree. There are a lot of repos named after every arch that clutter up the osrf DockerHub org, with many stale images to boot. So a bit of a drag for security updates and whatnot. I'm not sure what else you all need to install/pre-bake on top of the debian/ubuntu base images for the ROS Buildfarm CI jobs, but I'm sure it could be generalized into a single Dockerfile template and a single multi-arch docker registry repo.

@nuclearsandwich
Member Author

I'm not sure what else you all need to install/pre-bake on top of the debian/ubuntu base images for the ROS Buildfarm CI jobs

We use the base images from upstream directly on amd64. In theory any work that we do in these images should already be replicated in ros_buildfarm dockerfile generation since it has to support amd64 anyway.

a modern release of debian/ubuntu and has already installed qemu-user-static prior, thus the ros arm images just working out of the box.

Thanks for tracing that and linking it up for us! The context gives me much more confidence.

The ROS build farm agents are Ubuntu 20.04 and have qemu-user-static installed. I tested this behavior (no explicit mounting of qemu-aarch64-static into the container) on one of our agents directly and it worked. So it sounds like just passing --platform ... when running cross targets could be an option that's already compatible with every 20.04 build farm deployment. I noticed that using FROM --platform only works reliably when using the buildx builder which is not something that we've switched to in ros_buildfarm.
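
A minimal sketch of "pass `--platform` only for cross targets" might look like the following. The function names `docker_arch` and `platform_flag` are hypothetical, not existing ros_buildfarm code, and the mapping covers only the architectures mentioned in this thread.

```shell
#!/bin/sh
# Normalize `uname -m` and Debian-style architecture names to the
# architecture part of a Docker platform string.
docker_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    armv7l|armhf)  echo "arm/v7" ;;
    i386|i686)     echo "386" ;;
    *)             echo "$1" ;;
  esac
}

# Print a --platform flag only when the target differs from the host.
# $1: target architecture; $2: host architecture (defaults to `uname -m`).
platform_flag() {
  target="$(docker_arch "$1")"
  host="$(docker_arch "${2:-$(uname -m)}")"
  if [ "$target" != "$host" ]; then
    echo "--platform=linux/${target}"
  fi
}
```

With this, `docker run $(platform_flag arm64) ...` expands to a plain `docker run ...` on a native arm64 agent and adds `--platform=linux/arm64` on an amd64 agent, so native builds are left untouched.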

@nuclearsandwich
Member Author

I think that what this all means is that to satisfy the immediate need for multi-arch containers for Jammy and Bullseye we can republish the official images under osrf/{debian,ubuntu}_arm64. Because we're not currently using multi-platform aware docker commands, I think we could get away with not even manipulating the architecture information when doing so, since the arm64 image would be the only one published under our repository name. However, I think this would result in a warning printed every time the image was used, resulting in an "UNSTABLE" build status being applied to most builds that use warning parsers, so it still makes sense to perform the configuration change to describe the image as amd64.
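
For orientation, the simplest form of such a republish (without the architecture metadata change described above) could be sketched as a pull, tag, and push. The function `republish_commands` is hypothetical and only prints the commands rather than running them; it is not the procedure documented in this pull request, which is not shown in the thread.

```shell
#!/bin/sh
# Print (dry run) the docker commands that would republish an official
# image under the osrf/<os>_<arch> naming convention.
# $1: os (ubuntu or debian); $2: docker architecture; $3: release codename.
# NOTE: this sketch deliberately omits any rewriting of the image's
# architecture metadata.
republish_commands() {
  upstream="$1:$3"
  target="osrf/$1_$2:$3"
  echo "docker pull --platform=linux/$2 ${upstream}"
  echo "docker tag ${upstream} ${target}"
  echo "docker push ${target}"
}
```

Calling `republish_commands ubuntu arm64 jammy` prints the three commands for `osrf/ubuntu_arm64:jammy`, which can be reviewed before being run by hand.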

…tatic.

Since the host's installation of qemu-user-static is entirely sufficient
for Ubuntu 20.04 and the current build farm deployment requires 20.04
there is no longer a need to sideload the qemu-user-static binaries.
@nuclearsandwich
Member Author

I've updated the instructions to forgo adding the qemu-user-static binary since that is not needed at all on current 20.04 build farms. I've also used these instructions to push osrf/ubuntu_arm64:jammy and osrf/debian_arm64:bullseye.

Future work on the ros_buildfarm scripts can bring "native" docker multiplatform support and we can retire the use of these images entirely.

I can also start working on a scripted version of this so that we can trigger it whenever upstream images are pushed and stay in sync.

3 participants