Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support arm builds #1399

Merged
merged 6 commits into from
Jul 16, 2021
Merged

Conversation

mathbunnyru
Copy link
Member

This is a POC, do not merge.

Based on @holdenk PR, with some fixes and additions.

Makefile Outdated Show resolved Hide resolved
@consideRatio
Copy link
Collaborator

consideRatio commented Jul 10, 2021

I'd like to have a clear scope for this PR defined, even as a draft, as I think it would help me contribute in general to this work - either directly in this PR or in surrounding PRs.

Main scope suggestion

I suggest that if we can make this PR able to do the following things, we go for a merge.

  • To update the Makefile to be able to, in an opt-in fashion, to build our Dockerfiles for all compatible architectures (like in: Z2JH, JupyterHub, ConfigurableHTTPProxy)
  • To update the Makefile to be able to test our built Dockerfiles specifically in the CI systems architecture.
  • To update the Makefile to be able to, in an opt-in fashion, to publish all our built Dockerfiles for one or all platforms.

Out of scope suggestion

I suggest we consider it out of scope for this PR to look to actually change the Dockerfiles, but instead focus on building whatever dockerfiles we currently can build using --platform linux/arm64.

@consideRatio consideRatio force-pushed the asalikhov/arm_builds branch from 2552034 to 141da2b Compare July 11, 2021 02:58
@consideRatio

This comment has been minimized.

@consideRatio consideRatio marked this pull request as draft July 11, 2021 03:00
@consideRatio consideRatio force-pushed the asalikhov/arm_builds branch 10 times, most recently from 222c97c to 7587c29 Compare July 12, 2021 00:30
@consideRatio consideRatio marked this pull request as ready for review July 12, 2021 01:22
@consideRatio consideRatio force-pushed the asalikhov/arm_builds branch from 7587c29 to b4b4472 Compare July 12, 2021 01:41
@consideRatio
Copy link
Collaborator

I think this may successfully be able to build and publish the images we have made support multiple architectures so far, base + minimal to my knowledge.

Some insights:

  1. Emulating architectures is slow
    It is far slower to build a dockerfile via the emulated architecture setup. The base image took 90 seconds to build I think but the arm64 took several minutes. If we build all images with multi-arch, it will take more than an hour to verify our PRs etc.
  2. It feels quite unsustainable to let a single GitHub job do everything from build, test, tag, push for all images and all arches.
  3. We don't do tagging for arm64, but assume the amd64 tags.

There are plenty of things that can be improved in general to avoid slowing down the CI build duration etc, but it is also a lot of work and will require significant changes.

Review suggestions

To make our CI system build, test, and publish images compatible with amd64 and arm64 is complicated and bound to include some trade offs.

  • slowing down our CI system
  • retaining support for using the Makefile locally
  • being thorough in what we test
  • making arm64 based images accessible

I have outlined a scope in #1399 (comment) that this PR address. Is the scope reasonable is the first question, and the second if the technical implementation and its associated trade offs is reasonable.

I've considered some tweaks to avoid slowing down our CI system too much, such as letting our PRs triggered CI jobs retain the quicker behavior by omitting multi-arch details. At the moment, I think I'd prefer having a slow but thorough CI system while we try get our images arm64 compatible, and perhaps later make it quick and less thorough.

I have some ideas on how we can make the CI system a lot more fancy, but it also involves a lot more work. I don't think I can find time to put in that much work at the moment so I hope we can settle for something not yet perfect. Ideas include:

  • splitting apart our massive workflow with a massive job to build, test, tag, and push all of our images
    • use of needs
  • running only the jobs and steps within that makes sense for any given PR
  • using github's container registry to cache images:
    • from nighly main branch builds
    • from PR test builds

@mathbunnyru
Copy link
Member Author

@consideRatio I will definitely take a look at this, hope today or in the next few days.

@manics may I also ask you to take a look? You seem to be experienced with this kind of changes.

Copy link
Member Author

@mathbunnyru mathbunnyru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should improve naming - I'm absolutely fine breaking make build/something command, it's our implementation detail.

Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
@mathbunnyru

This comment has been minimized.

Copy link
Member Author

@mathbunnyru mathbunnyru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing - let's not forget about docs:

  • let's explicitly say, that now we support arm for some images
  • we need to list all the caveats there
  • also, updating tagging/README.md would be nice

Makefile Show resolved Hide resolved
@mathbunnyru
Copy link
Member Author

Is the scope reasonable is the first question

Yes, I think it is well described.

the second if the technical implementation and its associated trade offs is reasonable.

I have a question for you.
For example, let's consider a situation, where we get rid off of amd64-only images (I think it's quite possible to do so).
How do you think, would it be possible to get over current limitations easily?

And also, I think we should create at least 2 issues and a new arm tag:

  • tagging is done from amd64 images
  • testing is not done for arm images

@consideRatio
Copy link
Collaborator

How do you think, would it be possible to get over current limitations easily?

I think the limitations should be resolved by having dedicated self-hosted GitHub arm64-based runners where the performance isn't crap building arm64 images, because then it is reasonable to have two jobs side by side and we don't lose any performance - one for amd64, one for arm64.

If I recall correctly, the build Makefile command doesn't specify either amd64 or arm64, and that was a reason for me motivating not calling them build-amd64 - they would be build-arm64 if it was ran on an arm64 machine as an explicit platform wasn't specified - right?


Thanks for the review @mathbunnyru! I'll go through it more properly later, I think I agree with your points but I need to deliberate a bit more on the naming of makefile commands.

@consideRatio
Copy link
Collaborator

consideRatio commented Jul 15, 2021

@mathbunnyru I made a commit (44dfb03) about the naming of the Makefile commands. I don't think of the commands without multi as related to amd64 specifically, but the system architecture - which doesn't make them amd64 specific. Due to that, I think it makes sense to have -multi suffix added to the commands that makes sure to build and push for both amd64 and arm64.

I created two new issues:

I think we should create [...] and a new arm tag:

I don't understand what you mean with this. Do you mean that we should tag our multi-platform images with arm? Perhaps you could push a commit about this and/or clarify this wish?

updating tagging/README.md would be nice

I looked through it and couldn't figure out what kinds of updates you would like there. Perhaps you can push a commit for that?

let's explicitly say, that now we support arm for some images
we need to list all the caveats there

I added 5a08154 for this!

@consideRatio consideRatio force-pushed the asalikhov/arm_builds branch 2 times, most recently from 80b7d86 to 5a08154 Compare July 15, 2021 20:15
@mathbunnyru
Copy link
Member Author

And also, I think we should create at least 2 issues and a new arm tag

Sorry, I meant a label tag in github. I created it and added it to your newly created issues.

@mathbunnyru
Copy link
Member Author

Did the merging, give me 20 minutes to do cleanup.

Co-authored-by: Erik Sundell <[email protected]>
@mathbunnyru mathbunnyru force-pushed the asalikhov/arm_builds branch from 27c5953 to 7ee6bdb Compare July 16, 2021 14:14
Makefile Outdated Show resolved Hide resolved
@mathbunnyru
Copy link
Member Author

@consideRatio I think this change is great and we're almost ready to merge it.

You've already seen that I tried to merge style and not related improvements separately.
May I ask you to create a PR for adding groups to workflow run (without changing env)?

The reason for asking:

  1. The scope of this PR will be much easier to understand (even in the future).
  2. If we decide to go with Github self-hosted runners, we might want to more-or-less revert this change. It's absolutely discussible and right now I don't know at all what's gonna happen, but if this review will be focused on exactly one thing, it will be much easier.

Nicely done!!

Снимок экрана 2021-07-14 в 22 28 00

@mathbunnyru
Copy link
Member Author

Maybe this should do it?
#1406

@consideRatio
Copy link
Collaborator

A note to myself that this can be relevant code to add back in the future so it feels fine removing it here.

import json
from contextlib import contextmanager


def _gwc(command_name, command_value="", **params):
    if not os.environ.get("GITHUB_ACTIONS"):
        return
    # Assume non-string values are meant to be dumped as JSON
    if not isinstance(command_value, str):
        command_value = json.dumps(command_value)
        print(f"dumped json: {command_value}")
    if params:
        comma_sep_params = ",".join([f"{k}={v}" for k, v in params.items()])
        print(f"::{command_name} {comma_sep_params}::{command_value}")
    else:
        print(f"::{command_name}::{command_value}")


@contextmanager
def _gwc_group(group_name):
    """
    Entering the context prints the group command, and exiting the context
    prints the endgroup command.<<
    """
    try:
        yield _gwc("group", group_name)
    finally:
        _gwc("endgroup", group_name)

@mathbunnyru
Copy link
Member Author

mathbunnyru commented Jul 16, 2021

@consideRatio I think I'm done here and I simplified this PR as much as possible :)
Please, take a look that I didn't miss anything.

@manics could you also take a look, please?

Copy link
Collaborator

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Thanks for your work to extract unrelated changes from this PR @mathbunnyru!

(I introduced these changes to simplify the arm64 feature development, as grouping and line breaks helped me read code/logs and think more clearly.)

Btw @mathbunnyru regarding:

If we decide to go with Github self-hosted runners, we might want to more-or-less revert this change.

I think with self-hosted arm64 runners to complement the amd64 runners we get from GitHub, the changes we need are very small.

  • For testing, we can instead of using -multi we just exclude the suffix and voila we have build + test that is architecture native and high performance running in parallel.
  • For publishing of the docker images, we should probably run the push-all-multi workflow on a single arch and rely on docker buildx to do a multi-platform build from scratch.
  • For publishing of arm64 manifests, we could let a arm64 job run that doesn't also try to do push-all-multi but does indeed do the manifest creation stuff and makes sure it becomes clear it is for the arm64 version rather than amd64 version.

Note that the only change required to use a self-hosted runner, I think, is to declare it as part of a job. I figure we configure our jobs to run twice using a matrix strategy for the job.

Copy link
Contributor

@manics manics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, LGTM 😃

FYI if you wanted to avoid building amd64 twice you could tag each architecture separately and use docker manifest to create the multi-architecture manifest list https://docs.docker.com/engine/reference/commandline/manifest/`
e.g. docker manifest base-notebook:<tag> base-notebook:<tag>-amd64 base-notebook:<tag>-arm64
but if your long term plan is to support even more architectures your current approach makes sense.

@consideRatio
Copy link
Collaborator

consideRatio commented Jul 16, 2021

@manics is it docker manifest create <name of image> <amd64 only image> <arm64 only image>? I figure it makes sense then to build each arch from a native platform, push to githubs container registry, then in a final github job that needs the separate build jobs pull both and combine them.

Btw we do don't spend 2x time by building the absolute same images twice thanks to cache, so that part isn't a big problem. But, we do build the same image once for each arch, and in the same job - and it takes a much longer time to emulate the non-native arch build.

Thanks @manics for your pioneering work on arm64 builds in other repos! ❤️ 🎉 I'm very excited to now let z2jh be fully arm64 compatible!

docker manifest create --help

Usage:  docker manifest create MANIFEST_LIST MANIFEST [MANIFEST...]

Create a local manifest list for annotating and pushing to a registry

EXPERIMENTAL:
  docker manifest create is an experimental feature.
  Experimental features provide early access to product functionality. These
  features may change between releases without warning, or can be removed from a
  future release. Learn more about experimental features in our documentation:
  https://docs.docker.com/go/experimental/

Options:
  -a, --amend      Amend an existing manifest list
      --insecure   Allow communication with an insecure registry

@consideRatio consideRatio merged commit 514883d into jupyter:master Jul 16, 2021
@consideRatio
Copy link
Collaborator

We did it @mathbunnyru @manics!!! 🎉 🌻 ❤️

@consideRatio consideRatio added type:Arm Issue specific to arm architecture type:Enhancement A proposed enhancement to the docker images labels Jul 16, 2021
@mathbunnyru
Copy link
Member Author

Great news!

@consideRatio awesome work 👍
@holdenk I also wanted to thank you for starting this work.

@consideRatio
Copy link
Collaborator

Yes thank you @holdenk!!! ❤️ 🎉

@consideRatio
Copy link
Collaborator

Complemented this PR with #1408

@consideRatio
Copy link
Collaborator

image

@holdenk
Copy link
Contributor

holdenk commented Jul 20, 2021

Thank y'all for carrying this all the way to the finish line, I am so thankful and yay arm64 images for all :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:Arm Issue specific to arm architecture type:Enhancement A proposed enhancement to the docker images
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants