Support multi-arch image for base-notebook #1202

Closed
wants to merge 12 commits into from

romainx
Collaborator

@romainx romainx commented Dec 21, 2020

Hello,

I'm drafting a long PR to support several CPU architectures (multi-arch) for the base-notebook image: linux/amd64 (default), linux/arm64, linux/ppc64le. It would be interesting to offer an official base Jupyter image for alternative CPU architectures such as the rising ARM. Tools are now available that make this easy:

  • Docker buildx to build images for different platforms,
  • QEMU and qemu-user-static to emulate architectures other than the host's,
  • Miniforge that provides installers for the different architectures,
  • OS and packages / libraries ready to use on different CPU architectures.
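
As a rough illustration of the first tool in the list above, the sketch below assembles a `docker buildx build` invocation for the three platforms. It only builds the argument list. The image tag and build context are placeholders, and this is not necessarily how the PR's Makefile calls buildx.

```python
import shlex

# Platforms named in this PR.
PLATFORMS = ["linux/amd64", "linux/arm64", "linux/ppc64le"]


def buildx_command(context, tag, platforms=PLATFORMS, push=False):
    """Return the argv list for a multi-platform `docker buildx build`."""
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),
        "--tag", tag,
    ]
    if push:
        # Push the resulting multi-platform image to the registry.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Placeholder tag and context, for illustration only.
    print(shlex.join(buildx_command("base-notebook", "example/base-notebook:multiarch")))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting.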

And it works 🎉

[Screenshot: JupyterLab]

Todo list

  • Build images for: amd64, arm64 and ppc64le
  • Test them by using the same code
  • Documentation update
    • Maintainer / collaborator
    • User documentation
      • supported architectures
      • limitations (for example pandoc is available only for amd64)
  • Manage the impact of in the manifest / hook
  • Makefile finalisation → need some polish, move things around, comments, etc.
  • Remove legacy Dockerfile.ppc64le
  • Push target finalization and test
  • Merge (if any) will have to use squash since I have made a lot of commits (trials and errors) to benefit from GitHub CI.

Pros

  • Propose Jupyter images for other CPU architectures
  • Transparent for the user since the appropriate architecture will be pulled automatically from DockerHub (no increase of image size)
  • One Dockerfile to rule them all: the same Dockerfile is used so there is no duplication
  • Same tests are run against all images
  • Get rid of the legacy ppc64le support

Cons

  • Increased build time
  • Increased complexity (the build is more complex)
  • Limited (at this time) to the base-notebook image, but could be extended
  • Makes the stack harder to maintain (more potential issues, more complex updates, etc.)

Your feedback is welcome as much on the usefulness of such a feature as on how to implement it.

Best

@romainx romainx marked this pull request as draft December 21, 2020 12:11
@romainx romainx added the type:Enhancement A proposed enhancement to the docker images label Dec 21, 2020
@mathbunnyru
Member

@romainx this is so cool 👍

I do like the idea of being able to build many architectures, I think that's a big improvement.

But I have one concern: I do not like having a lot of code in the Makefile.
I mean, Make is not the right language to put logic in.
Maybe you have something in mind about how to change the build process to make our scripts easier?

@bollwyvl

bollwyvl commented Dec 25, 2020 via email

@romainx
Collaborator Author

romainx commented Dec 26, 2020

@mathbunnyru thanks for your feedback. It's only a first draft, my goal was to prototype it while keeping the existing build process. And yes the Makefile has become less maintainable with this addition.

@bollwyvl I did not know doit, thanks for the idea. I will definitely have a look at it 👀 .

You're right, if we put that in place the build process will need some improvement. One of the main reasons is that it's a bit monolithic and will not scale well with this kind of evolution. It becomes a drawback when it comes to adding new images or features.

The advantage of having a Makefile vs. full GitHub Actions is that we keep the ability to run things locally without GitHub. I think we need to keep this capability. I will have a look at doit.

@bollwyvl

Cool!

doit works pretty well on GitHub Actions, especially if the problems are trivially parallelized, or have a "do some up-front linting" task, e.g. doit lint, then fan out to "build XX things," e.g. doit build:$SOME_THING, then fan in to "do a report," e.g. doit report, where each is just performing a small part of the total. But from a local perspective, having doit all be a target that does what the entire workflow would is really important, especially for reviewing PRs, etc. The doit CLI is kinda picky, so it can be easiest to handle complex behavior with matrix variables, e.g. BUILDING_IN_CI, so that you don't run the linter again or try to rebuild things when reporting/uploading.
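
The lint, fan-out, fan-in shape described above could look roughly like the following `dodo.py`. This is a minimal sketch, not the project's build: the `echo` actions are placeholders, and it assumes `BUILDING_IN_CI` reaches the process as an environment variable set by the workflow.

```python
"""dodo.py sketch: lint, fan out to one build per architecture, then report."""
import os

ARCHS = ["amd64", "arm64", "ppc64le"]

# Assumption: the CI workflow exports BUILDING_IN_CI after linting in its own step.
IN_CI = bool(os.environ.get("BUILDING_IN_CI"))

# Plain `doit` runs the `all` task.
DOIT_CONFIG = {"default_tasks": ["all"]}


def task_lint():
    """Up-front linting (placeholder action)."""
    return {"actions": ["echo 'lint placeholder'"]}


def task_build():
    """Fan out: one sub-task per architecture, e.g. `doit build:arm64`."""
    for arch in ARCHS:
        yield {
            "name": arch,
            "actions": [f"echo 'build placeholder for {arch}'"],
            # Skip the lint dependency in CI, where it has already run.
            "task_dep": [] if IN_CI else ["lint"],
        }


def task_report():
    """Fan in: runs after every build sub-task."""
    return {"actions": ["echo 'report placeholder'"], "task_dep": ["build"]}


def task_all():
    """Local entry point mirroring the whole workflow (group task, no actions)."""
    return {"actions": None, "task_dep": ["report"]}
```

Locally, `doit all` (or plain `doit`) runs everything, while a CI job can call `doit build:arm64` to perform only its share.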

One thing about GHA: it's rather hard to share data between multiple workflows, but between tasks is pretty decent.

Getting all this to play nicely with GHA cache can be a little annoying, but is usually worth it... docker's a bear, and rate limits are no fun, so if important base images/layers can be cached, without caching the built product, things should work better than naively pulling every time.

Something that helps a lot for this: a task's need/readiness to be run can be based on the partial contents of a parsed file: what I've ended up doing is parsing a matrix from a workflow YAML as the source of truth for what needs building. This leads to slightly more complex python (handling includes and excludes, for example) but the resulting machine hums pretty well, as adding a new excursion that is a combination of existing features can be a one-liner, while changing some unrelated part of the file won't trigger a rebuild of everything.
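
To make the "matrix as source of truth" idea concrete, here is a small sketch that expands an already-parsed matrix mapping into build combinations. It is deliberately simplified: `exclude` entries drop every combination they match, and `include` entries are just appended, which is narrower than GitHub's own merging rules. The `MATRIX` contents and the `experimental` key are made up for illustration.

```python
from itertools import product


def expand_matrix(matrix):
    """Expand a parsed `strategy.matrix` mapping into a list of combinations."""
    axes = {k: v for k, v in matrix.items() if k not in ("include", "exclude")}
    keys = list(axes)
    combos = [dict(zip(keys, values)) for values in product(*axes.values())]

    def matches(combo, pattern):
        # A pattern matches when all of its keys agree with the combination.
        return all(combo.get(k) == v for k, v in pattern.items())

    excludes = matrix.get("exclude") or []
    combos = [c for c in combos if not any(matches(c, e) for e in excludes)]

    # Simplification: includes are appended as extra combinations.
    combos.extend(dict(extra) for extra in matrix.get("include") or [])
    return combos


# Hypothetical matrix, as it might look after loading a workflow YAML file.
MATRIX = {
    "image": ["base-notebook"],
    "arch": ["amd64", "arm64", "ppc64le"],
    "exclude": [{"arch": "ppc64le"}],
    "include": [{"image": "base-notebook", "arch": "ppc64le", "experimental": True}],
}
```

Each resulting dictionary can then drive one doit sub-task, so adding a combination to the workflow file is enough to add a build.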

Once outside of the task planning, it can be useful to just have dodo.py handle caching and looping over lists, and have each step be a small, independently-runnable python/bash script, so it's easy to test Just One Thing.
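
A sketch of that split, with `dodo.py` doing only the looping and up-to-date checks while a separate script does the work. The script path, Dockerfile path, and stamp-file convention are hypothetical. doit treats a task as up to date when its `file_dep` files are unchanged and its `targets` exist.

```python
"""dodo.py sketch: doit owns looping and caching; small scripts do the work."""

ARCHS = ["amd64", "arm64", "ppc64le"]

# Hypothetical paths, for illustration only.
BUILD_SCRIPT = "scripts/build_one.sh"
DOCKERFILE = "base-notebook/Dockerfile"


def task_build():
    """One sub-task per architecture; each calls the same small script."""
    for arch in ARCHS:
        # Assumed convention: the script writes this file when it succeeds.
        stamp = f"build/{arch}.stamp"
        yield {
            "name": arch,
            # Re-run when either of these files changes...
            "file_dep": [DOCKERFILE, BUILD_SCRIPT],
            # ...or when the stamp file is missing.
            "targets": [stamp],
            "actions": [f"bash {BUILD_SCRIPT} {arch} {stamp}"],
        }
```

Because the script takes its inputs as arguments, it can also be run directly to test a single architecture without going through doit.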

Happy to help in any way!

@romainx
Collaborator Author

romainx commented Dec 27, 2020

@bollwyvl thank you, very interesting!
As we said, improving the build is a prerequisite to implement this kind of feature. I've created a dedicated issue for that #1203. Any help is welcome!

@romainx
Collaborator Author

romainx commented Jan 7, 2021

Hello, I'm closing this PR for now, since I think it's not the priority and since some prerequisites are not ready. We will see in the future if it's worth supporting multi-arch images.

Best
