Support building multi-platform images and add linux/arm64 #50

Merged
victorlin merged 15 commits into master from victorlin/add-arm64-keep-scripts on Dec 16, 2022

Conversation

@victorlin victorlin commented Jun 14, 2022

Description of proposed changes

Currently, the Docker images are built for linux/amd64 only. These changes build the Docker images for both linux/amd64 and linux/arm64.

This will substantially improve the performance of the Docker runtime for Apple silicon users, as it no longer needs to use slow Intel architecture emulation. A major drawback is the increased build time since building the linux/arm64 image on a linux/amd64 GitHub Actions runner requires emulation. However, IMO the benefit to users outweighs the increased build-time on our end, which can be improved in the future.

Cross-compilation (#98) is a major build-time improvement on top of this PR to be considered separately.
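
As a rough illustration of what building for both platforms can look like with `docker buildx build` (the command mentioned later in this thread), the sketch below only assembles and prints the command instead of running it. The tag `example/base:test` and the helper name `buildx_command` are hypothetical and are not taken from this repository's `devel/build` script. Actually running the printed command on a linux/amd64 host would also need QEMU emulation to be set up first.

```shell
#!/bin/sh
# Sketch only: print a `docker buildx build` command for one or more platforms.
# The tag and the function name are illustrative assumptions.
set -eu

buildx_command() (
    tag="$1"; shift
    # Inside this subshell, join the remaining arguments with commas,
    # e.g. "linux/amd64,linux/arm64".
    IFS=,
    echo "docker buildx build --platform $* --tag ${tag} ."
)

# Multi-platform build (the goal of this PR).
buildx_command example/base:test linux/amd64 linux/arm64

# Single-platform build, e.g. for quicker local iteration.
buildx_command example/base:test linux/amd64
```

Passing a single platform keeps the same code path usable for local builds where emulation is not wanted.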

Changes made outside of PR

  • Added docker/setup-qemu-action@* to allowed third party actions under @nextstrain

Related issue(s)

Testing

  • Checks pass
  • Multi-arch images for the PR branch are available on Docker Hub for nextstrain/base and nextstrain/base-builder
  • Run zika-tutorial on linux/arm64 image "natively" using M1 Mac (54s run time)
  • Run zika-tutorial on linux/amd64 image natively using Linux on GitHub Codespace (50s run time)
  • Run ncov-tutorial on linux/arm64 image "natively" using M1 Mac (2m 23s run time)
  • Run ncov-tutorial on linux/amd64 image natively using Linux on GitHub Codespace (3m 6s run time)
  • Test linux/arm64 image without emulation (all my M1 Macs have Rosetta enabled, and it isn't easy to disable). Not doing this since this feature is targeted towards other Apple silicon users, who should also have Rosetta enabled.

Summary of comment threads

  • Local registry? (ref)
  • Self-hosted runners to avoid emulation slowdowns? (ref)
  • Cross-compilation? (ref)
  • Emulation of pre-built binaries (ref)
  • Compiling/runtime issues in arm64 build

@victorlin victorlin self-assigned this Jun 14, 2022

victorlin commented Jun 14, 2022

Comparing to the failure in #48 (using build-push-action), which points to:

RUN make -f Makefile.AVX.PTHREADS.gcc # AVX should be widely-supported enough

and makes sense: #48 (comment)

The failure in this PR (using docker buildx build) is different:

error: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/bash -c apt-get update && apt-get install …

Not sure what the reason for the discrepancy is. This was before adding 63118ea.

@victorlin victorlin mentioned this pull request Jul 7, 2022
1 task
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch from 7daea6d to 3f2b6d8 Compare August 8, 2022 20:33
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch from 3f2b6d8 to 76481e0 Compare August 25, 2022 22:57

@victorlin victorlin left a comment

Made some progress today. Will investigate more another day.

@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch 2 times, most recently from ac7f2df to 28bcb6f Compare September 1, 2022 20:19
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch 4 times, most recently from 1bdd697 to ecac840 Compare September 2, 2022 18:10
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch from 9ca46fb to 401506e Compare September 14, 2022 19:17
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch 3 times, most recently from 116ad4e to 16a474e Compare September 19, 2022 18:59
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch from 16a474e to ff2edf1 Compare September 27, 2022 22:07
@victorlin victorlin mentioned this pull request Sep 28, 2022
1 task
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch 2 times, most recently from 7fa55a4 to 53d2356 Compare September 29, 2022 23:52

@tsibley tsibley left a comment

\o/ nicely done. I've read thru all the latest commits again and it looks great. One small typo fix and a bunch of non-blocking comments (though I do think some of them should be addressed).

@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch 2 times, most recently from 4780de1 to 6b305c7 Compare December 15, 2022 19:27
This implies using linux/amd64 since that is the sole target platform.

The repo contents of RAxML 8.2.12 do not support compiling on arm64. A
few immediate changes are required, so put them in a fork repo.

It's also not likely that these changes will make it back to the
official RAxML repo with a version release, since development efforts
have moved to RAxML-NG¹.

¹ https://groups.google.com/g/raxml/c/1cRSPXcZa1o/m/SakFeA3OAgAJ

These variables, useful for multi-platform builds¹, will be used in
subsequent commits.

¹ https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/
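
The commit message above does not name the variables. Assuming they are Docker's automatic platform ARGs (`TARGETPLATFORM` and related, which later commit messages in this PR refer to), the sketch below shows how a platform string relates to its OS and architecture parts. In a real Dockerfile, BuildKit already provides `TARGETOS` and `TARGETARCH` directly, so the helper names `platform_os` and `platform_arch` here are purely illustrative.

```shell
#!/bin/sh
# Sketch only: split a platform string such as "linux/arm64" or
# "linux/arm/v7" into its OS and architecture components.
set -eu

platform_os() (
    # Drop everything from the first "/" onward.
    echo "${1%%/*}"
)

platform_arch() (
    # Drop the leading "os/" part, then any trailing "/variant".
    rest="${1#*/}"
    echo "${rest%%/*}"
)

# Fall back to an example value when run outside of a Docker build.
TARGETPLATFORM="${TARGETPLATFORM:-linux/arm64}"
echo "os=$(platform_os "$TARGETPLATFORM") arch=$(platform_arch "$TARGETPLATFORM")"
```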
Future commits should aim to remove these by either (1) downloading
binaries dynamically based on TARGET* variables¹ or (2) building from
source.

Keeping the note for a package would be a last resort if neither option
is feasible.

¹ https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch from 258050a to cd87a3a Compare December 15, 2022 20:47

victorlin commented Dec 15, 2022

Planning to merge after CI passes and I successfully re-run through manual testing with the updated test image.

@victorlin

Update: the push-triggered CI failed on the build job with a seemingly transient network error. It's been >3 hours and still running. PR-triggered CI took >4 hours on ./devel/build. Looking at the raw logs, the long build times seem related to #115 (a few notable lines with timestamps pasted below):

2022-12-15T21:22:55.0645526Z #104 [linux/arm64 builder 44/47] RUN /builder-scripts/download-repo https://github.com/nextstrain/auspice release .  && npm update && npm install && npm run build && npm link
2022-12-15T21:51:04.8952185Z #104 1689.9 added 1528 packages, and audited 1529 packages in 28m
2022-12-15T21:55:10.3682075Z #104 1935.4 > node auspice.js build --verbose
2022-12-15T22:55:13.8493368Z #104 5538.7 changed 1 package, and audited 1529 packages in 1h
2022-12-15T22:55:39.0958178Z #104 5564.0 > node auspice.js build --verbose
2022-12-16T00:48:18.3439466Z #104 12323.4 added 1 package, and audited 3 packages in 58m
2022-12-16T00:48:18.8022211Z #104 DONE 12323.7s
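
For anyone wanting to double-check durations from the raw log timestamps, here is a small sketch that computes the whole seconds elapsed between two of them. It assumes a `date` that supports `-d` (e.g. GNU coreutils); the function names `to_epoch` and `elapsed_seconds` are made up for this example.

```shell
#!/bin/sh
# Sketch only: whole seconds between two log timestamps such as
# 2022-12-15T21:22:55.0645526Z. Assumes `date -d` is available (GNU date).
set -eu

to_epoch() (
    ts="${1%Z}"      # drop the trailing "Z" (timestamps are UTC)
    ts="${ts%%.*}"   # drop fractional seconds
    # Replace the "T" separator with a space and parse as UTC.
    date -u -d "$(echo "$ts" | tr 'T' ' ')" +%s
)

elapsed_seconds() {
    echo $(( $(to_epoch "$2") - $(to_epoch "$1") ))
}

# Start of step #104 to its DONE line, using the timestamps pasted above.
elapsed_seconds 2022-12-15T21:22:55.0645526Z 2022-12-16T00:48:18.8022211Z
```

For those two timestamps the result is 12323 seconds (about 3h 25m), which agrees with the `DONE 12323.7s` reported by the build itself.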

I believe this is because Node.js 14 only came in amd64 flavors and thus somehow ran "natively" on the build platform, even with the implicit --platform=$TARGETPLATFORM. Now with Node.js 16 (which supports arm64 flavors), it is running with emulation which is much slower.

My opinions:

  • 1 hour to build is somewhat acceptable, but 4 hours is not.
  • There may be a slight runtime improvement with Node.js 14 → 16, but not enough to justify the drawbacks of increased build time.
  • This Node.js upgrade is more fit to come alongside #98.

Based on the above points, my planned approach:

  1. Revert to Node.js 14 in this PR for an acceptable build time.
  2. Confirm that upgrading to Node.js 16 works in #98 without negatively impacting build time.
  3. Merge this PR.
  4. The Node.js 16 upgrade will happen with merge of #98.

Nextclade/Nextalign v2 comes with pre-built binaries for multiple
platforms. Use TARGETPLATFORM¹ to construct URLs for each OS+arch
combination. Put logic in a script to reduce Dockerfile hacking and
maintain readability.

¹ https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope
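
A minimal sketch of the idea in the commit message above: map `TARGETPLATFORM` to a platform name used in a download URL. This is not the contents of `builder-scripts/get-nextclade-platform`, which are not shown in this thread. The target-triple names, the function name `nextclade_platform`, and the base URL are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: choose a pre-built binary name from a Docker platform string.
set -eu

nextclade_platform() {
    case "$1" in
        # Assumed asset naming; check the actual release page before relying on it.
        linux/amd64) echo "x86_64-unknown-linux-gnu" ;;
        linux/arm64) echo "aarch64-unknown-linux-gnu" ;;
        *) echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Hypothetical base URL; substitute the real download location.
BASE_URL="https://example.com/releases"

# Fall back to an example value when run outside of a Docker build.
platform="$(nextclade_platform "${TARGETPLATFORM:-linux/amd64}")"
echo "${BASE_URL}/nextclade-${platform}"
```

Keeping the mapping in a script means the Dockerfile only needs to call it, and an unsupported platform fails the build early instead of downloading the wrong binary.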
For pre-built binary downloads, previous commits used platform-specific
downloads when available. Otherwise, they must either (1) use emulation
or (2) be built from source on the target platform for optimal
performance.

For now, I'll go with (1) since this linux/arm64 support is targeted at
Apple silicon users, who have easy access to amd64 emulation. (2) comes
with the cost of extra dev time to figure out building for each program,
plus additional build time that should be considered on a case-by-case
basis.

This Augur dependency has pre-built binaries for amd64, but does not for
arm64. Building requires additional system deps and a special
environment variable when built via `pip install`.

This is prepping for the option of building with multiple platforms in a
future commit, while keeping the option to build with a single platform
locally.

Add extra notes to encourage platform-aware changes.

Node.js v16 is very slow when running in an arm64 Docker build on an amd64
build platform.

Node.js v14 runs in the build without emulation. It uses emulation in
the Docker container during run time, but the slowness there is not
nearly as significant as the build time increase due to emulation.

This should be reverted if/when cross-compilation is used, i.e. Node.js
v16 can run natively on the build platform while installing packages for
a different target platform.

This reverts commit 1afd07d.
@victorlin victorlin force-pushed the victorlin/add-arm64-keep-scripts branch from 8cebb0d to 3b53f7a Compare December 16, 2022 14:57

tsibley commented Dec 16, 2022

@victorlin Sounds like a good plan. From the log snippet, it seems we're running

node auspice.js build --verbose

twice? Once as part of npm install (since the prepare lifecycle script in package.json is npm run build) and once explicitly as npm run build in our Dockerfile RUN line. We can drop the latter then, right?

@victorlin victorlin merged commit 960fb5b into master Dec 16, 2022
@victorlin victorlin deleted the victorlin/add-arm64-keep-scripts branch December 16, 2022 22:12

victorlin commented Dec 16, 2022

@tsibley that sounds right! I'll do it in a separate PR: #120

Development

Successfully merging this pull request may close these issues.

Build multi-arch image with amd64 + arm64 for M1 Macs
3 participants