
feat: publish arm64 image in release #6758

Closed
wants to merge 21 commits

Conversation

dragoneena12
Contributor

Summary

Closes #4211

I added a job to publish an arm64 Docker image in the release workflow using docker/build-push-action.

I tested this workflow with my own repo and Docker Hub:
https://github.com/dragoneena12/argo-cd/actions/runs/1045272993
https://hub.docker.com/repository/docker/dragoneena12/argocd

This should be useful for Argo CD users who run arm64 Kubernetes clusters, such as on a Raspberry Pi 4 or an M1 Mac.
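
For reference, a minimal sketch of the kind of job this adds, using docker/setup-qemu-action, docker/setup-buildx-action and docker/build-push-action (the step layout, registry, tag and secret names below are illustrative, not the exact contents of this PR's diff):

# Sketch only: a cross-built arm64 publish job that could sit in the release
# workflow; tag and secret names are placeholders.
jobs:
  publish-arm64-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # register qemu emulators so buildx can produce linux/arm64 images on an amd64 runner
      - uses: docker/setup-qemu-action@v1
      - uses: docker/setup-buildx-action@v1
      - uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v2
        with:
          context: .
          platforms: linux/arm64
          push: true
          tags: dragoneena12/argocd:v2.0.4-arm64   # placeholder tag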

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issue number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).

Signed-off-by: Ayato Tachibana <[email protected]>
@codecov

codecov bot commented Jul 20, 2021

Codecov Report

Merging #6758 (71d5fae) into master (0c75753) will increase coverage by 0.30%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6758      +/-   ##
==========================================
+ Coverage   41.17%   41.48%   +0.30%     
==========================================
  Files         161      163       +2     
  Lines       21637    21960     +323     
==========================================
+ Hits         8910     9111     +201     
- Misses      11456    11558     +102     
- Partials     1271     1291      +20     
Impacted Files Coverage Δ
util/app/discovery/discovery.go 46.55% <0.00%> (-29.45%) ⬇️
util/env/env.go 61.53% <0.00%> (-22.68%) ⬇️
util/argo/resource_tracking.go 69.62% <0.00%> (-8.16%) ⬇️
reposerver/repository/repository.go 58.40% <0.00%> (-2.55%) ⬇️
pkg/apis/application/v1alpha1/types.go 54.99% <0.00%> (-2.53%) ⬇️
util/db/repository.go 38.46% <0.00%> (-1.54%) ⬇️
util/helm/cmd.go 28.65% <0.00%> (-1.23%) ⬇️
server/application/application.go 32.18% <0.00%> (-0.63%) ⬇️
cmd/argocd/commands/admin/app.go 30.50% <0.00%> (-0.40%) ⬇️
server/cluster/cluster.go 22.51% <0.00%> (-0.36%) ⬇️
... and 32 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c75753...71d5fae. Read the comment docs.

@mylesagray

Am I correct in my assumption and testing that this doesn't actually build the binaries inside the images for ARM64 and instead just relies on qemu to do the translation?

When using your v2.0.4 image to test, I see massive performance issues relating to this, unlike when using @alinbalutoiu's v2.0.3 image (https://hub.docker.com/layers/alinbalutoiu/argocd/v2.0.3/images/sha256-1f705c02255a2807a55e1acb5555f44bb79f68a4c7902f3345873f5a2511461b?context=explore), which actually builds all binaries for ARM64, making it much, much more performant.

@mylesagray

Another update: it seems @alinbalutoiu built v2.0.4 and v2.0.5, as well as v2.1.0-rc2, yesterday, as Renovate on my GitOps repo picked up the image tags and auto-merged them.

When using those images (builds here: https://github.com/alinbalutoiu/argo-cd/blob/v2.0.5/.drone.yml), operations are much quicker, and comparing the processes in top confirms that the builds from his containers execute the argocd binaries natively on arm64, while the ones in this PR are all run through qemu-x86_64-static.

If we could get @alinbalutoiu's Drone pipeline and Dockerfile changes integrated into GitHub Actions, we would be 99% of the way to native Argo CD ARM64 builds.

@alinbalutoiu
Contributor

Thanks for tagging me in, @mylesagray. I built the images yesterday when I noticed your comment regarding the performance degradation and wanted to have all images available for comparison.

I wasn't aware of the impact of using qemu to build the binaries for a different architecture; that's really interesting. I think the ideal solution would be to get a self-hosted GitHub Actions runner that runs on an ARM system and add it to this repository.

@mylesagray

@alinbalutoiu actually, to clarify - I was referring to the runtime performance.

Because your pipeline actually executes the BUILD_ALL_CLIS make target for building the binaries, it will generate an ARM64 binary for the ARM64 image as expected - meaning it executes natively on an ARM64 host.

However, by way of comparison, this PR doesn't build the CLI for ARM64; instead it ships the AMD64 binary and relies on qemu on the host running the container to translate all the syscalls, making it extremely slow.

The actual building of the binary for ARM64 on an AMD64 host using qemu has no impact on the end product (binary) and that performs as expected.

I think with the recent discussion in #4211 around explicitly setting yarn timeouts, we can safely build the ARM64 bins and images on GH Actions using Docker buildx.
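
As a concrete sketch of that yarn-timeout piece (the flag itself is a standard yarn option; the step name, the ui/ working directory and the 300s value are assumptions, and the actual change may live in the Dockerfile's UI build instead):

# Raise yarn's network timeout (default 30s) so the slow, qemu-emulated arm64
# build doesn't fail while downloading UI packages.
- name: Install UI dependencies with a longer yarn timeout
  working-directory: ui
  run: yarn install --network-timeout 300000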

@alinbalutoiu
Contributor

Thanks for the detailed explanation @mylesagray

I'm getting a bit confused, setting BUILD_ALL_CLIS will not generate an ARM64 binary, it will only skip building the other binaries for performance reasons, as it takes an enormous amount of time to build them on top of what it already takes to build the ARM image.

The only difference is that the pipeline I'm using runs on a native ARM64 system and does not use Docker buildx to build. I'd expect that if you use Docker buildx to target ARM64 it would have the same behaviour as building on a native system.

@mylesagray

mylesagray commented Aug 9, 2021

Sorry, that's my bad on the phrasing.

But I think we're agreeing here: using either your Drone pipeline or the Docker buildx pipeline should have no effect on the end binary - it should run natively on ARM64 without any need for qemu on the target host.

However, for whatever reason, this PR's implementation of using buildx results in an amd64 binary being included in the arm64 image, so my hosts must run qemu to translate the syscalls (which is where the bad perf comes from).

As an example - if you deploy ArgoCD using v2.0.5 with your tagged image and view the processes on the host running the pods with top/ps you'll see that the bin is executed natively:

argocd-server .......
argocd-application-controller --status-processors 20 ........

When using the v2.0.4 image created via this PR (dragoneena12/argocd:v2.0.4) and inspecting the host with top/ps, you'll see something like the below for each Argo process:

/usr/bin/qemu-x86_64-static argocd-server .......
/usr/bin/qemu-x86_64-static argocd-application-controller --status-processors 20 ........

That's where I'm trying to grok the differences between this pipeline and yours, because I have built ARM64 images for Argo CD with buildx on my laptop (an x86_64 Mac) and they execute natively, just like yours do.

Slightly off topic, but I noticed you were messing with buildx on your branches but stopped due to the yarn timeouts - is it worth trying to add the change in the PR (https://github.com/argoproj/argo-cd/pull/6758/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R94) to your buildx pipeline and seeing if it fixes that problem?

That would essentially make your changes mergeable with the base repo.

@mylesagray

@alinbalutoiu I forked your v2.0.5 branch, included the KSONNET skip if building on ARM in the release.yaml workflow and added the yarn timeout specification as in this PR, and we have a successful build.

https://github.com/mylesagray/argo-cd/runs/3283264164?check_suite_focus=true

Output: https://hub.docker.com/r/mylesagray/argocd/tags?page=1&ordering=last_updated

Comparison across our branches: alinbalutoiu/argo-cd@v2.0.5...mylesagray:v2.0.5

@alinbalutoiu
Contributor

@mylesagray Thanks for pointing that one out. I've managed to get it to work in the past without the network-timeout flag for arm64 and amd64, but it was failing for arm. I'm wondering if that flag helps there.

There is still some work in progress I was doing regarding the multi-arch manifest, so that the tag v2.0.5 would work for example on arm/arm64/amd64 without requiring changes to the tag name.

Happy to try to finish it off and open a PR.

@mylesagray

@alinbalutoiu if you have a branch you're working off of, let me know and I can help - I contributed the same thing for argocd-notifications:

argoproj-labs/argocd-notifications#184

Then I noticed you were working on argocd so I didn't pursue it :)

As far as I can see, there was a previous attempt to address this here, and there were concerns about image size: #5284 (comment)

So we should keep that in mind when drafting.

@jannfis
Member

jannfis commented Aug 10, 2021

Thanks @dragoneena12 for your contribution and the effort you put into this. However, we had evaluated the buildx approach for Argo CD as well. But it is slow. Very slow. Argo CD is a different beast: there's lots of code to compile, not only Go but also all the yarn/npm stuff. On a virtual machine, it takes >90 minutes to build an image for Arm64 with buildx, and I think it's similar on the default GitHub Actions runners.

I think the only way to go forward to provide Arm64 images is - as was mentioned above - to have a native build environment that we could use for building the images, and also to run unit and end-to-end tests on. It's one thing to build the images, but tbh, I feel uncomfortable with providing official cross-compiled images that we - as a project - have no means to run & test at all.

@mylesagray

@jannfis - FWIW, we are usually seeing 40-45 mins for ARM64 image builds on the GitHub Actions default runners:

https://github.com/mylesagray/argo-cd/runs/3283264164?check_suite_focus=true

I realise this doesn't address your secondary concern, which is running unit and e2e tests. If we were able to prove out running those tests under buildx as well, would you be open to this contribution?

The major blocker I see if we don't go the buildx route is the need for dedicated ARM hardware, in the cloud or otherwise, and that adds a cost and maintenance overhead to the project for a feature that is largely aimed at lab environments - which to me is hard to justify, though maybe you as the project maintainers feel differently about that.

I think @dragoneena12's, @alinbalutoiu's, and my experiments were more about getting an MVP out there that is "good enough" for lab users, because I wouldn't (yet) expect people to run on ARM in prod environments.

Member

@jannfis jannfis left a comment

OK, so after thinking this through for a while, I came to the conclusion that we can try to build arm64 images using cross compile (i.e. buildx).

But we should decouple it from our release workflow, so my request for change is the following:

This will delay availability of arm64 images by some minutes after a release, but I think we should keep the release workflow as lean (and fast) as possible.

@dragoneena12
Contributor Author

@mylesagray @alinbalutoiu @jannfis Hi, thank you for discussing this PR!

However, for whatever reason, this PR's implementation of using buildx results in an amd64 binary being included in the arm64 image, so my hosts must run qemu to translate the syscalls (which is where the bad perf comes from).

Thanks for pointing that out. I didn't expect that kind of behavior, so I tested my image, but I cannot find any processes running under qemu. Maybe you have the wrong tag name, since my arm64 image is dragoneena12/argocd:v2.0.4-arm64.

There is still some work in progress I was doing regarding the multi-arch manifest, so that the tag v2.0.5 would work for example on arm/arm64/amd64 without requiring changes to the tag name.

I gave up on pushing a multi-arch manifest because quay.io does not support multi-arch manifests. Or is that information outdated?

But we should decouple it from our release workflow, so my request for change is the following

I think this is a good idea too. I will update this PR to decouple the workflow soon. Thank you!

@mylesagray

@dragoneena12 I think quay.io does support multi-arch images, for example:

https://quay.io/repository/openshift-examples/multi-arch?tab=tags
https://quay.io/repository/prometheus/prometheus?tab=tags

And you were indeed correct @dragoneena12, what a rookie mistake - I assumed you were using multi-arch tags on Docker Hub, I didn't even check! That is why I was seeing amd64 bins running on my hosts instead of arm64 - my bad!

If you need help with the multi-arch manifests, I've done work with them in the past and it seems that @alinbalutoiu has some working GH Actions code for this already.

@alinbalutoiu
Contributor

@dragoneena12 I haven't checked quay.io, but this should be working fine: https://github.com/alinbalutoiu/argo-cd/blob/github-actions-arm/.github/workflows/release.yaml#L235-L260

However, as the workflow needs to be separated, I think we can push the multi-arch manifest from the ARM workflow (given that the amd64 one is tagged differently).
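
A sketch of what that manifest step could look like using the plain docker manifest commands (repository and tag names are placeholders, and the real workflow might use buildx tooling instead):

# Stitch the separately pushed per-arch tags into one multi-arch tag so a
# single tag resolves correctly on both amd64 and arm64 hosts.
# Older Docker versions may need DOCKER_CLI_EXPERIMENTAL=enabled for this.
- name: Push multi-arch manifest
  run: |
    docker manifest create argoproj/argocd:v2.0.5 \
      argoproj/argocd:v2.0.5-amd64 \
      argoproj/argocd:v2.0.5-arm64
    docker manifest push argoproj/argocd:v2.0.5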

@geowalrus4gh

eagerly waiting for this feature

Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
@dragoneena12
Contributor Author

@mylesagray
Since arm64 images are not tested by the project team, I hesitate to publish multi-arch images under the default tag name, because users could use those images without realizing they are untested.
Maybe we should put this explanation somewhere in the documentation, but I'm not sure where it would fit best.
That said, I think this kind of tagging would be useful for most users too.

@mylesagray

@jannfis Any thoughts on the image tagging or layout of the jobs here? It seems to address most of the concerns from further up in the PR and would be a good first step.

If we want to migrate tags in future, we can work on a longer term solution which includes getting e2e and units running on dedicated ARM hardware.

@tbbharaj

eagerly waiting for this feature

Couldn't agree more.

@agustinvinao

eagerly waiting for this feature

Thanks for the effort to get this done.

Any updates on this? I'm counting the days to get this available

@JCzz

JCzz commented Sep 21, 2021

Will this make it possible to run on an arm64/M1 k8s cluster, and is there any progress?

Thanks

@btrepp

btrepp commented Sep 27, 2021

Would be great to see some of these published.

Have hacked together an install using

global:
  image:
    repository: alinbalutoiu/argocd
    tag: "v2.0.4"

in a Helm chart, but it would be great to move forward to 2.1/2.2. This has worked pretty well for me for getting to know and use Argo CD; I didn't encounter any issues related to it being arm64.

Are there pre-release binaries or images based off this branch available? I would be happy to test them.

On the concerns around providing them untested, I would be fine with overriding the tag to something like v2.1.2-arm64-untested and letting users know in the documentation that "here be dragons".

@akhenakh

akhenakh commented Sep 28, 2021

If anyone wants to build a recent version, grab the 2.1.2 sources and build for arm64 from an arm64 host (rpi or Mac M1 ...).

Better docs than mine: #4211 (comment)

@alinbalutoiu
Contributor

alinbalutoiu commented Sep 28, 2021

Unfortunately, the CI which provides ARM systems for building has some issues with the latest Ubuntu 21.04 Docker images (it requires Docker to be updated on the build systems).
Currently I can't build Argo CD v2.1.0 and above using the same CI, so I'm going to have to find a replacement for it 😞

Building the latest image should be pretty straightforward though, as shown in the comment above.

@jannfis
Member

jannfis commented Oct 5, 2021

Hey @dragoneena12, I'm utterly sorry for coming back so late to this again. Please accept my apologies. We very much appreciate the work you've been putting into this.

I'm still opposed to putting this directly into the release workflow. Instead of workflow_run, maybe we can just trigger the workflow that builds & publishes ARM images on the push of a tag matching the v* pattern, similar to how the release workflow itself is triggered - just that the release workflow is triggered by a tag matching the release-* pattern.

The general workflow would then be:

  • Someone (maintainer) triggers a new release by pushing a release-vX.Y.Z tag to the repo
  • Release workflow goes ahead and creates the release. During this process, the workflow creates a vX.Y.Z tag in the repo
  • The creation of the vX.Y.Z tag triggers the workflow to build & publish the ARM image. Once the tag is pushed by the release workflow, we can safely assume that the release actually will happen.
  • The ARM workflow will know the version to produce by looking at the tag that triggered the workflow

Does that sound viable?
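
For reference, a minimal sketch of the decoupled trigger described above (the workflow file name and the version-extraction step are assumptions):

# Hypothetical .github/workflows/image-arm.yaml: fires when the release
# workflow pushes the final vX.Y.Z tag and derives the version from that tag.
name: Publish arm64 image
on:
  push:
    tags:
      - 'v*'
jobs:
  publish-arm64:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Determine version from the triggering tag
        run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_ENV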

@dragoneena12
Contributor Author

@jannfis Hello, I'm glad to hear back from you!

I tried to create the proposed workflows. However, it doesn't work, because a workflow cannot trigger another workflow using the auto-generated GITHUB_TOKEN (https://docs.github.com/en/actions/learn-github-actions/events-that-trigger-workflows#triggering-new-workflows-using-a-personal-access-token).
To trigger the multi-arch workflow from the release workflow, we have to push the v* tag with a PAT. Is it possible to provide a PAT for these workflows?
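
For illustration, a hedged sketch of how a PAT could be wired into the release workflow (the secret name is an assumption); checking out with the PAT makes the later tag push use it, which is what allows the tag-triggered workflow to fire:

# In the release workflow: check out with a PAT instead of the default
# GITHUB_TOKEN, so the v* tag pushed later can trigger the arm64 workflow.
- uses: actions/checkout@v2
  with:
    token: ${{ secrets.RELEASE_BOT_PAT }}   # hypothetical PAT secret
    fetch-depth: 0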

@jannfis
Member

jannfis commented Oct 6, 2021

I think what's passed as token here is already the PAT of the argo-bot user, and is just named in a misleading way as secrets.GITHUB_TOKEN.

For example, if you look at this commit done by the release workflow, it uses credentials of argo-bot.

@dragoneena12
Contributor Author

@jannfis
I think this workflow only sets git's user.email and user.name, and uses the auto-generated secrets.GITHUB_TOKEN to push the tag.
My test release workflow creates the test tag v2.1.100 as the argo-bot user without argo-bot's PAT, and it does not trigger the multi-arch workflow (dragoneena12@cd06dbf).

@jannfis
Member

jannfis commented Oct 6, 2021

Oh, I see. You might be correct there :) Let me think for a while about using a PAT for the release workflow and its implications. Generally, I think this should not be a problem, and you can assume a PAT will be available for further testing.

Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
Signed-off-by: Ayato Tachibana <[email protected]>
@dragoneena12
Contributor Author

dragoneena12 commented Nov 11, 2021

@alinbalutoiu
Contributor

I've just noticed that Argo should have self-hosted GitHub Actions ARM runners. 🙌
I don't think we'll need to do the whole trick with QEMU and Docker any longer with those runners.
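
If such runners materialise, a sketch of a native build job might look like the following (the runner labels and the make target are assumptions about how those runners would be registered and how the existing build would be reused):

jobs:
  build-arm64-native:
    runs-on: [self-hosted, linux, ARM64]   # assumed labels for the ARM runners
    steps:
      - uses: actions/checkout@v2
      # same native build path as amd64; no buildx/qemu emulation needed
      - run: make image   # assuming the repo's existing image build target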

@agorgl

agorgl commented Dec 8, 2021

Is there a plan to release prebuilt 32-bit ARM (armv7l) builds / images with this too?

@NPastorale

Has this been superseded by #8108? Both seem to tackle the same issue, and this one has been without activity for some time now.

@34fathombelow
Member

@dragoneena12 can this PR be closed?

Development

Successfully merging this pull request may close these issues.

Cross Build Images for ARM