-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ARM image building for regular PRs #24664
Conversation
4cb85ab
to
0fdf8ad
Compare
Testing execution of in-airlfow-repo PR. |
0fdf8ad
to
af877eb
Compare
af877eb
to
2bc069d
Compare
2bc069d
to
00d827a
Compare
cc: @gmsantos - this will prevent the dramas you faced last week. I am running it from "apache" repo, to make sure it actually works. The nice thing is that for now we are just going to test test it and simply mark the build as failure. Nothing waits for the job so it shoudl just add few minutes of overhead/build time and single arm instance starting/stoppping to build all images (~ 2 minutes per image usually when cache is refeshed). |
Seems it's going to work this time (I had to make the builds run sequentially, but it works now. just fine. |
Nice. All ARM images built in under 10m :). Looks green. |
BTW. This one is crucial for those of us who use M1 on a daily basis and rely on the images continue to work for them @dstandish @bbovenzi 👀 |
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
@potiuk, just want to double check there is no security/resource considerations here with us spinning up ARM machines for normal, non-committer PRs, right? |
Good questions. I ❤️ that you asked them :). Here are my explanations and way how it is protected, but I would love a challenge/review and confirming that my thinking is right. Security is a teamwork and I don't trust myself (ultimately), so I love someone can verify my apprach. SecurityThe ARM steps are executed in 2 cases: a) build image workfow. Those are not "directly" affected by PR changes - the command will only run the "Dockerfile RUN commands" inside the ARM docker engine - Dockerfile is the only user-provided code that is executed during that build - and it is only executed as part of "docker build" (which runs inside the docker builder). You'd have to escape the container to get into the ARM instance. This is basically the same (or lower) risk we have currently - the Dockerfile RUN commands are executed inside the docker-engine of the S3 AMD instance we run. It's actually a bit lower because the ARM instances do not run the "runner" and the runner - it's a "bare" ARM instance with docker engine running with SSH-port-forwarded connection to it. b) if you run PR from a branch in "apache" repo (like I did in this PR) - then the ARM instance will run in the CI worklfow. This is controlled by The reason for having those duplicate "build" steps in CI workflow (in case of PRs from apache/airflow repo) is that I can actually test those ARM builds. When they are run from "CI" workflow, they run with "my code", when they are run in "build" workflow - they are run with "main" code and I only can test them when I merge them (which is main reason some of the workflows need a few merges to finally succed - as I have literally no way to test them before merge. For example the "build" workflow Start ARM instance/Stop ARM instance had never been tested, because the only way I can test it is by merging the code to main. CostThose ARM machines have auto-destruction built in - no matter if succeded or failed: 50 minues self-kill-switch and "cancel" is run always: https://github.com/apache/airflow/pull/24664/files#diff-b803fcb7f17ed9235f1e5cb1fcd2f5d3b2838429d4368ae4c57ce4436577f03fR417 - so even if someone presses cancel too many times or in case build "hangs" - we got maximum ~ 1hr of instance run. The instances have maximum bid spot price set for requesting the instance. It is set to 0.1 USD /HR. Curent spot price in Ohio: I personally think it's worth it as it saves a lot of our contributors from headaches on M1s - and this is not academic - this has already happened #24635 and it was couple of hours lost for a few people. We have currently no protection mechanisms to prevent it from happening again - and taking into account that a number of deps (including mysql and mssql) do not yet have debian ARM package and fail when building, it is likely to happen again BTW. Probably we loose way more but not merging some optimisations which are currently in the pipeline (for example this one #24666 - I have not assesed that one because it is far more difficult but I would guess it allows to save ~ 3 minutes of build time x2 for AMD instances at 0.08 USD /hr per version in ~ And I already saved quite a lot and more : #24580 (and more to come with more breeze conversion) so I am on a cost-and-build-saving spree here. |
00d827a
to
8731b00
Compare
BTW. @jedcunningham - by writing this answer and looking at the workflows I realized that there is a potential for a small optimization (of cost). I added additional "needs" for the "arm" builds - so that they are executed as late as possible. Previously they were executed always. bu in this case if either regular CI or PROD will fail, it will not be executed. Small cost reduction, but sttill a cool one. |
Actually @jedcunningham - after sleeping over it, it seems we can furter cut down the overhead and cost. It seems that in regular PRs we only need to build it when some depdnencies change. Generally there is no reason to breaj the ARM docker build with the incoming PR unless your change contains some changes to dependencies - which we already detect by "upgradeToNewerDepeendencies" flag. So we can further reduce the overhead/cost by:
I'd guess this will be ~ 5% of the previous estimation - so some 1.5 USD/month overhead :). I think we can afford it |
8731b00
to
79d3f00
Compare
Updated with only running stuff when we are upgrading to newer dependencies :). It should be super-optimized this way |
The image building for ARM is currently only done in the main build only to refresh cache, however there are sometimes cases when new dependency (for example #24635) broke ARM image build and it was only discovered after merge. This PR adds extra ARM-based build that should be run after the AMD64 build. It should not influence the depending steps, it should just signal failure of the PR if the ARM image cannot be build.
79d3f00
to
f7b1b03
Compare
Thanks for the detailed reply @potiuk. All sounds good and makes sense to me. |
The ARM image build introduced in apache#24664 had problem with build image that was additionally checking for arm images which were moved out to a spearate step
The ARM image build introduced in #24664 had problem with build image that was additionally checking for arm images which were moved out to a spearate step
The image building for ARM is currently only done in the main build only to refresh cache, however there are sometimes cases when new dependency (for example apache#24635) broke ARM image build and it was only discovered after merge. This PR adds extra ARM-based build that should be run after the AMD64 build. It should not influence the depending steps, it should just signal failure of the PR if the ARM image cannot be build. (cherry picked from commit 2fbd750)
The ARM image build introduced in apache#24664 had problem with build image that was additionally checking for arm images which were moved out to a spearate step (cherry picked from commit 5321577)
The image building for ARM is currently only done in the main build
only to refresh cache, however there are sometimes cases when
new dependency (for example #24635) broke ARM image build and it
was only discovered after merge.
This PR adds extra ARM-based build that should be run after
the AMD64 build. It should not influence the depending steps,
it should just signal failure of the PR if the ARM image cannot
be build.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named
{pr_number}.significant.rst
, in newsfragments.