Rewrite Selective Check in Python #22327
@potiuk I created two different files, scc_get_changed_files.py and selective_ci_checks.py, and in ci.yaml I only call selective_ci_checks.py.
Both are "merged", so why doesn't a Direct Push generate a commit SHA? Also, could you explain the selected part of this script to me? airflow/scripts/ci/selective_ci_checks.sh, lines 154 to 155 in 0ec5677
Thanks in advance! This is a great task, I am having fun 💃🏼
It also generates a commit hash, but we do not care about it. For a Direct Push/Merge we always want to run all possible tests, so in this case the selective check turns into "run everything possible".
Yep. We have it in all cases (because we use it to assign the "tag" to the images we build). But in the case of a Pull Request, GitHub prepares a "merge commit" containing only the changes coming from the PR, and the commit HASH we have is that particular commit.
This is just a way to get the list of files for an incoming commit:
So what we get in the end is the list of files changed between the "parent of the commit" and the "commit" incoming from the PR.
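A minimal Python sketch of the same idea, assuming git's `diff-tree` plumbing command is used to diff `<sha>^` (the parent) against `<sha>`. The helper names `get_changed_files` and `parse_changed_files` are hypothetical, not taken from the PR:

```python
import subprocess
from typing import List


def parse_changed_files(diff_tree_output: str) -> List[str]:
    """Turn `git diff-tree --name-only` output (one path per line) into a list."""
    return [line.strip() for line in diff_tree_output.splitlines() if line.strip()]


def get_changed_files(commit_sha: str) -> List[str]:
    """Return the files changed between the parent of `commit_sha` and `commit_sha`."""
    result = subprocess.run(
        # <sha>^ is git's notation for the first parent of <sha>;
        # --name-only prints only paths, -r recurses into subdirectories.
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", f"{commit_sha}^", commit_sha],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_changed_files(result.stdout)
```

In a workflow step, the incoming commit would typically come from the `GITHUB_SHA` environment variable that GitHub Actions sets.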
All the "commit-ish" references are described here: https://mirrors.edge.kernel.org/pub/software/scm/git/docs/gitrevisions.html#_specifying_revisions BTW, one of the great things about our "rebase" workflow is that we have no merge commits. Every commit (after the pull request is merged) has exactly one parent, which makes all the reasoning about changes WAY simpler: since no commit has multiple parents, you know exactly which files have changed by diffing the commit against its parent.
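The single-parent property can be checked directly. A hedged sketch (hypothetical helper names) that asks git for a commit's parents with `git rev-list --parents -n 1 <sha>`, which prints the commit followed by its parents on one line:

```python
import subprocess
from typing import List


def parse_parents(rev_list_line: str) -> List[str]:
    """Split '<commit> <parent1> <parent2> ...' and keep only the parents."""
    return rev_list_line.split()[1:]


def get_parents(commit_sha: str) -> List[str]:
    result = subprocess.run(
        ["git", "rev-list", "--parents", "-n", "1", commit_sha],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_parents(result.stdout)


def is_single_parent(commit_sha: str) -> bool:
    """True for ordinary (rebased) commits, False for merge commits or a root commit."""
    return len(get_parents(commit_sha)) == 1
```

With a rebase-only history, `is_single_parent` should hold for every commit on the main branch except the root, which is what makes the parent diff a complete description of the change.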
Thanks, @potiuk. I am now retrieving all the files that are modified in the PR. Let me know if the process below is on the right track. Process: 1. First Python call: build image. Also, I created another step in ci.yaml for selective_ci_checks.py: https://github.com/apache/airflow/runs/5619304320?check_suite_focus=true#step:6:587
Good start. I think there are lots of good things there :). One thing, though, that I think we should improve:
This one (based on the checks) should print either:
And so on: basically all the parameters that determine the matrix of tests to execute. Then the next step will be to build other commands:
But we can do it gradually. I hope it makes sense :)
@potiuk I understood the idea. Thanks! I tried to address the first and second points and created the entrypoint for "selective-checks-python". However, for freespace we need the "Setup python" step and "python -m pip install --editable ./dev/breeze/", which run in another job. For "selective check python", I added this dependency to make the entrypoint work, and I am getting "permission denied". I tried "--user" and "venv", but neither works. build-info is the first job in ci.yaml, so it doesn't have all the inputs necessary to build things. Is this right?
@potiuk I have a question about how to pass the parameter "${GITHUB_SHA}" to the entry point in ci.yaml. I was thinking of something like this: What would be the best approach here?
Add --github-sha as a Click parameter with the GITHUB_SHA envvar. That should work.
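A minimal sketch of that suggestion, assuming a standalone Click command (the command name `selective_checks` and the help text are hypothetical). When `--github-sha` is not passed, Click falls back to the `GITHUB_SHA` environment variable, which GitHub Actions sets for every workflow run:

```python
import click


@click.command()
@click.option(
    "--github-sha",
    envvar="GITHUB_SHA",  # used when --github-sha is not given on the command line
    required=True,
    help="SHA of the incoming commit to run selective checks for.",
)
def selective_checks(github_sha: str) -> None:
    """Hypothetical entry point: shows which SHA it received."""
    click.echo(f"Incoming commit SHA: {github_sha}")


if __name__ == "__main__":
    selective_checks()
```

An explicit `--github-sha` on the command line takes precedence over the environment variable, so the step in ci.yaml needs no extra argument while the command still works locally with a SHA passed by hand.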
@potiuk please ignore that this job is failing; I just want to know whether this is the right way to address your last feedback: #22327 (comment). Please take a look at the structure of the files; I restructured the files I am working on.
@@ -0,0 +1,48 @@
# Licensed to the Apache Software Foundation (ASF) under one
Those files should all go to the selective_checks package under the ci package.
@@ -0,0 +1,24 @@
#!/usr/bin/env python3
We do not need "read_only": it was only there to add some sanity to the Bash code, and it is not needed in Python.
@@ -0,0 +1,71 @@
#!/usr/bin/env python3
I think we do not need a separate command for "get_changed_files". It is (at least for now) not very useful. Instead, it should be a "utility" function, something that other, real "commands" can use.
I think our commands should be centered around the "functionality" we need from selective checks.
Just to explain what I envision:
airflow-selective-checks build-image
This should produce one output:
::set-output name=image-build::true
or
::set-output name=image-build::false
The value depends on the set of changed files.
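A sketch of the decision behind `build-image`, assuming a simple rule where changes that only touch documentation do not need a new image. The patterns and function names below are illustrative placeholders, not Airflow's actual rules:

```python
import fnmatch
from typing import Iterable, List

# Hypothetical placeholder patterns for files that never require an image rebuild.
IMAGE_IRRELEVANT_PATTERNS: List[str] = ["docs/*", "*.md", "*.rst"]


def needs_image_build(changed_files: Iterable[str]) -> bool:
    """Build the image if at least one changed file matches none of the patterns."""
    return any(
        not any(fnmatch.fnmatch(path, pattern) for pattern in IMAGE_IRRELEVANT_PATTERNS)
        for path in changed_files
    )


def set_output(name: str, value: str) -> str:
    """Format a GitHub Actions output in the '::set-output' syntax used in this thread."""
    return f"::set-output name={name}::{value}"


def build_image_output(changed_files: Iterable[str]) -> str:
    # Python's True/False become the lowercase "true"/"false" the workflow expects.
    return set_output("image-build", str(needs_image_build(changed_files)).lower())
```

Note that `fnmatch` is not path-aware, so `*` also matches `/`; that is convenient here because `docs/*` then covers nested documentation files too.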
Similarly, we should have other commands:
airflow-selective-checks matrix-strategy
This one should produce all the "matrix" components (also as ga-output):
- python, backend, kubernetes etc.
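For `matrix-strategy`, one hedged option is to emit one output per matrix component, each as a JSON list that a workflow can expand with `fromJSON()`. The version and backend lists, the output names, and the `full_tests_needed` flag are placeholders; kubernetes and the other components would follow the same pattern:

```python
import json
from typing import Dict, List

# Placeholder values, not Airflow's real configuration.
ALL_PYTHON_VERSIONS: List[str] = ["3.7", "3.8", "3.9"]
ALL_BACKENDS: List[str] = ["sqlite", "postgres", "mysql"]


def matrix_strategy(full_tests_needed: bool) -> Dict[str, List[str]]:
    """Choose matrix components: everything for a full run, a minimal subset otherwise."""
    if full_tests_needed:
        return {"python-versions": ALL_PYTHON_VERSIONS, "backends": ALL_BACKENDS}
    return {"python-versions": ALL_PYTHON_VERSIONS[:1], "backends": ALL_BACKENDS[:1]}


def matrix_outputs(full_tests_needed: bool) -> List[str]:
    """Emit each component as a JSON list in '::set-output' syntax."""
    return [
        f"::set-output name={name}::{json.dumps(values)}"
        for name, values in matrix_strategy(full_tests_needed).items()
    ]
```

A Direct Push/Merge would call this with `full_tests_needed=True`, matching the "run everything possible" behaviour described earlier in the thread.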
All of the above should use "get_changed_files", but as an internal Python method rather than a Click "command".
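A sketch of that structure, assuming a `click.group()` entry point: `get_changed_files` stays a plain function, and only the functional commands are registered on the group. The group name, output names, and decision rules are hypothetical placeholders:

```python
import json
import subprocess
from typing import List

import click


def get_changed_files(commit_sha: str) -> List[str]:
    """Internal helper, not a Click command: files changed between <sha>^ and <sha>."""
    result = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", f"{commit_sha}^", commit_sha],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


@click.group()
def main() -> None:
    """Hypothetical selective-checks entry point."""


@main.command(name="build-image")
@click.option("--github-sha", envvar="GITHUB_SHA", required=True)
def build_image(github_sha: str) -> None:
    changed = get_changed_files(github_sha)
    # Placeholder rule: any change outside docs/ needs a new image.
    needed = any(not path.startswith("docs/") for path in changed)
    click.echo(f"::set-output name=image-build::{str(needed).lower()}")


@main.command(name="matrix-strategy")
@click.option("--github-sha", envvar="GITHUB_SHA", required=True)
def matrix_strategy(github_sha: str) -> None:
    changed = get_changed_files(github_sha)
    # Placeholder rule: test every Python version only when core code changed.
    core_changed = any(path.startswith("airflow/") for path in changed)
    versions = ["3.7", "3.8", "3.9"] if core_changed else ["3.7"]
    click.echo(f"::set-output name=python-versions::{json.dumps(versions)}")


if __name__ == "__main__":
    main()
```

Because the helper is not decorated, it never shows up as a subcommand: the group's `--help` lists only `build-image` and `matrix-strategy`, while both commands share the same changed-files logic.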
@potiuk thank you so much.
Hey @edithturn, after #23193 and #23205 this should be part of configuration_and_maintenance.py (https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/configuration_and_maintenance.py). It eventually turned out that incorporating those tools into the breeze command, instead of keeping standalone "freespace" etc., is a much better and more consistent solution. I think we should simply start "small": how about we first merge a tool that prints the list of "changed files"? Then I could attempt to split this change into a number of smaller "selective check tasks" to make it easier and possibly parallelise the work. Same with kind :).
Closing this one; it was worked on by Jarek in another issue: #24610 (comment)
closes #19971