Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert selective checks to Breeze Python #24610

Merged
merged 1 commit into from
Jun 25, 2022

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Jun 22, 2022

Instead of bash-based, complex logic script to perform PR selective
checks we now integrated the whole logic into Breeze Python code.

It is now much simplified, when it comes to algorithm. We've
implemented simple rule-based decision tree. The rules describing
the decision tree are now are now much easier
to reason about and they correspond one-to-one with the rules
that are implemented in the code in rather straightforward way.

The code is much simpler and diagnostics of the selective checks
has also been vastly improved:

  • The rule engine displays status of applying each rule and
    explains (with yellow warning message what decision was made
    and why. Informative messages are printed showing the resulting
    output

  • List of files impacting the decision are also displayed

  • The names of "ci file group" and "test type" were aligned

  • Unit tests covering wide range of cases are added. Each test
    describes what is the case they demonstrate

  • breeze selective-checks command that is used in CI can also
    be used locally by just providing commit-ish reference of the
    commit to check. This way you can very easily debug problems and
    fix them

Fixes: #19971


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named {pr_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Jun 22, 2022

This is the next step in Breeze conversion. This time -800 lines of Bash Code (replaced with Python but including lots of unit tests) which is far more readable and easy to reason about.

Plus very clear diagnosticts of what decisions are made and why, including breeze static-check tool that can be used locally to debug and reproduce selective check behaviour with any local commit.

Regular PR that triggers all tests (core change) but does not run any Helm/K8S/UI javascript tests

image

More complex change (this one) that trigers all possible tests because environment changed:
image

Fragment of more complex change (amazon + cncf.kubernetes) that triggers only subset of unit tests ("Always", "Providers") bit also all our K8S changes:

image

Tests are pretty comprehensive and we will be able to easily test and add any new cases in the future (especially when we split providers and want to make more fine-grained decisions which tests to run.

@potiuk
Copy link
Member Author

potiuk commented Jun 22, 2022

cc: @edithturn @Bowrna : -800 lines of bash less and much easier "logic" to grasp for selective checks implementation

@edithturn
Copy link
Contributor

I wouldn't have thought of all those changes, really impressive @potiuk.

@potiuk
Copy link
Member Author

potiuk commented Jun 22, 2022

I wouldn't have thought of all those changes, really impressive @potiuk.

Just Python is way easier to write things better :)

@potiuk potiuk force-pushed the add-selective-check branch 3 times, most recently from 4aa85b2 to 5407a01 Compare June 23, 2022 15:33
@potiuk
Copy link
Member Author

potiuk commented Jun 23, 2022

BTW. By converting the checks to Python I also found that current selectiuve checks were not "selective enough" as I missed the cases where "Helm" tests are run unnecessarily in quite a number of cases (For example they we run when only providers were modified - this should speed up PRs from contributors who only modified one or more providers and did not touch the core).

@potiuk potiuk force-pushed the add-selective-check branch 2 times, most recently from 7ef286d to f2215d3 Compare June 23, 2022 20:37
SELECTIVE_CHECKS.md Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the add-selective-check branch 2 times, most recently from 04ca504 to f123eb9 Compare June 24, 2022 18:41
Instead of bash-based, complex logic script to perform PR selective
checks we now integrated the whole logic into Breeze Python code.

It is now much simplified, when it comes to algorithm. We've
implemented simple rule-based decision tree. The rules describing
the decision tree are now are now much easier
to reason about and they correspond one-to-one with the rules
that are implemented in the code in rather straightforward way.

The code is much simpler and diagnostics of the selective checks
has also been vastly improved:

* The rule engine displays status of applying each rule and
  explains (with yellow warning message what decision was made
  and why. Informative messages are printed showing the resulting
  output

* List of files impacting the decision are also displayed

* The names of "ci file group" and "test type" were aligned

* Unit tests covering wide range of cases are added. Each test
  describes what is the case they demonstrate

* `breeze selective-checks` command that is used in CI can also
  be used locally by just providing commit-ish reference of the
  commit to check. This way you can very easily debug problems and
  fix them

Fixes: apache#19971
@potiuk potiuk force-pushed the add-selective-check branch from f123eb9 to 00748eb Compare June 25, 2022 00:14
@potiuk
Copy link
Member Author

potiuk commented Jun 25, 2022

All Green!

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jun 25, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit d7bd72f into apache:main Jun 25, 2022
@potiuk potiuk deleted the add-selective-check branch June 25, 2022 07:46
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 25, 2022
When apache#24610 was implemented I missed the label-when-reviewed workflow
potiuk added a commit that referenced this pull request Jun 25, 2022
When #24610 was implemented I missed the label-when-reviewed workflow
potiuk added a commit that referenced this pull request Jun 25, 2022
Selective checks docs have been moved to breeze as part of #24610
but some of the references were still left.

This PR cleans it up.
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 29, 2022
Instead of bash-based, complex logic script to perform PR selective
checks we now integrated the whole logic into Breeze Python code.

It is now much simplified, when it comes to algorithm. We've
implemented simple rule-based decision tree. The rules describing
the decision tree are now are now much easier
to reason about and they correspond one-to-one with the rules
that are implemented in the code in rather straightforward way.

The code is much simpler and diagnostics of the selective checks
has also been vastly improved:

* The rule engine displays status of applying each rule and
  explains (with yellow warning message what decision was made
  and why. Informative messages are printed showing the resulting
  output

* List of files impacting the decision are also displayed

* The names of "ci file group" and "test type" were aligned

* Unit tests covering wide range of cases are added. Each test
  describes what is the case they demonstrate

* `breeze selective-checks` command that is used in CI can also
  be used locally by just providing commit-ish reference of the
  commit to check. This way you can very easily debug problems and
  fix them

Fixes: apache#19971
(cherry picked from commit d7bd72f)
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 29, 2022
…e#24651)

When apache#24610 was implemented I missed the label-when-reviewed workflow

(cherry picked from commit 2703874)
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 29, 2022
Selective checks docs have been moved to breeze as part of apache#24610
but some of the references were still left.

This PR cleans it up.

(cherry picked from commit aa8cd30)
@ephraimbuddy ephraimbuddy added this to the Airflow 2.3.3 milestone Jun 30, 2022
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jun 30, 2022
@edithturn
Copy link
Contributor

It was merged, well done @potiuk Jarek 🥳

potiuk added a commit to potiuk/airflow that referenced this pull request Jul 6, 2022
After implementing apache#24610 and few follow-up fixes, it is now easy
to add more optimizations to our unit test execution in CI (and
to give this capability back to our contributors).

This PR adds capability of running tests for selected set of
providers - not for the whole "Providers" group. You can
specify `--test-type "Providers[airbyte,http]" to only run tests
for the two selected providers.

This is the step towards separating providers to separate
repositories, but it also allows to optimize the experience of
the contributors developing only single provider changes (which
is vast majority of contributions).

This also allows to optimize build and elapsed time needd to run
tests for those PRs that only affects selected providers (again -
vast majority of PRs).

The CI selection of which provider tests is done now in Selective
Checkcs - they are a bit smarter in just selecting the providers
that has been changed, they also check if there are any other
providers that depend on it (we keep automatically updated by
pre-commit dependencies.json file and this file determines
which files should be run.
potiuk added a commit that referenced this pull request Jul 6, 2022
After implementing #24610 and few follow-up fixes, it is now easy
to add more optimizations to our unit test execution in CI (and
to give this capability back to our contributors).

This PR adds capability of running tests for selected set of
providers - not for the whole "Providers" group. You can
specify `--test-type "Providers[airbyte,http]" to only run tests
for the two selected providers.

This is the step towards separating providers to separate
repositories, but it also allows to optimize the experience of
the contributors developing only single provider changes (which
is vast majority of contributions).

This also allows to optimize build and elapsed time needd to run
tests for those PRs that only affects selected providers (again -
vast majority of PRs).

The CI selection of which provider tests is done now in Selective
Checkcs - they are a bit smarter in just selecting the providers
that has been changed, they also check if there are any other
providers that depend on it (we keep automatically updated by
pre-commit dependencies.json file and this file determines
which files should be run.
potiuk added a commit that referenced this pull request Jul 21, 2022
After implementing #24610 and few follow-up fixes, it is now easy
to add more optimizations to our unit test execution in CI (and
to give this capability back to our contributors).

This PR adds capability of running tests for selected set of
providers - not for the whole "Providers" group. You can
specify `--test-type "Providers[airbyte,http]" to only run tests
for the two selected providers.

This is the step towards separating providers to separate
repositories, but it also allows to optimize the experience of
the contributors developing only single provider changes (which
is vast majority of contributions).

This also allows to optimize build and elapsed time needd to run
tests for those PRs that only affects selected providers (again -
vast majority of PRs).

The CI selection of which provider tests is done now in Selective
Checkcs - they are a bit smarter in just selecting the providers
that has been changed, they also check if there are any other
providers that depend on it (we keep automatically updated by
pre-commit dependencies.json file and this file determines
which files should be run.

(cherry picked from commit 3dedbd3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: Rewrite selective check script in Python
5 participants