Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync v2-3-stable with v2-3-test to release 2.3.3 #24762

Merged
merged 118 commits into from
Jul 2, 2022
Merged

Conversation

ephraimbuddy
Copy link
Contributor

Time for 2.3.3rc1!

future-taga and others added 30 commits June 10, 2022 13:24
(cherry picked from commit 4daf51a)
The dependency we have for flask-appbuilder oauth authentication
(for github/google authentication) should follow the limits
that flask-appbuilder current version has. We added authlib there
but apparently FAB currently limits authlib to <= 1.0 - we should
follow fab rather than have our own dependency here.

This has been pointed out in
dpgaspar/Flask-AppBuilder#1861

(cherry picked from commit 5674491)
The TimedJSONWebSignatureSerializer has been deprecated from the
itsdangerous library and they recommended to use dedicated
libraries for it.

pallets/itsdangerous#129

Since we are going to move to FAB 4+ with #22397 where newer version of
itsdangerous is used, we need to switch to another library.

We are already using PyJWT so the choice is obvious.

Additionally to switching, the following improvements were done:

* the use of JWT claims has been fixed to follow JWT standard.
  We were using "iat" header wrongly. The specification of JWT only
  expects the header to be there and be valid UTC timestamp, but the
  claim does not impact maturity of the signature - the signature
  is valid if iat is in the future.
  Instead "nbf" - "not before" claim should be used to verify if the
  request is not coming from the future. We now require all claims
  to be present in the request.

* rather than using salt/signing_context we switched to standard
  JWT "audience" claim (same end result)

* we have now much better diagnostics on the server side of the
  reason why request is forbidden - explicit error messages
  are printed in server logs and details of the exception. This
  is secure, we do not spill the information about the reason
  to the client, it's only available in server logs, so there is
  no risk attacker could use it.

* the JWTSigner is "use-agnostic". We should be able to use the
  same class for any other signatures (Internal API from AIP-44)
  with just different audience

* Short, 5 seconds default clock skew is allowed, to account for
  systems that have "almost" synchronized time

* more tests addded with proper time freezing testing both
  expiry and immaturity of the request

This change is not a breaking one because the JWT authentication
details are not "public API" - but in case someone reverse engineered
our claims and implemented their own log file retrieval, we
should add a change in our changelog - therefore newsfragment
is added.

(cherry picked from commit 1f8e4c9)
That Also includes regenerating the breeze output images.

(cherry picked from commit 13c62b5)
(cherry picked from commit 4d25365)
Breeze when started produced three warnings that were harmless,
but we should fix them to remove "false positives".

(cherry picked from commit ac8a790)
Choosing version of backend were broken when command line switches
were used. The _VERSION variables were "hard-coded" to defaults
rather than taken from command line. This is a remnant of initial
implementation and converting the parameters to "cacheable" ones.

While looking at the versions we also found that PARAM_NAME_FLAG
is not used any more so we took the opportunity to remove it.

(cherry picked from commit 4482eae)
Until we have debian suppor tof MySQL and MSSQL ARM, runnign
those on ARM platform is not supported. However error about it
was not clear (pulling docker image failed).

This PR adds platform checking also in breeze and fails fast
without even attempting to enter breeze shell when you are on
ARM and wants to run MsSQL or MySQL breeze shell.

Also some errors with running different backend versions via
breeze have been removed.

(cherry picked from commit 00d2a3c)
The links to example sources in exampleinclude have been broken in a
number of providers and they were additionally broken by AIP-47.

This PR fixes it.

Fixes: #23632
Fixes: apache/airflow-site#536
(cherry picked from commit 08b675c)
This is the first step to run breeze tests in parallel in CI.
This flag adds "limited progress" output when running tests
which means that the runnig tests will just print few lines with
percent progress and color status indication from last few
progress lines of Pytest output, but when it completes, the whole output is
printed in a CI group - colored depending on status.

The final version (wnen we implement parallel test execution) should
also defer writing the output to until all tests are completed, but
this should be a follow-up PR.

(cherry picked from commit 41fefa1)
Few improvements:

* direct link to workflow in the docs
* "green" success screenshots (separate screenshot for rc)
* job and step names contain both Airflow version and python version
* running subsequent build with same version will cancel past
  in-progress build (in case you quickly make constraint fis for
  example and re-run)

(cherry picked from commit 8ad18bf)
The old breeze-legacy used to have a possibility of very easy
reproduction of CI failures by executing the right breeze command
that contained the commit hash of the PR being tested. This has
been broken for some time after we migrated to the new breeze,
but finally it was the time when it was needed again.

This PR brings back the capability by:

* addding --image-tag parameters for tests, shell and start-airflow
  commands
* if --image-tag is specified, then rather than building the
  image, it is pulled using the specified hash
* if --image-tag is specified, the local sources are not mounted
  to breeze when started, but the sources already embedded in the
  image are used ("skipped" set for --mount-sources).
* new "removed" command value is added to --mount-sources, it
  causes breeze command to remove the sources from the image (it
  is used when installing airflow during the tests for specified
  version (it's automatically used when --use-airflow-version
  is used).

(cherry picked from commit 7dc794a)
After the images are pushed in CI we are running the verification
of the AMD image now.

This cannot be really done during building and pushing the image,
because we are using multi-platform images using remote builders
so the image is not even available locally, so we need to actually
pull the images after they are built in order to verify them.

This PR adds those features:
* ability to pull images for verification with --pull-flag
* ability to verify slim images (regular tests are skipped and
  we only expect the preinstalled providers to be available
* the steps to verify the images (both regular and slim) are
  added to the workflow

(cherry picked from commit 2936759)
…ed (#24581)

In case there are conflicting changes to breeze command in several
PRs, you might get conflicting images printed. in such case you
should run `breeze regenerate-command-images`

(cherry picked from commit 864cbc9)
In the new Breeze, switching to using parallelism is a ... breeze.

This PR adds the capability of building the images in parallel in Breeze
locally - for breeze command, but also uses this capability to build the
images in parallel in our CI. Our builds are always executed on
powerful, big machines with lots of CPU and docker run in memory
filesystem with 32GB RAM, so it should be possible to run all builds in
parallel on a single machine rather then spin off parallel machines to
run the builds using the matrix strategy of Github Actions.

Generally speaking - this will either speed up or get 4x cost saving for
the build steps for all the "full test needed" PRs as well as all the
main builds.

There are a number of savings and improvements we can achieve this way:

1) less overhead for starting and runnning the machines
2) seems that with the new buildkit, the parallel builds are not
   suffering from some sequential locks (as it used to be, so
   we are basically do the same job using 25% resources for building
   the images.
3) we will stop having random "one image failed to build" cases - they
   will all either fail or succeed.
4) Less checks in the output
5) Production builds will additionally gain from single CI image
   pulled in order to perform the preparation of the packages
   and single package preparation step - it will save 4-5 minutes
   per image.

The disadvantage is a less clear output of such parallel build where
outputs from multiple builds will be interleaved in one CI output.

(cherry picked from commit 893d935)
* Upgrade FAB to 4.1.1

The Flask Application Builder have been updated recently to
support a number of newer dependencies. This PR is the
attempt to migrate FAB to newer version.

This includes:

* update setup.py and setup.cfg upper and lower bounds to
  account for proper version of dependencies that
  FAB < 4.0.0 was blocking from upgrade
* added typed Flask application retrieval with a custom
  application fields available for MyPy typing checks.
* fix typing to account for typing hints added in multiple
  upgraded libraries optional values and content of request
  returned as Mapping
* switch to PyJWT 2.* by using non-deprecated "required" claim as
  list rather than separate fields
* add possibiliyt to install providers without constraints
  so that we could avoid errors on conflicting constraints when
  upgrade-to-newer-dependencies is used
* add pre-commit to check that 2.4+ only get_airflow_app is not
  used in providers
* avoid Bad Request in case the request sent to Flask 2.0 is not
  JSon content type
* switch imports of internal classes to direct packages
  where classes are available rather than from "airflow.models" to
  satisfy MyPY
* synchronize changes of FAB Security Manager 4.1.1 with our copy
  of the Security Manager.
* add error handling for a few "None" cases detected by MyPY
* corrected test cases that were broken by immutability of
  Flask 2 objects and better escaping done by Flask 2
* updated test cases to account for redirection to "path" rather
  than full URL by Flask2

Fixes: #22397

* fixup! Upgrade FAB to 4.1.1

(cherry picked from commit e2f1950)
Recent changes to Breeze cause it to fail in certain situations,
especially at self-upgrade (which was generated by today's
upgrade with rich-click).

There were two problems:

* docker volume inspect missed 'volume' and it caused sometimes
  failures in CI

* inputimeout dependency was missing after recent update to
  pre-commit venvs

(cherry picked from commit 0905e38)
We were ignoring mypy error instead of fixing it.

click had removed `get_terminal_size` and recommend using `shutil.get_terminal_size`

(cherry picked from commit 5f67cc0)
(cherry picked from commit b8a4ac5)
We upgraded flask and werkzeug in #24399, and updated the constraints,
but not everyone uses them (such as me in my local virtual environment
when developing) so the min version in setup.cfg has to match as well

(cherry picked from commit 030169d)
Azure service bus uses uamqp which does not build for ARM architecture
and we need to disable it as a dependency for ARM.

(cherry picked from commit 477f907)
…24634)

Today Python released new images that seems to break some of
our dependencies (at least on M1/ARM). This PR adds a workaround
possibility to add --python-image option to override the
default, latest image with any other version released previously
until we diagnose and fix the real issue.

(cherry picked from commit d6e6e7d)
Instead of bash-based, complex logic script to perform PR selective
checks we now integrated the whole logic into Breeze Python code.

It is now much simplified, when it comes to algorithm. We've
implemented simple rule-based decision tree. The rules describing
the decision tree are now are now much easier
to reason about and they correspond one-to-one with the rules
that are implemented in the code in rather straightforward way.

The code is much simpler and diagnostics of the selective checks
has also been vastly improved:

* The rule engine displays status of applying each rule and
  explains (with yellow warning message what decision was made
  and why. Informative messages are printed showing the resulting
  output

* List of files impacting the decision are also displayed

* The names of "ci file group" and "test type" were aligned

* Unit tests covering wide range of cases are added. Each test
  describes what is the case they demonstrate

* `breeze selective-checks` command that is used in CI can also
  be used locally by just providing commit-ish reference of the
  commit to check. This way you can very easily debug problems and
  fix them

Fixes: #19971
(cherry picked from commit d7bd72f)
When #24610 was implemented I missed the label-when-reviewed workflow

(cherry picked from commit 2703874)
Selective checks docs have been moved to breeze as part of #24610
but some of the references were still left.

This PR cleans it up.

(cherry picked from commit aa8cd30)
potiuk and others added 3 commits July 1, 2022 19:13
The recent changes in selective checks introduced a few problems
in non-main branches:

* constraints branch was not used by CI build in non-main branch
  this was not a proble, (until it started to fail) because
  other branches always work in "upgradeDepencies" case

* default branch calculation did not account for the selective-check
  comment that was run before default branch was retrieved. Therefore
  we did not run build as from 2.3 branch

* Some precomits should be skiped in non-main branch

* Integration was also excluded from Provider Check

* Two more commit outputs added:
  * docs-filter -  depends whether you are with more packages
  * skip-pre-commit - allows to dynamically choose which pre-commits
    should be skipped

(cherry picked from commit c8d3850)
…24780)

When pushing cache we cannot use default builder because it has
no capability to push cache. We need to use airflow_cache for it,
which has additionally the capacity of forwarding the builds
to a remote ARM instance in case cache is built for ARM images
too. Recently we added a new --builder flag to allow to choose
the builder, but our scripts and CI have not been modified to
account for that (previously builder was chosen automatically but
that has proven to be limiting in some manual operations and it
also allows to choose different names for builders in case
someone wants to build their own pipeline of builds.

(cherry picked from commit a1679be)
@eladkal
Copy link
Contributor

eladkal commented Jul 1, 2022

Do we want #22658 included in 2.3.3 as well?

@ephraimbuddy
Copy link
Contributor Author

Do we want #22658 included in 2.3.3 as well?

This is a core change and I feel we should exclude it for not being merged before now. It's late IMO

RELEASE_NOTES.rst Outdated Show resolved Hide resolved
RELEASE_NOTES.rst Outdated Show resolved Hide resolved
RELEASE_NOTES.rst Show resolved Hide resolved
RELEASE_NOTES.rst Outdated Show resolved Hide resolved
RELEASE_NOTES.rst Show resolved Hide resolved
RELEASE_NOTES.rst Show resolved Hide resolved
RELEASE_NOTES.rst Outdated Show resolved Hide resolved
RELEASE_NOTES.rst Outdated Show resolved Hide resolved
@hwooson12
Copy link
Contributor

I was just wondering why the release doesn't include #24373. It seems like it is because the PR isn't labeled as type:bug-fix. I thought it was bug-fix so that I expected it's included in release 2.3.3 since I have the same problem with timezone in logging and look forward to this release. Is the change going to be classified as improvement and released in 2.4.x?

@potiuk potiuk force-pushed the v2-3-test branch 2 times, most recently from 7df10dc to a7a1511 Compare July 2, 2022 16:06
There were errors with retieving constraints branch caused by
using different convention for output names (sometimes dash,
sometimes camelCase as suggested by most GitHub documents).

The "dash-name" looks much better and is far more readable so
we shoud unify all internal outputs to follow it.

During that rename some old, unused outputs were removed,
also it turned out that the new selective-check can
replace previous "dynamic outputs" written in Bash as well.

Additionally, the "defaults" are now retrieved via Python script, not
bash script which will make it much more readable - both build_images
and ci.yaml use it in the right place - before replacing
the scripts and dev with the version coming in from PR in case
of build_images.yaml.

(cherry picked from commit 017507be1e1dbf39abcc94a44fab8869037893ea)
@ephraimbuddy ephraimbuddy merged commit b5202fa into v2-3-stable Jul 2, 2022
@jedcunningham
Copy link
Member

@hwooson12, generally things sometimes fall through the cracks unfortunately. I don't think there was anything specific keeping it from the next release, it just got missed. If you do notice PRs that should be bugfixes in the next release and they aren't in the milestone, just speak up.

(In this specific case, it looks like a better fix is now being proposed anyways.)

@bbovenzi
Copy link
Contributor

bbovenzi commented Jul 6, 2022

(In this specific case, it looks like a better fix is now being proposed anyways.)

We're about to revert that commit and fix that issue properly in #24811.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:API Airflow's REST/HTTP API area:dev-tools area:production-image Production image improvements and fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.