Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use NOT EXISTS subquery instead of tuple_not_in_condition #33527

Merged
merged 2 commits into from
Aug 21, 2023

Conversation

Taragolis
Copy link
Contributor

Replace tuple-not-in-collection by pure NOT EXISTS statement, which should be supported by all modern RDBMS.
This should improve performance for such of queries in MSSQL, and I do not expect any noticeable changes for other backend


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@Taragolis Taragolis added the full tests needed We need to run full set of tests for this PR to merge label Aug 19, 2023
@Taragolis Taragolis marked this pull request as ready for review August 19, 2023 15:10
Comment on lines 593 to 595
:meta private:
"""
if settings.engine.dialect.name != "mssql":
warnings.warn(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe better remove this function 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to deprecate it, this has :meta private: so no user code should depend on it. I feel it’s fine to keep this though, it’s a mirror to tuple_in_condition and might be useful at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we also could get rid of tuple_in_condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I would remove deprecation note

if num_to_keep <= 0:
return

from airflow.models.dagrun import DagRun
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this actually matters, at this point it’s likely DagRun is imported anyway and the import would be basically free. But either way works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side comment. I've learned recently to use more and more local imports. I think basically that when not really needed, the advice of "use top level imports by default" is less and less relevant for "heavy" imports.

The lazy import PEP is years away and it does not seem that it will be enabled by default - you will have to specify an option when you launch interpreter, and I think it's simply inevitable that people will use more and more local imports and the tooling (static checks IDE support etc.) will start supporting it better.

That's my bet actually that rather than switching to lazy impors we will switch to "local imports for heavy things by default". You can see it all over Airflow's code already.

The only reason why you would like to use top-level imports for "heavy" packages is when you use the same import in multiple methods in the same module - but this also can be solved by simply local importing that partiular module.

I bet some tooling for "detect heavy top-level imports" will soon show up in one of the static check tooling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and of course the problem is with type annotations and TYPE_CHECKING - would also be nice to solve that one as the TYPE_CHECKING is really a hack as opposed to local imports)

@Taragolis Taragolis merged commit 277cfcb into apache:main Aug 21, 2023
64 checks passed
@Taragolis Taragolis deleted the use-not-exists branch August 21, 2023 21:08
potiuk added a commit to potiuk/airflow that referenced this pull request Sep 12, 2023
Some of the recent sqlalchemy changes are not working with minimum
version of sqlalchemy of ours - for example `where` syntax does
not allow moe than one clause and we are already passing more
in _do_delete_old_records (added in apache#33527). This syntax however
was added in SQL Alchemy 1.4.28 and our minimum version was
1.4.27.

This change bumps the minimum SQLAlchemy version to 1.4.28 but it also
adds a special test job that only runs on Postgres that downgrades
the SQLAlchemy to the minimum supported version (retrieved from
setup.cfg). This way, we will be able to detect such incompatible
changes at the PR time. This is a new flag `--downgrade-sqlalchemy`
on test command that works similar to earlier `--upgrade-boto`.

We also enable the `--upgrade-boto` and `--downgrade-sqlalchemy` flags
to be used for `breeze shell` command - thanks to that we can
easily test both flags with `breeze shell` command.
potiuk added a commit that referenced this pull request Sep 12, 2023
…4293)

Some of the recent sqlalchemy changes are not working with minimum
version of sqlalchemy of ours - for example `where` syntax does
not allow moe than one clause and we are already passing more
in _do_delete_old_records (added in #33527). This syntax however
was added in SQL Alchemy 1.4.28 and our minimum version was
1.4.27.

This change bumps the minimum SQLAlchemy version to 1.4.28 but it also
adds a special test job that only runs on Postgres that downgrades
the SQLAlchemy to the minimum supported version (retrieved from
setup.cfg). This way, we will be able to detect such incompatible
changes at the PR time. This is a new flag `--downgrade-sqlalchemy`
on test command that works similar to earlier `--upgrade-boto`.

We also enable the `--upgrade-boto` and `--downgrade-sqlalchemy` flags
to be used for `breeze shell` command - thanks to that we can
easily test both flags with `breeze shell` command.
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Oct 2, 2023
@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Oct 2, 2023
ephraimbuddy pushed a commit that referenced this pull request Oct 5, 2023
…4293)

Some of the recent sqlalchemy changes are not working with minimum
version of sqlalchemy of ours - for example `where` syntax does
not allow moe than one clause and we are already passing more
in _do_delete_old_records (added in #33527). This syntax however
was added in SQL Alchemy 1.4.28 and our minimum version was
1.4.27.

This change bumps the minimum SQLAlchemy version to 1.4.28 but it also
adds a special test job that only runs on Postgres that downgrades
the SQLAlchemy to the minimum supported version (retrieved from
setup.cfg). This way, we will be able to detect such incompatible
changes at the PR time. This is a new flag `--downgrade-sqlalchemy`
on test command that works similar to earlier `--upgrade-boto`.

We also enable the `--upgrade-boto` and `--downgrade-sqlalchemy` flags
to be used for `breeze shell` command - thanks to that we can
easily test both flags with `breeze shell` command.

(cherry picked from commit efbead9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants