Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only count bad refs when moved table exists #23491

Merged
merged 12 commits into from
May 6, 2022

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented May 4, 2022

This keeps the logic to fail without upgrading when (A) there are bad rows and (B) the "moved" table already exists. But we optimize so that we don't count the bad rows unless the "moved" table is there. Previously we counted always, but the first time a user attempts upgrade, the tables won't be there so there's no point in counting.

Instead what we do is skip right to the CTAS, creating the _airflow_moved tables. If there aren't any rows in the "moved" table, then we delete the table immediately.

Also included here is a delete optimization, where we join to the moved table instead of running the not exists query again.

Tested on all 4 dialects with small sample data to ensure that behavior is correct.

airflow/utils/db.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label May 5, 2022
@github-actions
Copy link

github-actions bot commented May 5, 2022

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@dstandish dstandish force-pushed the dont-count-unless-table-already-there branch from 2276083 to 2c09f1b Compare May 5, 2022 19:58
airflow/utils/db.py Show resolved Hide resolved
airflow/utils/db.py Show resolved Hide resolved
@jedcunningham jedcunningham added the use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) label May 6, 2022
airflow/utils/db.py Outdated Show resolved Hide resolved
@jedcunningham jedcunningham removed the use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) label May 6, 2022
@jedcunningham jedcunningham reopened this May 6, 2022
@ashb ashb force-pushed the dont-count-unless-table-already-there branch from c314182 to aa719bd Compare May 6, 2022 11:49
@ashb
Copy link
Member

ashb commented May 6, 2022

Test failures were random. Merging now.

@ashb ashb merged commit 6cc41ab into apache:main May 6, 2022
@ashb ashb deleted the dont-count-unless-table-already-there branch May 6, 2022 12:42
@ashb ashb modified the milestones: Airflow 2.4.0, Airflow 2.3.1 May 6, 2022
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request May 6, 2022
This keeps the logic to fail without upgrading when (A) there are bad rows and
(B) the "moved" table already exists. But we optimize so that we don't count
the bad rows unless the "moved" table is there. Previously we counted always,
but the first time a user attempts upgrade, the tables won't be there so
there's no point in counting.

Instead what we do is skip right to the CTAS, creating the _airflow_moved
tables. If there aren't any rows in the "moved" table, then we delete the table
immediately.

Also included here is a delete optimization, where we join to the moved table
instead of running the not exists query again.

Co-authored-by: Jed Cunningham <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 6cc41ab)
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label May 8, 2022
ephraimbuddy pushed a commit that referenced this pull request May 8, 2022
This keeps the logic to fail without upgrading when (A) there are bad rows and
(B) the "moved" table already exists. But we optimize so that we don't count
the bad rows unless the "moved" table is there. Previously we counted always,
but the first time a user attempts upgrade, the tables won't be there so
there's no point in counting.

Instead what we do is skip right to the CTAS, creating the _airflow_moved
tables. If there aren't any rows in the "moved" table, then we delete the table
immediately.

Also included here is a delete optimization, where we join to the moved table
instead of running the not exists query again.

Co-authored-by: Jed Cunningham <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 6cc41ab)
ephraimbuddy pushed a commit that referenced this pull request May 21, 2022
This keeps the logic to fail without upgrading when (A) there are bad rows and
(B) the "moved" table already exists. But we optimize so that we don't count
the bad rows unless the "moved" table is there. Previously we counted always,
but the first time a user attempts upgrade, the tables won't be there so
there's no point in counting.

Instead what we do is skip right to the CTAS, creating the _airflow_moved
tables. If there aren't any rows in the "moved" table, then we delete the table
immediately.

Also included here is a delete optimization, where we join to the moved table
instead of running the not exists query again.

Co-authored-by: Jed Cunningham <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 6cc41ab)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants