Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pre-upgrade check for dangling TI references #22924

Merged
merged 1 commit into from
Apr 14, 2022

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented Apr 11, 2022

We are adding some foreign keys in 2.3.0 so we want make it more likely that migration succeeds by detecting FK violations and moving the records out of the table before creating the FK. We already had a check for dag run but this adds one for TaskInstance.

@dstandish dstandish marked this pull request as draft April 11, 2022 22:11
@dstandish dstandish changed the title Fix dangling ti WIP - Fix dangling ti Apr 11, 2022
@dstandish dstandish changed the title WIP - Fix dangling ti Add pre-upgrade check for dangling TI references Apr 12, 2022
@dstandish dstandish requested review from uranusjr and ashb April 12, 2022 07:07
@dstandish
Copy link
Contributor Author

dstandish commented Apr 12, 2022

i have not tested on all 4 but it's very similar to the dagrun check so it's probably fine. i will confirm tomorrow.

tested on all 4 dbs

Test data:

testing from pre-2.2.0: test_2_1_4.txt
testing upgrade from 2.2.0 < ver < 2.3.0: test_2_2_5.txt

procedure:

delete and recreate db
airflow db upgrade --to-version 2.1.4
insert test data
airflow db upgrade

@dstandish dstandish marked this pull request as ready for review April 12, 2022 07:11
airflow/utils/db.py Outdated Show resolved Hide resolved
airflow/utils/db.py Outdated Show resolved Hide resolved
airflow/utils/db.py Outdated Show resolved Hide resolved
@dstandish
Copy link
Contributor Author

OK @ashb @uranusjr i think this is ready for a look.

Recall that prior to this PR, we had "missing dag run" checks for dangling rows. But this wasn't quite all that we needed. It made sense for TaskInstance, when we added run_id to it in 2.2.0. However, for the other tables we are changing in 2.3.0, checking for the existence of a DagRun is not actually strict enough, since we're keying to TI -- we need to look for a record in TI.join(DR).

Initially I thought I should add a second set of checks -- for TI -- alongside the DR checks. But, since the check for dangling TI ref implies a check for dangling DR ref, we can skip the DR check and just check for TI. This enables us to do all checks (missing DR and missing TI) in the same loop, though this required a bit of a refactor.

One complication is that when db < 2.2.0 the join conditions for TI.join(DR) are different from when 2.2.0 <= db < 2.3.0. Initially I was going to read airflow version of db for this determination, but it ended up being easy enough to use a "duck typing" inference approach.

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me

airflow/utils/db.py Outdated Show resolved Hide resolved
airflow/utils/db.py Outdated Show resolved Hide resolved
We are adding some foreign keys in 2.3.0 so we want make it more likely that migration succeeds by detecting FK violations and moving the records out of the table before creating the FK.  We already had a check for dag run but this adds one for TaskInstance.
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Apr 13, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@dstandish dstandish merged commit 26e09df into apache:main Apr 14, 2022
@dstandish dstandish deleted the fix-dangling-ti branch April 14, 2022 01:59
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants