-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not create dagruns for DAGs with import errors #19367
Conversation
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for this case please?
Separately let's check why max_active_runs
is not set via tests too
f186abd
to
e5b0112
Compare
I can't figure a test to add. |
e5b0112
to
4ffb9d2
Compare
dag.max_active_runs
instead of dag_model.max_active_runs
I discovered that this happens because the DAG has import error and is active after upgrading. So scheduler was trying to create dagruns for it. |
Significant changes since I approved.
4ffb9d2
to
5b49c57
Compare
575cc31
to
bcb3822
Compare
This feels like only a partial fix. If a DAG is currently working (and has max_active_runs) and then the file is updated and now has an error, the DAG will still be scheduled, but it probably shouldn't be. To do that without needing to join against ImportError perhaps we need to add a new column to DAG -- we have |
bcb3822
to
2e3913e
Compare
I have updated it and it looks good. Thanks for the review!! |
I agree we no longer need it. I restricted the creation of dagruns on the UI and API too. Let me know if I should remove these restrictions |
If you want to reduce duplication, but keep the rollback you can go from this: def test_dags_needing_dagruns_not_too_early(self):
...
session = settings.Session() to this: def test_dags_needing_dagruns_not_too_early(self, session): And remove the But that should probably be a separate change to this PR. |
937c592
to
895b152
Compare
That's true, will work on it |
An active dag can suddenly have import errors as a result of the DAG file being changed. Currently, we do not consider this before creating dagruns. This PR adds the has_import_errors to dagmodel so dags with import errors are not set to the scheduler to create dagruns fixup! Add has_import_errors to DagModel and ensure Dags with import errors are not sent to the Scheduler fixup! fixup! Add has_import_errors to DagModel and ensure Dags with import errors are not sent to the Scheduler Update airflow/models/dag.py Co-authored-by: Ash Berlin-Taylor <[email protected]> Update docs/apache-airflow/migrations-ref.rst Co-authored-by: Ash Berlin-Taylor <[email protected]> Apply suggestions from code review Co-authored-by: Ash Berlin-Taylor <[email protected]> fixup! Apply suggestions from code review restrict manual dagrun creation if dagrun has import errors fixup! restrict manual dagrun creation if dagrun has import errors use session rollback
895b152
to
4a4e513
Compare
@@ -23,7 +23,9 @@ Here's the list of all the Database Migrations that are executed via when you ru | |||
+--------------------------------+------------------+-----------------+---------------------------------------------------------------------------------------+ | |||
| Revision ID | Revises ID | Airflow Version | Description | | |||
+--------------------------------+------------------+-----------------+---------------------------------------------------------------------------------------+ | |||
| ``7b2661a43ba3`` (head) | ``142555e44c17`` | ``2.2.0`` | Change ``TaskInstance`` and ``TaskReschedule`` tables from execution_date to run_id. | | |||
| ``be2bfac3da23`` (head) | ``7b2661a43ba3`` | ``2.2.3`` | Add has_import_errors column to DagModel | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to say this should be 2.3.0 -- it's maybe a bug fix, but this behaviour has been the same for all of 2.x so far.
What do others think here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am for adding it to 2.2.3. There is hardly a use of DagRuns for DAGs with import errors, especially that you could not see whether there were errors via API. Even if it requires data model change, that's fine as patch-level. This is OK for a bugfix to require new field in the DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this issue will crash the scheduler, I lean toward 2.2.3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, just the 2.2.3 vs 2.3 decision.
Co-authored-by: Jed Cunningham <[email protected]>
Do we really need a new field? I think we can instead join |
It looks like it'll not be that efficient. See Not sure the impact the join would have. |
Yeah, we could do it with a join, but this is in the hot path for the main scheduler loop so it seems better to denormalise it here. |
Denormalise to optimise makes sense. |
An active dag can suddenly have import errors as a result of the DAG file being changed. Currently, we do not consider this before creating dagruns. This PR adds the has_import_errors to dagmodel so dags with import errors are not sent to the scheduler to create dagruns Co-authored-by: Ash Berlin-Taylor <[email protected]>
An active dag can suddenly have import errors as a result of the DAG file being changed. Currently, we do not consider this before creating dagruns. This PR adds the has_import_errors to dagmodel so dags with import errors are not sent to the scheduler to create dagruns Co-authored-by: Ash Berlin-Taylor <[email protected]> (cherry picked from commit 3ccb794)
An active dag can suddenly have import errors as a result of the DAG file being changed. Currently, we
do not consider this before creating dagruns.
This PR adds the has_import_errors to
dag_model
so dags with import errors are not set to the scheduler tocreate dagruns. Also, dag runs can no longer be created on the UI/API when the dag has import errors
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.