Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not create dagruns for DAGs with import errors #19367

Merged
merged 4 commits into from
Nov 29, 2021

Conversation

ephraimbuddy
Copy link
Contributor

@ephraimbuddy ephraimbuddy commented Nov 2, 2021

An active dag can suddenly have import errors as a result of the DAG file being changed. Currently, we
do not consider this before creating dagruns.

This PR adds the has_import_errors to dag_model so dags with import errors are not set to the scheduler to
create dagruns. Also, dag runs can no longer be created on the UI/API when the dag has import errors


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Nov 2, 2021
@jedcunningham jedcunningham added this to the Airflow 2.2.2 milestone Nov 2, 2021
jedcunningham
jedcunningham previously approved these changes Nov 2, 2021
@github-actions
Copy link

github-actions bot commented Nov 2, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 2, 2021
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this case please?

Separately let's check why max_active_runs is not set via tests too

@ephraimbuddy ephraimbuddy force-pushed the use-dag-max-active-runs branch from f186abd to e5b0112 Compare November 3, 2021 16:46
@ephraimbuddy
Copy link
Contributor Author

Can we add a test for this case please?

Separately let's check why max_active_runs is not set via tests too

I can't figure a test to add.
My guess for the issue is that they removed max_active_runs_per_dag from their configuration.
To check this, I have added a fallback and also updated dag's test to ensure once DagModel is instantiated, it'll have a max_active_runs

airflow/models/dag.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the use-dag-max-active-runs branch from e5b0112 to 4ffb9d2 Compare November 3, 2021 21:10
@ephraimbuddy ephraimbuddy changed the title Use dag.max_active_runs instead of dag_model.max_active_runs Do not send DAGs with import errors to the scheduler Nov 3, 2021
@ephraimbuddy
Copy link
Contributor Author

I discovered that this happens because the DAG has import error and is active after upgrading. So scheduler was trying to create dagruns for it.
I have updated the code and commit message.
cc: @jedcunningham @kaxil

@jedcunningham jedcunningham dismissed their stale review November 3, 2021 21:14

Significant changes since I approved.

@ephraimbuddy ephraimbuddy force-pushed the use-dag-max-active-runs branch from 4ffb9d2 to 5b49c57 Compare November 3, 2021 21:22
airflow/models/dag.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy requested a review from kaxil November 4, 2021 00:18
airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the use-dag-max-active-runs branch from 575cc31 to bcb3822 Compare November 4, 2021 10:32
@ephraimbuddy ephraimbuddy changed the title Do not send DAGs with import errors to the scheduler Do not send DAGs with null DagModel.max_active_runs to the scheduler Nov 4, 2021
@ashb
Copy link
Member

ashb commented Nov 4, 2021

This feels like only a partial fix.

If a DAG is currently working (and has max_active_runs) and then the file is updated and now has an error, the DAG will still be scheduled, but it probably shouldn't be.

To do that without needing to join against ImportError perhaps we need to add a new column to DAG -- we have is_active which is close, but setting that to false hides it from the UI which isn't the behaviour we want here.

@ephraimbuddy ephraimbuddy force-pushed the use-dag-max-active-runs branch from bcb3822 to 2e3913e Compare November 4, 2021 22:25
@ephraimbuddy
Copy link
Contributor Author

This feels like only a partial fix.

If a DAG is currently working (and has max_active_runs) and then the file is updated and now has an error, the DAG will still be scheduled, but it probably shouldn't be.

To do that without needing to join against ImportError perhaps we need to add a new column to DAG -- we have is_active which is close, but setting that to false hides it from the UI which isn't the behaviour we want here.

I have updated it and it looks good. Thanks for the review!!

@ephraimbuddy ephraimbuddy requested a review from ashb November 4, 2021 22:39
tests/models/test_dag.py Outdated Show resolved Hide resolved
@ephraimbuddy
Copy link
Contributor Author

Happy to discuss about if we still need the max_active_runs condition or not too.

I agree we no longer need it.

I restricted the creation of dagruns on the UI and API too. Let me know if I should remove these restrictions

@ashb
Copy link
Member

ashb commented Nov 11, 2021

Happy to discuss about if we still need the max_active_runs condition or not too.

I agree we no longer need it.

I restricted the creation of dagruns on the UI and API too. Let me know if I should remove these restrictions

If you want to reduce duplication, but keep the rollback you can go from this:

     def test_dags_needing_dagruns_not_too_early(self):
         ...
        session = settings.Session()

to this:

     def test_dags_needing_dagruns_not_too_early(self, session):

And remove the session = settings.Session(). That session pytest fixture defined in tests/conftest.py already does the rollback.

But that should probably be a separate change to this PR.

@ephraimbuddy ephraimbuddy force-pushed the use-dag-max-active-runs branch from 937c592 to 895b152 Compare November 11, 2021 15:49
@ephraimbuddy
Copy link
Contributor Author

Happy to discuss about if we still need the max_active_runs condition or not too.

I agree we no longer need it.
I restricted the creation of dagruns on the UI and API too. Let me know if I should remove these restrictions

If you want to reduce duplication, but keep the rollback you can go from this:

     def test_dags_needing_dagruns_not_too_early(self):
         ...
        session = settings.Session()

to this:

     def test_dags_needing_dagruns_not_too_early(self, session):

And remove the session = settings.Session(). That session pytest fixture defined in tests/conftest.py already does the rollback.

But that should probably be a separate change to this PR.

That's true, will work on it

An active dag can suddenly have import errors as a result of the DAG file being changed. Currently, we
do not consider this before creating dagruns.

This PR adds the has_import_errors to dagmodel so dags with import errors are not set to the scheduler to
create dagruns

fixup! Add has_import_errors to DagModel and ensure Dags with import errors are not sent to the Scheduler

fixup! fixup! Add has_import_errors to DagModel and ensure Dags with import errors are not sent to the Scheduler

Update airflow/models/dag.py

Co-authored-by: Ash Berlin-Taylor <[email protected]>

Update docs/apache-airflow/migrations-ref.rst

Co-authored-by: Ash Berlin-Taylor <[email protected]>

Apply suggestions from code review

Co-authored-by: Ash Berlin-Taylor <[email protected]>

fixup! Apply suggestions from code review

restrict manual dagrun creation if dagrun has import errors

fixup! restrict manual dagrun creation if dagrun has import errors

use session rollback
@@ -23,7 +23,9 @@ Here's the list of all the Database Migrations that are executed via when you ru
+--------------------------------+------------------+-----------------+---------------------------------------------------------------------------------------+
| Revision ID | Revises ID | Airflow Version | Description |
+--------------------------------+------------------+-----------------+---------------------------------------------------------------------------------------+
| ``7b2661a43ba3`` (head) | ``142555e44c17`` | ``2.2.0`` | Change ``TaskInstance`` and ``TaskReschedule`` tables from execution_date to run_id. |
| ``be2bfac3da23`` (head) | ``7b2661a43ba3`` | ``2.2.3`` | Add has_import_errors column to DagModel |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to say this should be 2.3.0 -- it's maybe a bug fix, but this behaviour has been the same for all of 2.x so far.

What do others think here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am for adding it to 2.2.3. There is hardly a use of DagRuns for DAGs with import errors, especially that you could not see whether there were errors via API. Even if it requires data model change, that's fine as patch-level. This is OK for a bugfix to require new field in the DB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this issue will crash the scheduler, I lean toward 2.2.3.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, just the 2.2.3 vs 2.3 decision.

tests/dag_processing/test_processor.py Outdated Show resolved Hide resolved
tests/dag_processing/test_processor.py Outdated Show resolved Hide resolved
tests/models/test_dag.py Show resolved Hide resolved
tests/models/test_dag.py Outdated Show resolved Hide resolved
@uranusjr
Copy link
Member

Do we really need a new field? I think we can instead join ImportError on import_error.filename = dag.fileloc and error if the result is not empty.

@ephraimbuddy
Copy link
Contributor Author

Do we really need a new field? I think we can instead join ImportError on import_error.filename = dag.fileloc and error if the result is not empty.

It looks like it'll not be that efficient. See
#19367 (comment)

Not sure the impact the join would have.
cc: @ashb

@ashb
Copy link
Member

ashb commented Nov 23, 2021

Yeah, we could do it with a join, but this is in the hot path for the main scheduler loop so it seems better to denormalise it here.

@uranusjr
Copy link
Member

Denormalise to optimise makes sense.

@ephraimbuddy ephraimbuddy requested a review from ashb November 24, 2021 16:36
@ashb ashb dismissed their stale review November 29, 2021 10:16

Fixed.

@ephraimbuddy ephraimbuddy merged commit 3ccb794 into apache:main Nov 29, 2021
@ephraimbuddy ephraimbuddy deleted the use-dag-max-active-runs branch November 29, 2021 10:21
dillonjohnson pushed a commit to dillonjohnson/airflow that referenced this pull request Dec 1, 2021
An active dag can suddenly have import errors as a result of the DAG file being changed. Currently, we
do not consider this before creating dagruns.

This PR adds the has_import_errors to dagmodel so dags with import errors are not sent to the scheduler to
create dagruns

Co-authored-by: Ash Berlin-Taylor <[email protected]>
jedcunningham pushed a commit that referenced this pull request Dec 7, 2021
An active dag can suddenly have import errors as a result of the DAG file being changed. Currently, we
do not consider this before creating dagruns.

This PR adds the has_import_errors to dagmodel so dags with import errors are not sent to the scheduler to
create dagruns

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 3ccb794)
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants