Add clear logging to tasks killed due to a Dagrun timeout #19950

Merged 3 commits on Dec 10, 2021

@SamWheating (Contributor) commented Dec 2, 2021

When a DagRun exceeds its dagrun_timeout value a few things happen:

  • The run is marked as failed
  • All unfinished tasks are marked as skipped (which causes running tasks to be SIGTERM'd)
  • A line is logged in the scheduler logs: INFO: Run $RUN_NUMBER of $DAG_ID has timed-out
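To make the sequence above concrete, here is a toy model of those three effects. It is only a sketch and not Airflow's scheduler code: `ToyDagRun`, `apply_dagrun_timeout`, and the state strings are hypothetical names made up for illustration, and the SIGTERM side effect is not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical set of "unfinished" task states for this toy model.
UNFINISHED = {"scheduled", "queued", "running"}


@dataclass
class ToyDagRun:
    """Stand-in for a DagRun; not Airflow's real model class."""
    dag_id: str
    run_id: str
    start_date: datetime
    dagrun_timeout: timedelta
    state: str = "running"
    task_states: Dict[str, str] = field(default_factory=dict)


def apply_dagrun_timeout(run: ToyDagRun, now: datetime) -> List[str]:
    """Apply the timeout effects to `run` and return the scheduler log lines."""
    if run.state != "running" or now - run.start_date <= run.dagrun_timeout:
        return []  # nothing to do: run already finished or still within budget

    # 1. The run is marked as failed.
    run.state = "failed"

    # 2. All unfinished tasks are marked as skipped.
    for task_id, state in run.task_states.items():
        if state in UNFINISHED:
            run.task_states[task_id] = "skipped"

    # 3. A single line goes to the scheduler logs (not the task logs).
    return [f"INFO: Run {run.run_id} of {run.dag_id} has timed-out"]
```

Note that nothing in this flow writes to the task's own log, which is the gap described below.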

This has caused some confusion among users, as it's hard to tell why running tasks were killed without either:

  1. Cross-referencing the dagrun_timeout value with the execution time
  2. Reading the scheduler logs.

This PR adds additional messaging into the task logs when it can be inferred that the task was killed due to a DagRun timeout.
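As a rough illustration of the inference, the check could look something like the following. This is a minimal sketch under stated assumptions, not the code in this PR: `infer_dagrun_timeout_message` and its parameters are hypothetical, and the message wording is invented for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def infer_dagrun_timeout_message(
    dagrun_start: Optional[datetime],
    dagrun_timeout: Optional[timedelta],
    now: datetime,
) -> Optional[str]:
    """Return a task-log message if the run appears to have timed out.

    This repeats the elapsed-time comparison on the task side, so it is
    an inference rather than a confirmed signal from the scheduler.
    """
    if dagrun_start is None or dagrun_timeout is None:
        return None  # no timeout configured, or run start unknown

    elapsed = now - dagrun_start
    if elapsed > dagrun_timeout:
        return (
            f"DagRun has been running for {elapsed}, which exceeds its "
            f"dagrun_timeout of {dagrun_timeout}. This task was likely "
            "terminated because the DagRun timed out."
        )
    return None
```

A task's termination handler could call this when it receives SIGTERM and write the returned message to the task log if it is not `None`.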

I'm not super happy with this implementation (as it kind of duplicates the timeout logic to infer that a timeout occurred) and would really appreciate some advice on other ways we can improve the clarity around timeouts.

I'll add some tests for this once I get some feedback on the initial approach.

@boring-cyborg (bot) added the area:Scheduler (including HA (high availability) scheduler) label on Dec 2, 2021
@SamWheating force-pushed the sw-log-dagrun-timeouts branch from d7adcf3 to c124c53 on December 2, 2021 01:24
@SamWheating force-pushed the sw-log-dagrun-timeouts branch 2 times, most recently from 6c58260 to ef9090a, on December 6, 2021 23:52
@ephraimbuddy (Contributor) commented:

cc: @ashb

@SamWheating force-pushed the sw-log-dagrun-timeouts branch from ef9090a to d3be32a on December 9, 2021 00:56
@kaxil merged commit fab778e into apache:main on Dec 10, 2021
@kaxil added this to the Airflow 2.3.0 milestone on Dec 10, 2021
@jedcunningham added the type:improvement (Changelog: Improvements) label on Feb 4, 2022
Labels
area:Scheduler including HA (high availability) scheduler type:improvement Changelog: Improvements

5 participants