Fix task retries when they receive sigkill and have retries. Properly handle sigterm too #16301
Merged: ephraimbuddy merged 1 commit into apache:main from astronomer:fix-task-retries-on-sigkill on Jul 28, 2021
boring-cyborg (bot) added the area:Scheduler (including HA (high availability) scheduler) label on Jun 7, 2021
ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch 2 times, most recently from 034516f to 0f9ac16, on June 8, 2021 08:31
ephraimbuddy changed the title from "Fix task retries when they receive sigkill and have retries" to "Fix task retries when they receive sigkill and have retries. Properly handle sigterm too" on Jun 8, 2021
uranusjr reviewed on Jun 8, 2021
ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch from 0f9ac16 to 059a1ba on June 8, 2021 13:14
ephraimbuddy commented on Jun 8, 2021
ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch 4 times, most recently from aba146b to 6eff8e3, on June 10, 2021 07:13
ashb reviewed on Jun 10, 2021
ashb reviewed on Jun 10, 2021
ashb reviewed on Jun 10, 2021
ashb reviewed on Jun 10, 2021
ashb requested changes on Jun 10, 2021
ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch 3 times, most recently from 8753159 to 459dcab, on June 10, 2021 21:53
…rly handle sigterm

Currently, tasks are not retried when they receive SIGKILL or SIGTERM, even if the task has retries. This change fixes that and adds tests for both SIGTERM and SIGKILL so we don't experience a regression.

Before, SIGTERM set the task as failed and raised AirflowException, which the heartbeat sometimes saw as the task being externally set to failed, so it did not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM.

Squashed commit subjects:
add comment about exit code 143
fix tests
Apply suggestions from code review
fixup! Apply suggestions from code review
fixup! fixup! Apply suggestions from code review
fixup! fixup! fixup! Apply suggestions from code review
Properly test sigkill and sigterm fort 'ti'
fixup! Properly test sigkill and sigterm fort 'ti'
Move sigterm test for tasks to taskinstance.py
fix sigkill
fixup! fix sigkill
Fix tests and move retry callback out of finished callback
Apply suggestions from code review
Fix conflict
fixup! Apply suggestions from code review
fixup! fixup! Apply suggestions from code review
fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm
fixup! fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm
Fix and remove other tests marked quarantined that changes task state
Properly document each tests and set error to None when there's sigkill/sigterm
remove task instantiation as it's not needed
Take back retry callback to finished callback since finished callback is always run
fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm
fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm
fixup! fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm
Apply suggestions from code review
Co-authored-by: Ash Berlin-Taylor <[email protected]>
fixup! Apply suggestions from code review
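The commit list mentions a comment about exit code 143 without spelling it out. As a rough illustration, and assuming the usual POSIX conventions (Python's `subprocess` reports death by signal as a negative return code, while shells report it as 128 plus the signal number, so 143 = 128 + 15 for SIGTERM), a return code could be decoded like this. The function names are hypothetical and are not part of Airflow:

```python
import signal


def _signal_name(number):
    """Return the signal name for a number, or None if it is not a known signal."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return None


def describe_exit(return_code):
    """Map a process return code to a short human-readable cause (POSIX assumed)."""
    if return_code == 0:
        return "success"
    if return_code < 0:
        # subprocess convention: -N means the child was killed by signal N.
        name = _signal_name(-return_code)
        if name is not None:
            return f"killed by {name}"
    if return_code > 128:
        # Shell convention: 128 + N means the process ended because of signal N.
        name = _signal_name(return_code - 128)
        if name is not None:
            return f"exited after {name}"
    return f"failed with exit code {return_code}"
```

For example, `describe_exit(143)` points at SIGTERM and `describe_exit(-9)` at SIGKILL, which are the two cases this PR deals with.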
ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch from 1f57248 to 089a3c2 on July 27, 2021 20:46
This was referenced Aug 3, 2021
jhtimmins pushed a commit that referenced this pull request on Aug 12, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM, even if the task has retries. This change fixes that and adds tests for both SIGTERM and SIGKILL so we don't experience a regression.

Also, SIGTERM sets the task as failed and raises AirflowException, which the heartbeat sometimes sees as the task being externally set to failed, so it does not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM.

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
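The idea of routing SIGTERM through a single exit path, so that failure callbacks run exactly once, can be sketched as follows. This is a toy model under stated assumptions, not Airflow's implementation: `TaskProcess` and `TaskTerminated` are hypothetical names, and only the method name `handle_task_exit` is borrowed from the commit message.

```python
import signal


class TaskTerminated(Exception):
    """Raised in the task process when SIGTERM arrives (hypothetical name)."""


class TaskProcess:
    """Toy stand-in for a task runner; not Airflow's actual class."""

    def __init__(self, on_failure):
        self.on_failure = on_failure  # user-supplied failure callback
        self.exit_handled = False
        self.return_code = None

    def handle_task_exit(self, return_code):
        """Single exit path: record the code and run the callback at most once."""
        if self.exit_handled:
            return
        self.exit_handled = True
        self.return_code = return_code
        if return_code != 0:
            self.on_failure(return_code)

    def sigterm_handler(self, signum, frame):
        # Go through the normal exit path first, then unwind the task code.
        self.handle_task_exit(128 + signum)
        raise TaskTerminated(f"received signal {signum}")

    def install(self):
        # signal.signal may only be called from the main thread.
        signal.signal(signal.SIGTERM, self.sigterm_handler)
```

Because the handler marks the exit as handled before raising, a later check that finds the task already failed does not fire the callback a second time, and it does not skip it either.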
kaxil pushed a commit that referenced this pull request on Aug 13, 2021 (cherry picked from commit 4e2a94c)
kaxil pushed a commit that referenced this pull request on Aug 13, 2021 (cherry picked from commit 4e2a94c)
kaxil pushed a commit that referenced this pull request on Aug 14, 2021 (cherry picked from commit 4e2a94c)
jhtimmins pushed a commit that referenced this pull request on Aug 15, 2021 (cherry picked from commit 4e2a94c)
jhtimmins pushed a commit that referenced this pull request on Aug 17, 2021 (cherry picked from commit 4e2a94c)
potiuk pushed a commit that referenced this pull request on Aug 17, 2021 (cherry picked from commit 4e2a94c)
jhtimmins pushed a commit that referenced this pull request on Aug 17, 2021 (cherry picked from commit 4e2a94c)
kaxil pushed a commit that referenced this pull request on Aug 17, 2021 (cherry picked from commit 4e2a94c)
jhtimmins pushed a commit that referenced this pull request on Aug 17, 2021 (cherry picked from commit 4e2a94c)
Labels
area:Scheduler
including HA (high availability) scheduler
full tests needed
We need to run full set of tests for this PR to merge
Closes: #16285
Currently, tasks are not retried when they receive SIGKILL, even if the task has retries. SIGKILL cannot be caught by the task runner. SIGTERM can be caught, but on the task runner it fails the task without a retry. This change fixes both cases and adds tests for SIGTERM and SIGKILL so we don't experience a regression.
This change also removes the quarantined marker from tests that deal with changing task states.
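Since SIGKILL cannot be caught inside the killed process, the retry decision has to be made by whatever supervises it. The sketch below illustrates that idea on a POSIX system. It is a minimal model under stated assumptions, not Airflow's code: `Attempt`, `next_state`, and `run_child_that_gets_sigkill` are hypothetical names, and the state strings are illustrative.

```python
import signal
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Attempt:
    try_number: int  # 1-based number of the attempt that just ended
    retries: int     # extra attempts allowed after the first one


def next_state(attempt, return_code):
    """Decide the state after an attempt ends, however the process died."""
    if return_code == 0:
        return "success"
    # Abnormal exits, including death by signal, still use up one attempt
    # and are retried while attempts remain.
    if attempt.try_number <= attempt.retries:
        return "up_for_retry"
    return "failed"


def run_child_that_gets_sigkill():
    """Start a child that SIGKILLs itself and return what the parent observes."""
    code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    return subprocess.run([sys.executable, "-c", code]).returncode
```

On POSIX the parent sees a return code of `-signal.SIGKILL`, and `next_state(Attempt(try_number=1, retries=1), return_code)` then yields a retry rather than a hard failure.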