Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix task retries when they receive sigkill and have retries. Properly handle sigterm too #16301

Merged
merged 1 commit into from
Jul 28, 2021

Conversation

ephraimbuddy
Copy link
Contributor

@ephraimbuddy ephraimbuddy commented Jun 7, 2021

Closes: #16285

Currently, tasks are not retried when they receive sigkill even if the task has retries. Sigkill is uncatchable on task runner. However, sigterm on task runner fails tasks without retry. This change fixes it
and added test for both sigterm and sigkill so we don't experience regression

This change also removes quarantined marker on tests that had to do with changing tasks states


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jun 7, 2021
@ephraimbuddy ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch 2 times, most recently from 034516f to 0f9ac16 Compare June 8, 2021 08:31
@ephraimbuddy ephraimbuddy changed the title Fix task retries when they receive sigkill and have retries Fix task retries when they receive sigkill and have retries. Properly handle sigterm too Jun 8, 2021
@ephraimbuddy ephraimbuddy reopened this Jun 8, 2021
@ephraimbuddy ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch from 0f9ac16 to 059a1ba Compare June 8, 2021 13:14
@kaxil kaxil closed this Jun 9, 2021
@kaxil kaxil reopened this Jun 9, 2021
@ephraimbuddy ephraimbuddy reopened this Jun 9, 2021
@ephraimbuddy ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch 4 times, most recently from aba146b to 6eff8e3 Compare June 10, 2021 07:13
@ephraimbuddy ephraimbuddy reopened this Jun 10, 2021
tests/jobs/test_local_task_job.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch 3 times, most recently from 8753159 to 459dcab Compare June 10, 2021 21:53
@ephraimbuddy ephraimbuddy reopened this Jul 27, 2021
…rly handle sigterm

Currently tasks are not retried when they receive sigkill or sigterm even if the task has retries. This change fixes it
and added test for both sigterm and sigkill so we don't experience regression

Before, sigterm sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when task gets sigterm

add comment about exit code 143

fix tests

Apply suggestions from code review

fixup! Apply suggestions from code review

fixup! fixup! Apply suggestions from code review

fixup! fixup! fixup! Apply suggestions from code review

Properly test sigkill and sigterm fort 'ti'

fixup! Properly test sigkill and sigterm fort 'ti'

Move sigterm test for tasks to taskinstance.py

fix sigkill

fixup! fix sigkill

Fix tests and move retry callback out of finished callback

Apply suggestions from code review

Fix conflict

fixup! Apply suggestions from code review

fixup! fixup! Apply suggestions from code review

fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm

fixup! fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm

Fix and remove other tests marked quarantined that changes task state

Properly document each tests and set error to None when there's sigkill/sigterm

remove task instantiation as it's not needed

Take back retry callback to finished callback since finished callback is always run

fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm

fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm

fixup! fixup! Fix task retries when they receive sigkill and have retries and properly handle sigterm

Apply suggestions from code review

Co-authored-by: Ash Berlin-Taylor <[email protected]>

fixup! Apply suggestions from code review
@ephraimbuddy ephraimbuddy force-pushed the fix-task-retries-on-sigkill branch from 1f57248 to 089a3c2 Compare July 27, 2021 20:46
@uranusjr uranusjr closed this Jul 28, 2021
@uranusjr uranusjr reopened this Jul 28, 2021
@ephraimbuddy ephraimbuddy merged commit 4e2a94c into apache:main Jul 28, 2021
@ephraimbuddy ephraimbuddy deleted the fix-task-retries-on-sigkill branch July 28, 2021 14:57
jhtimmins pushed a commit that referenced this pull request Aug 12, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
kaxil pushed a commit that referenced this pull request Aug 13, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
kaxil pushed a commit that referenced this pull request Aug 13, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
kaxil pushed a commit that referenced this pull request Aug 14, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
jhtimmins pushed a commit that referenced this pull request Aug 15, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
jhtimmins pushed a commit that referenced this pull request Aug 17, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
potiuk pushed a commit that referenced this pull request Aug 17, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
jhtimmins pushed a commit that referenced this pull request Aug 17, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
kaxil pushed a commit that referenced this pull request Aug 17, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
jhtimmins pushed a commit that referenced this pull request Aug 17, 2021
…rly handle sigterm (#16301)

Currently, tasks are not retried when they receive SIGKILL or SIGTERM even if the task has retry. This change fixes it
and added test for both SIGTERM and SIGKILL so we don't experience regression

Also, SIGTERM sets the task as failed and raises AirflowException which heartbeat sometimes see as externally set to fail
and not call failure_callbacks. This commit also fixes this by calling handle_task_exit when a task gets SIGTERM

Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 4e2a94c)
@WattsInABox WattsInABox mentioned this pull request Sep 17, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Airflow 2.1.0 doesn't retry a task if it externally killed
4 participants