Tasks stuck in queued despite stalled_task_timeout #28120
Labels:
affected_version:2.4 (Issues Reported for 2.4)
area:core
area:Scheduler (including HA (high availability) scheduler)
priority:medium (Bug that should be fixed before next release but would not block a release)
provider:celery
It seems what's happening is that the `airflow tasks run <task>` command is failing on the Celery worker. The Celery status is set to failed, but the task in Airflow remains in queued for some arbitrary amount of time (often hours).

Note the `state=queued` and `executor_state=failed`: Airflow should be marking the task as failed. When this happens, these tasks also bypass `stalled_task_timeout`, because when `update_task_state` is called, the Celery state is `STARTED`. `self._set_celery_pending_task_timeout(key, None)` removes the task from the list of tasks eligible for `stalled_task_timeout`, and so these tasks remain in queued indefinitely.

Summary of what's happening:
1. CeleryExecutor's `update_task_state` method calls `fail()`, which is a method from BaseExecutor.
2. `fail` calls CeleryExecutor's `change_state` method.
3. CeleryExecutor's `change_state` method calls BaseExecutor's `change_state` method via `super()`.
4. BaseExecutor's `change_state` method runs.

Because the `airflow tasks run` command failed, the task is never set to the running state. The `except KeyError` block in BaseExecutor's `change_state` method allows the code to continue unabated. Once BaseExecutor's `change_state` method completes, CeleryExecutor's `change_state` method completes. Its call to `self._set_celery_pending_task_timeout(key, None)` removes the task from the list of tasks that `stalled_task_timeout` checks for, allowing the tasks to remain in queued indefinitely.

Instead, when the `airflow tasks run` command fails, the Airflow task instance should be failed or retried (if applicable).
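The call chain described in this issue can be modelled with a small, self-contained sketch. This is not the Airflow source. `BaseExecutorModel`, `CeleryExecutorModel`, the `queue` helper and the `stalled_timeout_candidates` dict are hypothetical stand-ins, and plain strings replace the Celery state constants. The sketch only follows the behaviour as reported above: a failure reaches the executor, the `KeyError` is swallowed, and the task is dropped from the set that `stalled_task_timeout` would check.

```python
# Illustrative model of the call chain described in this issue.
# All class and attribute names below are assumptions, not Airflow's code.


class BaseExecutorModel:
    def __init__(self):
        self.running = set()    # keys the executor believes are running
        self.event_buffer = {}  # key -> (state, info), read later by the scheduler

    def change_state(self, key, state, info=None):
        try:
            self.running.remove(key)
        except KeyError:
            # The key was never in `running`; execution simply continues.
            pass
        self.event_buffer[key] = (state, info)

    def fail(self, key, info=None):
        self.change_state(key, "failed", info)


class CeleryExecutorModel(BaseExecutorModel):
    def __init__(self):
        super().__init__()
        # Hypothetical container: keys still eligible for stalled_task_timeout.
        self.stalled_timeout_candidates = {}

    def queue(self, key):
        # Hypothetical helper: a newly queued task becomes eligible for the timeout.
        self.stalled_timeout_candidates[key] = "deadline"

    def _set_celery_pending_task_timeout(self, key, timeout):
        if timeout is None:
            self.stalled_timeout_candidates.pop(key, None)
        else:
            self.stalled_timeout_candidates[key] = timeout

    def change_state(self, key, state, info=None):
        super().change_state(key, state, info)
        # After this call the stalled-task check no longer sees the key.
        self._set_celery_pending_task_timeout(key, None)

    def update_task_state(self, key, celery_state, info=None):
        if celery_state == "STARTED":
            self._set_celery_pending_task_timeout(key, None)
        elif celery_state == "FAILURE":
            self.fail(key, info)


executor = CeleryExecutorModel()
key = "example_task"
executor.queue(key)
executor.update_task_state(key, "FAILURE")

print(executor.event_buffer[key])                  # ('failed', None)
print(key in executor.stalled_timeout_candidates)  # False
```

In this model the failure is recorded only in `event_buffer`, and the key is no longer a timeout candidate. If nothing acts on that buffered event, no remaining mechanism would move the task out of queued, which matches the symptom reported here.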