Task stuck in "scheduled" when running in backfill job #23145

Closed
1 of 2 tasks
ashlee3209 opened this issue Apr 21, 2022 · 6 comments · Fixed by #23720
Labels
affected_version:2.2 Issues Reported for 2.2 area:backfill Specifically for backfill related kind:bug This is a clearly a bug

@ashlee3209

Apache Airflow version

2.2.4

What happened

We are running Airflow 2.2.4 with the KubernetesExecutor. I created a DAG that runs the airflow backfill command through SubprocessHook. When I backfill a few days of DAG runs, the backfill gets stuck: some DAG runs have tasks that stay in the "scheduled" state and never start running.

We are using the default pool, and the pool was completely free when the tasks got stuck.

The only related log message I could find was:
TaskInstance: <TaskInstance: test_dag_2.task_1 backfill__2022-03-29T00:00:00+00:00 [queued]> found in queued state but was not launched, rescheduling
There was nothing else in the log.
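
To see how many task instances hit this message, the backfill or scheduler log can be scanned for it. The following is a minimal sketch, assuming the log line has exactly the format quoted above; `NOT_LAUNCHED` and `parse_not_launched` are hypothetical names for illustration and are not part of Airflow.

```python
import re

# Pattern derived from the single log line quoted in this report. Other Airflow
# versions may format the message differently, and a dag_id containing dots
# would not be split correctly (both are assumptions of this sketch).
NOT_LAUNCHED = re.compile(
    r"<TaskInstance: (?P<dag_id>[^.\s]+)\.(?P<task_id>\S+) (?P<run_id>\S+) "
    r"\[(?P<state>\w+)\]> found in queued state but was not launched"
)


def parse_not_launched(line):
    """Return dag_id, task_id, run_id and state if the line matches, else None."""
    match = NOT_LAUNCHED.search(line)
    return match.groupdict() if match else None


sample = (
    "TaskInstance: <TaskInstance: test_dag_2.task_1 "
    "backfill__2022-03-29T00:00:00+00:00 [queued]> found in queued state "
    "but was not launched, rescheduling"
)
print(parse_not_launched(sample))
```

Running this over every line of a log file and counting the distinct `run_id` values would give a rough picture of which backfill runs were affected.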

What you think should happen instead

The tasks stuck in "scheduled" should start running when there is a free slot in the pool.

How to reproduce

Airflow 2.2.4 with Python 3.8.13 and the KubernetesExecutor, running on AWS EKS.

An example backfill command: airflow dags backfill test_dag_2 -s 2022-03-01 -e 2022-03-10 --rerun-failed-tasks
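
The report says the backfill is launched from a DAG through SubprocessHook. Below is a rough sketch of what such a task could look like, assuming an Airflow 2.2-era `SubprocessHook` whose `run_command` returns a result with `exit_code` and `output`. The helper names `build_backfill_command` and `run_backfill` are made up for illustration and are not the reporter's code.

```python
def build_backfill_command(dag_id, start, end, rerun_failed=True):
    """Build the argv list for the `airflow dags backfill` call shown in this report."""
    command = ["airflow", "dags", "backfill", dag_id, "-s", start, "-e", end]
    if rerun_failed:
        command.append("--rerun-failed-tasks")
    return command


def run_backfill(dag_id, start, end):
    """Run the backfill in a subprocess; intended as a PythonOperator callable."""
    # Imported lazily so the command builder works without Airflow installed.
    from airflow.hooks.subprocess import SubprocessHook

    result = SubprocessHook().run_command(
        command=build_backfill_command(dag_id, start, end)
    )
    if result.exit_code != 0:
        raise RuntimeError(f"backfill exited with code {result.exit_code}")
    return result.output


print(" ".join(build_backfill_command("test_dag_2", "2022-03-01", "2022-03-10")))
```

Passing the command as a list avoids shell quoting issues, and raising on a non-zero exit code makes the wrapping task fail when the backfill itself fails.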

The test_dag_2 DAG looks like this:

from datetime import timedelta

import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}


def get_execution_date(**kwargs):
    ds = kwargs['ds']
    print(ds)

with DAG(
        'test_dag_2',
        default_args=default_args,
        description='Testing dag',
        start_date=pendulum.datetime(2022, 4, 2, tz='UTC'),
        schedule_interval="@daily", catchup=True, max_active_runs=1,
) as dag:
    t1 = BashOperator(
        task_id='task_1',
        depends_on_past=False,
        bash_command='sleep 30'
    )

    t2 = PythonOperator(
        task_id='get_execution_date',
        python_callable=get_execution_date
    )

    t1 >> t2
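
When checking which runs are stuck, it can help to list the run IDs a daily backfill would be expected to create. This is a sketch only: it assumes run IDs follow the `backfill__<logical date in ISO format>` pattern seen in the log line earlier in this report, that logical dates fall on midnight UTC for this `@daily` DAG, and it ignores the DAG's `start_date`. `expected_backfill_run_ids` is a hypothetical helper, not an Airflow function.

```python
from datetime import date, datetime, timedelta, timezone


def expected_backfill_run_ids(start, end):
    """List run IDs for a daily backfill from start to end, both inclusive."""
    run_ids = []
    day = start
    while day <= end:
        # Midnight UTC on each day, formatted the same way as in the quoted log line.
        logical_date = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        run_ids.append(f"backfill__{logical_date.isoformat()}")
        day += timedelta(days=1)
    return run_ids


for run_id in expected_backfill_run_ids(date(2022, 3, 1), date(2022, 3, 10)):
    print(run_id)
```

Comparing this list against the run IDs that appear in the UI or in the logs would show which days never progressed past "scheduled".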

Operating System

Debian GNU/Linux

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==3.0.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==3.0.2
apache-airflow-providers-docker==2.4.1
apache-airflow-providers-elasticsearch==2.2.0
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.4.0
apache-airflow-providers-grpc==2.0.1
apache-airflow-providers-hashicorp==2.1.1
apache-airflow-providers-http==2.0.3
apache-airflow-providers-imap==2.2.0
apache-airflow-providers-microsoft-azure==3.6.0
apache-airflow-providers-microsoft-mssql==2.1.0
apache-airflow-providers-odbc==2.0.1
apache-airflow-providers-postgres==3.0.0
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sendgrid==2.0.1
apache-airflow-providers-sftp==2.4.1
apache-airflow-providers-slack==4.2.0
apache-airflow-providers-snowflake==2.5.0
apache-airflow-providers-sqlite==2.1.0
apache-airflow-providers-ssh==2.4.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@ashlee3209 ashlee3209 added area:core kind:bug This is a clearly a bug labels Apr 21, 2022
@boring-cyborg

boring-cyborg bot commented Apr 21, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@ashlee3209
Author

I tried the workaround from #13542 (comment), and it got the stuck tasks running again.

@wookiist
Contributor

Is there any update regarding this issue?
Many tasks get stuck in the 'scheduled' state when running a backfill 😂

Apache Airflow 2.3.4 / Kubernetes Executor

@potiuk
Member

potiuk commented Sep 18, 2022

I think there were quite a few changes to backfill, so you might try 2.4.0rc1 @wookiist and see if it solves the problem. It might not, but it is worth trying.

@ephraimbuddy ephraimbuddy modified the milestones: Airflow 2.4.3, Airflow 2.4.4 Nov 9, 2022
@ephraimbuddy ephraimbuddy modified the milestones: Airflow 2.4.4, Airflow 2.5.0 Nov 23, 2022
@venkateshnyq550

venkateshnyq550 commented Dec 15, 2022

Is there any update regarding this issue?
My tasks also get stuck in the 'scheduled' state when running a backfill.

Apache Airflow 2.4.0 / Kubernetes Executor

@potiuk
Member

potiuk commented Jan 2, 2023

@venkateshnyq550

Is there any update regarding this issue? My tasks also get stuck in the 'scheduled' state when running a backfill.
Apache Airflow 2.4.0 / Kubernetes Executor

There will be no further updates on this issue unless the author retests it on the latest version of Airflow, confirms whether it was solved, and provides more evidence from that version.

This issue is presumed closed, and your problem might or might not be the same even if it looks similar. If you want help with your problem, open a new issue with all the details you can provide: a reproducible path, circumstances, logs, screenshots, and any other evidence you can find, ideally from the latest released version (2.5.0 as of now). Any fixes will be implemented in 2.5.*, so you will have to upgrade to the latest version anyway to get a fix. It is better to upgrade sooner and report against 2.5.*; you may find the issue is already solved there, which saves time for both you and anyone who would look at the issue.

Nobody can act on a statement like "I have the same issue" that comes without any evidence, and it is really difficult to assess whether the issue is the same. So the only way to get help is to report a new issue with the details. Feel free to refer to this issue by number as a likely similar one; that might help with finding out whether they are related.
