Tasks set to queued by a backfill get cleared and rescheduled by the kubernetes executor, breaking the backfill #23356

Closed
1 of 2 tasks
leeft95 opened this issue Apr 29, 2022 · 4 comments · Fixed by #23720
Labels
area:core kind:bug This is a clearly a bug

Comments

@leeft95

leeft95 commented Apr 29, 2022

Apache Airflow version

2.2.5 (latest released)

What happened

A backfill launched from the scheduler pod queues tasks as it should, but while they are still starting, the Kubernetes executor loop running in the scheduler clears these tasks and reschedules them via this function:

def clear_not_launched_queued_tasks(self, session=None) -> None:

This causes the backfill to stop queuing new tasks and to enter an endless loop, waiting for the tasks it has already queued to complete.

I have mitigated this by setting AIRFLOW__KUBERNETES__WORKER_PODS_QUEUED_CHECK_INTERVAL to 3600, which is not ideal.
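
For anyone wanting to try the same workaround, here is a minimal sketch of how that variable name maps onto Airflow's AIRFLOW__{SECTION}__{KEY} convention. The helper airflow_env_var is hypothetical and only for illustration; it is not part of Airflow. The value is assumed to be in seconds.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Hypothetical helper: build an AIRFLOW__{SECTION}__{KEY} variable name."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Section and key taken from the variable named in this report.
name = airflow_env_var("kubernetes", "worker_pods_queued_check_interval")

# Setting it here only affects this process and its children; for the real
# workaround it has to be present in the scheduler's environment at startup.
os.environ[name] = "3600"
print(f"{name}={os.environ[name]}")
```

This only makes the clearing check run less often; it does not stop the executor from clearing the backfill's queued tasks once the check does run.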

What you think should happen instead

The function clear_not_launched_queued_tasks should respect tasks launched by a backfill process and not clear them.

How to reproduce

Start a backfill with a large number of tasks and watch as they get queued and then rescheduled by the Kubernetes executor running in the scheduler pod.

Operating System

Debian GNU/Linux 10 (buster)

Versions of Apache Airflow Providers


apache-airflow                            2.2.5  py38h578d9bd_0
apache-airflow-providers-cncf-kubernetes  3.0.2  pyhd8ed1ab_0
apache-airflow-providers-docker           2.4.1  pyhd8ed1ab_0
apache-airflow-providers-ftp              2.1.2  pyhd8ed1ab_0
apache-airflow-providers-http             2.1.2  pyhd8ed1ab_0
apache-airflow-providers-imap             2.2.3  pyhd8ed1ab_0
apache-airflow-providers-postgres         3.0.0  pyhd8ed1ab_0
apache-airflow-providers-sqlite           2.1.3  pyhd8ed1ab_0

Deployment

Other 3rd-party Helm chart

Deployment details

Deployment is running the latest helm chart of Airflow Community Edition

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@leeft95 leeft95 added area:core kind:bug This is a clearly a bug labels Apr 29, 2022
@boring-cyborg

boring-cyborg bot commented Apr 29, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@snjypl
Contributor

snjypl commented May 5, 2022

Seems similar to #23048.

query = session.query(TaskInstance).filter(TaskInstance.state == State.QUEUED)
if self.kubernetes_queue:
    query = query.filter(TaskInstance.queue == self.kubernetes_queue)
queued_tis: List[TaskInstance] = query.all()

I think we need to add a filter, TaskInstance.queued_by_job_id == self.job_id, so that the SchedulerJob does not clear the BackfillJob's task instances and vice versa.
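
A rough sketch of what that extra filter would do, using plain Python objects rather than Airflow's models so it runs on its own. FakeTaskInstance, select_clearable, and the job ids are all made up for illustration; this is not the executor's actual code, only the selection logic under the assumption that each queued task instance records the id of the job that queued it.

```python
from dataclasses import dataclass
from typing import List, Optional

QUEUED = "queued"


@dataclass
class FakeTaskInstance:
    """Hypothetical stand-in for airflow.models.TaskInstance."""

    task_id: str
    state: str
    queue: str
    queued_by_job_id: Optional[int]


def select_clearable(
    task_instances: List[FakeTaskInstance],
    job_id: int,
    kubernetes_queue: Optional[str] = None,
) -> List[FakeTaskInstance]:
    """Return the queued task instances that the job ``job_id`` queued itself."""
    selected = [ti for ti in task_instances if ti.state == QUEUED]
    if kubernetes_queue:
        selected = [ti for ti in selected if ti.queue == kubernetes_queue]
    # Proposed extra filter: ignore anything queued by a different job.
    return [ti for ti in selected if ti.queued_by_job_id == job_id]


SCHEDULER_JOB_ID = 1  # illustrative ids, not real job ids
BACKFILL_JOB_ID = 2

tis = [
    FakeTaskInstance("a", QUEUED, "default", SCHEDULER_JOB_ID),
    FakeTaskInstance("b", QUEUED, "default", BACKFILL_JOB_ID),
    FakeTaskInstance("c", "running", "default", SCHEDULER_JOB_ID),
]

scheduler_view = [ti.task_id for ti in select_clearable(tis, SCHEDULER_JOB_ID)]
backfill_view = [ti.task_id for ti in select_clearable(tis, BACKFILL_JOB_ID)]
print(scheduler_view, backfill_view)
```

With the filter in place, the scheduler's executor would only consider task "a" and the backfill's executor only task "b", so neither would clear the other's queued work.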

@snjypl
Contributor

snjypl commented May 16, 2022

duplicate of #23145

@uranusjr
Member

Merging into #23145

@uranusjr uranusjr closed this as not planned May 24, 2022
@eladkal eladkal removed this from the Airflow 2.3.1 milestone Nov 18, 2022