Airflow Scheduler with Kubernetes Executor has errors in logs and stuck slots with no running tasks #36478
Comments
Discussed in #35426
Hi @crabio,
Thank you (@dirrao) very much!
@dirrao Seems it works well! thank you
Hello @dirrao
@crabio,
@dirrao Yes, thank you! I rolled back our cluster to 1 scheduler with a parallelism of 64 to keep the same capacity :)
You can deduce the number of iterations from the scheduler loop time metric.
This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.
This issue has been closed because it has not received response from the issue author.
Hi!
@ephraimbuddy @dirrao could you reopen the issue?
duplicated in #36998
Apache Airflow version
2.8.0
If "Other Airflow 2 version" selected, which one?
2.7.3
What happened?
I found a race condition between two schedulers:
1. Scheduler 1 starts the task pod.
2. The task finishes.
3. Scheduler 2 processes the task completion and kills the pod.
4. Scheduler 1 still thinks the task is running, even though Scheduler 2 has already updated the task's status in the database.
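The sequence above can be illustrated with a toy model. This is only a sketch of the suspected failure mode, not Airflow code: `ToyScheduler`, its `running` set, and the `db` dictionary are hypothetical names made up for illustration, and the model assumes each scheduler tracks occupied slots in its own memory while task state lives in a shared database.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for a scheduler with private executor state."""

    name: str
    parallelism: int
    # In-memory bookkeeping of occupied slots; NOT shared between schedulers.
    running: set = field(default_factory=set)

    def start_task(self, task_id: str, db: dict) -> None:
        # Occupy a local slot and record the state in the shared database.
        self.running.add(task_id)
        db[task_id] = "running"

    def process_finish(self, task_id: str, db: dict) -> None:
        # Whichever scheduler handles the completion updates the shared
        # database, but it can only clear its OWN in-memory bookkeeping.
        db[task_id] = "success"
        self.running.discard(task_id)

    def open_slots(self) -> int:
        return self.parallelism - len(self.running)


def simulate():
    db = {}  # stands in for the shared metadata database
    s1 = ToyScheduler("scheduler-1", parallelism=2)
    s2 = ToyScheduler("scheduler-2", parallelism=2)

    s1.start_task("task_a", db)      # step 1: Scheduler 1 starts the pod
    s2.process_finish("task_a", db)  # steps 2-3: Scheduler 2 handles the finish
    return db, s1, s2


db, s1, s2 = simulate()
print("db state:", db["task_a"])
print("scheduler-1 open slots:", s1.open_slots())
print("scheduler-2 open slots:", s2.open_slots())
```

In this model the database says the task succeeded, yet Scheduler 1 keeps one slot occupied by a task that is no longer running, which matches the "stuck slots with no running tasks" symptom in the title. Whether the real executor behaves this way is what the issue asks maintainers to confirm.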
What you think should happen instead?
Multiple schedulers should handle concurrent work correctly with the Kubernetes Executor.
How to reproduce
Operating System
Docker based on apache/airflow:2.7.3
Versions of Apache Airflow Providers
apache-airflow == 2.7.3
dbt-core == 1.6.6
dbt-snowflake == 1.6.4
apache-airflow-providers-snowflake
apache-airflow[statsd]
facebook-business == 16.0.2
google-ads == 21.1.0
twitter-ads == 11.0.0
acryl-datahub-airflow-plugin
acryl-datahub[dbt]
checksumdir
filelock
openpyxl
cronsim
apache-airflow-providers-cncf-kubernetes==7.8.0
apache-airflow-providers-apache-kafka == 1.2.0
kubernetes
snowplow_analytics_sdk
Deployment
Other 3rd-party Helm chart
Deployment details
No response
Anything else?
logs:
Untitled discover search.csv
Are you willing to submit PR?
Code of Conduct