Airflow scheduler zombies don't go away #1157

Closed
criccomini opened this issue Mar 16, 2016 · 5 comments

@criccomini
Contributor

Our monitoring is seeing zombies hang around for a long time:

airflow  10331 10329 15 16:10 ?        00:19:25 /bin/python /bin/airflow scheduler --num_runs=60
airflow  10369 10331  0 16:10 ?        00:00:00 /bin/python /bin/airflow scheduler --num_runs=60
airflow  10370 10331  0 16:10 ?        00:00:00 [/bin/python /bi] <defunct>
airflow  10371 10331  0 16:10 ?        00:00:00 [/bin/python /bi] <defunct>
airflow  10372 10331  0 16:10 ?        00:00:00 [/bin/python /bi] <defunct>

Here, you can see that the parent scheduler 10331 has a bunch of defunct subprocesses. These are hanging around for many minutes.
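A `<defunct>` entry in `ps` is a zombie: a child process that has already exited but whose parent has not yet collected its exit status with `wait()`/`waitpid()`. The kernel keeps the process-table entry until the parent reaps it, or until the parent itself exits and the orphaned zombie is reaped by init. The following is a minimal, Linux-only sketch in Python (not Airflow's code; the helper names `spawn_child_that_exits`, `process_state`, and `reap_children` are hypothetical) that produces a zombie and then reaps it with a non-blocking `os.waitpid(-1, os.WNOHANG)`:

```python
import os
import time


def process_state(pid: int) -> str:
    """Return the one-letter state of `pid` from /proc (Linux only); 'Z' means zombie."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' to reach the state field.
    return stat.rsplit(")", 1)[1].split()[0]


def spawn_child_that_exits() -> int:
    """Fork a child that exits immediately; the parent does NOT wait for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping Python-level cleanup
    return pid


def reap_children() -> list[int]:
    """Reap every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    child = spawn_child_that_exits()
    time.sleep(0.5)  # give the child time to exit
    # An unreaped, exited child shows state 'Z' (<defunct> in ps).
    print("state before reaping:", process_state(child))
    print("reaped our child:", child in reap_children())
```

Under this assumption, a long-running parent like the scheduler above would presumably need to call something like `reap_children()` periodically (or join its worker processes) for the `<defunct>` entries to disappear.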

This appears to correlate with when I set --num_runs to 60 (with a 15s job scheduler heartbeat). Before this, I didn't set num_runs for the scheduler. I suspect something isn't right with the way num_runs works.

Also, I'm using supervisord to manage the scheduler. I have autorestart set to always.

@criccomini
Contributor Author

Note: this is on version 1.6.2.

@bolkedebruin
Contributor

This is fixed in PR #855, but it needs testing (and rebasing). If I rebase it, can you please test?

@criccomini
Contributor Author

@bolkedebruin awesome! Yes, I can test.

@bolkedebruin
Contributor

@criccomini I updated the PR. Please note that my change alters Airflow's behavior (see the command-line options).

@criccomini
Contributor Author

Closing this issue. I believe it's fixed by #855.