-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIGCHLD doesn't reap properly #44
Comments
Hi, Does this look like the line you're looking for? We have a test which creates a zombie which is reowned by PID1 and reaped: If it's not working for you, can you provide a test case? Thanks! |
Generally the practice is to wait on the pid of the child for the sigchild and then waitpid(-1) with WNOHANG for children to exit. It appears that your method of just doing waitpid(-1) every time will catch the exit code on the next iteration of the while loop, but I believe if a parent and child die at roughly the same time but the kernel sends you the parent pid first, your code will not clean up the child. This is generally the reason for doing a loop with WNOHANG. This is probably an edge case but worth considering. |
@vishvananda I don't think there's an actual problem here, in any case that I can imagine.
Can you present an actually problematic case? |
Fair points. I can't come up with a pressing issue that would be caused by this unless the main init process is failing for some reason. I just tend to go for the theoretically correct approach in case there is some case I can't think of. Happy to close this if you don't see it as an issue. |
I can't see any case where things go wrong. Long running systems have no zombies, and any SIGTERM handlers (et al) get run in all cases. Any zombies that do occur will be reaped in a timely manner, somehow. Closing this until we come up with a better counterexample. |
Doing a quick survey, systemd handles zombie reaping like so:
And upstart's strategy is pretty analogous:
Neither uses WNOHANG, which would cause a busy loop. We could have done all the work in a SIGCHLD handler, but then our main() function needs to run infinitely without busy looping, and also be able to be told to die by the SIGCHLD handler. |
no problem with doing it in the main loop if that is what you prefer. just wanted to comment that WNOHANG does not have to cause a busy loop since it returns -1 if there are no children that have exited. The loop will exit as soon as all of the waiting children have exited. This of course means reaping after your initial wait. This is how tini does it for example: https://github.com/krallin/tini/blob/master/src/tini.c#L348 |
This assumes a very flat process tree which wont always be the case. If the "parent" in question is not a first-level child of dumb-init then the container has no need to exit and the zombies will persist for an extended period of time. This matches the behaviour we have seen. |
If this is intended to be pid 1, it needs to properly reap orphan processes by handling SIGCHLD. There should be a handler for SIGCHLD that essentially does:
while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {};
more info here:
http://stackoverflow.com/questions/20688982/zombie-process-vs-orphan-process
The text was updated successfully, but these errors were encountered: