Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is this package needed? #23

Closed
mitar opened this issue May 12, 2023 · 13 comments
Closed

Why is this package needed? #23

mitar opened this issue May 12, 2023 · 13 comments

Comments

@mitar
Copy link

mitar commented May 12, 2023

If everything you want to do is just reap any zombies, then it is much easier to just do:

signal.Ignore(syscall.SIGCHLD)

By explicitly setting SIGCHLD handler to ignore, you ask Linux to reap zombies for us. This then handles zombies for you. No need to make a loop and wait yourself.

@ramr
Copy link
Owner

ramr commented May 15, 2023

Sure - it would work for most SysV based systems (dunno about windows *bsd).
But the downside is that you ignore all child terminations inside the process.
If there is any code running some scripts and waiting on child terminations, you could well run into other issues.
Example: see #11

Also the ancestor processes in the container need not be directly started via this process. Process daemons, set sid or anything doing a PR_SET_CHILD_SUBREAPER prctl could potentially be outside this process chain.
Since this is a library and have no real way of knowing how its going to be used downstream (aka what code/calls the code using this library is going to do), this is a method to reap zombies.

Now that said, a simple solution in a lot of cases would be to simply run an "init" process inside the docker container and
let that take care of any "unharvested" children.

@mitar
Copy link
Author

mitar commented May 15, 2023

But the downside is that you ignore all child terminations inside the process.

That holds also if you handle SIGCHLD and wait on a process in signal's handler. It is exactly the same thing if you wait for SIGCHLD and non-discriminatory wait for any process (pid set to -1) in handlers or call explicitly signal.Ignore(syscall.SIGCHLD). So current implementation of go-reaper is the same as calling signal.Ignore(syscall.SIGCHLD). Am I mistaken?

Everything else you wrote holds. I am just saying that this implementation could be simplified by replacing the handler and a wait loop with simply calling signal.Ignore(syscall.SIGCHLD).

@ramr
Copy link
Owner

ramr commented May 15, 2023

But the downside is that you ignore all child terminations inside the process.

That holds also if you handle SIGCHLD and wait on a process in signal's handler. It is exactly the same thing if you wait for SIGCHLD and non-discriminatory wait for any process (pid set to -1) in handlers or call explicitly signal.Ignore(syscall.SIGCHLD). So current implementation of go-reaper is the same as calling signal.Ignore(syscall.SIGCHLD). Am I mistaken?

If you ignore the signal, you will never be delivered that signal and you would be disabling it for the entire process.
Again there is no way for us to know what the code downstream is doing.

Example: User code could have a chain of signal handlers ala:

func death() {
	var sigs = make(chan os.Signal, 3)
	signal.Notify(sigs, syscall.SIGCHLD)
	var sig = <-sigs
	fmt.Printf(" - death: reaping child signal %v\n", sig)
	// do something here
}

func susan_sto_helit () {
	var sigs = make(chan os.Signal, 2)
	signal.Notify(sigs, syscall.SIGCHLD)
	var sig = <-sigs
	fmt.Printf(" - susan: reaping child signal %v\n", sig)
	// do something else here
}

func main() {
	go reaper.Reap()
	go death()
	go susan_sto_helit()

	//  some other code
}

Death and Susan can have different ideas of what they want to do!
Aside: In discworld lore, Susan is death's daughter!

@mitar
Copy link
Author

mitar commented May 16, 2023

Ignoring the signal is exactly the same as waiting for it and non-discriminatory reaping the process, not exposing any information about the waited process to the rest of the code (which what this library is doing). Especially because you do Wait4 without WNOHANG which means that once you get one SIGCHLD, you block in a loop for the next process to die and you generally preempt any other wait there might be in the code. Sadly, wait does not deliver based on specificity: wait(-1) and wait(123) compete exactly the same.

The example you shared is not complete. It is true that both Susan and death both are notified of SIGCHLD, but only one of them will be able to reap the process successfully, generally nondeterministically. Which means that for practical purposes, one cannot rely on those notifications to do anything useful with processes, except for just blindly reaping them, which could then be simplified to signal.Ignore(syscall.SIGCHLD).

This is why you have section "Into The Woods" where you propose to have two processes. Because once you start non-discriminatory reaping children, that process becomes useless for any other handling of children processes.

I have spent few days on this now. :-) In my own code (I am working on init system for Docker containers in Go, https://gitlab.com/tozd/dinit, still WIP) I have to expose metadata about reaped processes to the rest of the process so that if any other part of the code spawns a direct subprocess and fails to wait on it (which you can learn from syscall.ECHILD error from wait), it can check if the reaping loop maybe did it sooner. Sadly, I could not find a way to hide this inside Golang's stdlib transparently, so abstraction slightly breaks. But it works for my limited use case.

@ramr
Copy link
Owner

ramr commented May 16, 2023

Ignoring the signal is exactly the same as waiting for it and non-discriminatory reaping the process, not exposing any information about the waited process to the rest of the code (which what this library is doing). Especially because you do Wait4 without WNOHANG which means that once you get one SIGCHLD, you block in a loop for the next process to die and you generally preempt any other wait there might be in the code. Sadly, wait does not deliver based on specificity: wait(-1) and wait(123) compete exactly the same.

No have to re-iterate this [again] (double redundancy!) - it is not the same, you are explicitly disabling signals - which has other ramifications as I explained in the example and explained below. Signals are a notification mechanism - reaping is something you can optionally do "after the fact" ... which this library does to prevent zombies.
Sure you can use the signal.Ignore(syscall.SIGCHLD) mechanism but the use case is extremely restrictive. As shown by the prior example.

We can expose the information about the reaped processes - haven't had an ask as yet.

The example you shared is not complete. It is true that both Susan and death both are notified of SIGCHLD, but only one of them will be able to reap the process successfully, generally nondeterministically. Which means that for practical purposes, one cannot rely on those notifications to do anything useful with processes, except for just blindly reaping them, which could then be simplified to signal.Ignore(syscall.SIGCHLD).

The example I shared was a stub to indicate what some downstream code could potentially do. It is useful if someone wants to be notified about children dying (which disabling the signals would prevent).
I probably wasn't clear ... it is late here :^( ... But the aim was to indicate it is not just about reaping the processes, signals are a notification mechanism - what you do after that fact could vary. The use case of ignoring the signal is very restrictive. And in such a case prevents the user code from getting any notifications.

@ramr
Copy link
Owner

ramr commented May 16, 2023

I have spent few days on this now. :-) In my own code (I am working on init system for Docker containers in Go, https://gitlab.com/tozd/dinit, still WIP) I have to expose metadata about reaped processes to the rest of the process so that if any other part of the code spawns a direct subprocess and fails to wait on it (which you can learn from syscall.ECHILD error from wait), it can check if the reaping loop maybe did it sooner. Sadly, I could not find a way to hide this inside Golang's stdlib transparently, so abstraction slightly breaks. But it works for my limited use case.

If you have an example of what you are trying to do (some test code and the problems you are facing), that could be useful and maybe can suggest something.

@mitar
Copy link
Author

mitar commented May 16, 2023

Signals are a notification mechanism - reaping is something you can optionally do "after the fact" ... which this library does to prevent zombies.

What else can you do after SIGCHLD signal except to reap the process? I think I am lacking imagination here.

haven't had an ask as yet.

Of course you had. This is why you have the "Into The Woods" section in the README. It was asked implicitly; running a subprocess in a Go program with this package enabled is not really possible and non-deterministic. If one dislikes the "make two processes" solution, one starts thinking about other solutions.

It is useful if someone wants to be notified about children dying (which disabling the signals would prevent).

I agree, theoretically it demonstrates that, but because the example stops short of showing also two different things one could do responding to such an event, it stays very theoretical to me.

If you have an example of what you are trying to do (some test code and the problems you are facing), that could be useful and maybe can suggest something.

Oh, the problem is the one you described in the "Into The Woods" section. I want both to reap zombies and spawn subprocesses. Once I enable this library, spawning of subprocesses does not work anymore. It does not work in the same way as if I had signal.Ignore(syscall.SIGCHLD). So I wondered, why this package does not simply do signal.Ignore(syscall.SIGCHLD). Doing signal.Ignore(syscall.SIGCHLD) also deterministically prevents one to spawn a subprocess and wait for it, while using a signal handler and a loop sometimes works and sometimes does not.

I understand now that this library does not use signal.Ignore(syscall.SIGCHLD) to enable some theoretical possibility of one wanting to use SIGCHLD signal for another notification for something else (but not reaping) while it breaks a very common use case of spawning subprocesses. So if it already breaks that, it is surprising that we worry about secondary uses of the signal itself.

Anyway, I do not think there is much usefulness if pursuing this discussion. We both described our viewpoints well. Thank you.

@mitar mitar closed this as completed May 16, 2023
@ramr
Copy link
Owner

ramr commented May 16, 2023

Signals are a notification mechanism - reaping is something you can optionally do "after the fact" ... which this library does to prevent zombies.

What else can you do after SIGCHLD signal except to reap the process? I think I am lacking imagination here.

SIGCHILD indicates a child processs has changed state. This can happen due to various other reasons - e.g. a process is suspended/resumed see: SIGSTOP/SIGCONT ... exited, killed, dumped or stopped or continued.

man 2 wait | grep -A 5 -B 5 si_code - it is buried in the wait man pages (and sorta the ptrace ones) but not in wait4 variants.

...

As re: the other points, as you mentioned there was not an explicit ask. I am more than willing to add something if there is an ask. I am sorry things did not work out for you.

@mitar
Copy link
Author

mitar commented May 16, 2023

it is buried in the wait man pages (and sorta the ptrace ones) but not in wait4 variants.

But si_code is not exposed by Golang's signal API, no?

I am sorry things did not work out for you.

Oh, no. Things are working great for me. As I mentioned, I am writing my own init in Go for Docker containers (to address similar issues you described in this package, but few more features) so I was looking at your code for insights. And I thought it is doing something more so I needed to clarify why you are doing it this way. Thank you.

@ramr
Copy link
Owner

ramr commented May 17, 2023

What else can you do after SIGCHLD signal except to reap the process? I think I am lacking imagination here.

SIGCHILD indicates a child processs has changed state. This can happen due to various other reasons - e.g. a process is suspended/resumed see: SIGSTOP/SIGCONT ... exited, killed, dumped or stopped or continued.

man 2 wait | grep -A 5 -B 5 si_code - it is buried in the wait man pages (and sorta the ptrace ones) but not in wait4 variants.

But si_code is not exposed by Golang's signal API, no?

Not in the Golang signal API but see: https://pkg.go.dev/syscall#WaitStatus returned by Wait4

@mitar
Copy link
Author

mitar commented May 17, 2023

Not in the Golang signal API but see: https://pkg.go.dev/syscall#WaitStatus returned by Wait4

And now we came full circle. :-) So you are saying that one should not just ignore SIGCHLD because one could be interested in other reasons for SIGCHLD and not just reaping, like knowing if process was stopped or continued, but to do that you have to use the wait call but that one does not work reliably anymore once one has a loop waiting from this package. It is true that this package does not set WUNTRACED and WCONTINUED so it could work to do another wait with those flags.

@ramr
Copy link
Owner

ramr commented May 17, 2023

Not in the Golang signal API but see: https://pkg.go.dev/syscall#WaitStatus returned by Wait4

And now we came full circle. :-) So you are saying that one should not just ignore SIGCHLD because one could be interested in other reasons for SIGCHLD and not just reaping, like knowing if process was stopped or continued, but to do that you have to use the wait call but that one does not work reliably anymore once one has a loop waiting from this package. It is true that this package does not set WUNTRACED and WCONTINUED so it could work to do another wait with those flags.

I am not saying that. You asked me whether the si_code is exposed by Golang's signal API and I responded to that.
What I am saying is that if you want to reap processes and also wait for spawned processes the "Into The Woods" section provides instructions on how to do this. You may not want to use that, that's your choice.

@mitar
Copy link
Author

mitar commented May 17, 2023

Yes, and then the parent process which is doing reaping could be just signal.Ignore(syscall.SIGCHLD). :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants