Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait: no child processes #2

Closed
timretout opened this issue Jul 21, 2017 · 4 comments
Closed

wait: no child processes #2

timretout opened this issue Jul 21, 2017 · 4 comments
Assignees

Comments

@timretout
Copy link

When trying go-reaper in a service at work, we're getting this error:

Jul 21 11:27:28 app01 docker/doc-view-service.01[677]:  - Received signal child exited
Jul 21 11:27:28 app01 docker/doc-view-service.01[677]:  - Grim reaper cleanup: pid=701, wstatus=0
Jul 21 11:27:28 app01 docker/doc-view-service.01[677]:  - Grim reaper cleanup: pid=685, wstatus=0
Jul 21 11:27:28 app01 docker/doc-view-service.01[677]: 2017/07/21 10:27:28 http: panic serving 172.16.0.38:40850: wait: no child processes
Jul 21 11:27:28 app01 docker/doc-view-service.01[677]: goroutine 22346444 [running]:

This looks quite similar to this issue on gitlab-ci-multi-runner:

https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/issues/989

Our code calls cmd.Run(), so I think what's happening is that the process is getting cleaned up by the reaper before cmd.Wait() can reach it, but I need to debug it further.

@ramr
Copy link
Owner

ramr commented Jul 25, 2017

@diocles that would be an issue for your use case as the reaper could potentially "harvest" the process (depending on the scheduler) before cmd.{Wait,Run}() runs.

Since the code here is waiting for the process group ID matching the parent (the process where this code runs),
https://github.com/ramr/go-reaper/blob/master/reaper.go#L51
one potential workaround I can see that you can try is to set the Process Group Id to the new pid.
Assuming this is a linux container (might work for other oses but ymmv).
See: https://golang.org/pkg/syscall/#SysProcAttr for details

So try something like:

cmd := exec.Command(...) 
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true, Pgid: 0}
// ... 
// cmd.Run()  or cmd.Start() + cmd.Wait()
// ...

Otherwise you'd probably need some other custom code to do the bookkeeping (in lieu of this library) - aka keep track of all the processes started and wait only on those.

HTH

@ramr ramr self-assigned this Jul 25, 2017
@ramr
Copy link
Owner

ramr commented Jul 28, 2017

@diocles closing this out - please reopen if it is still an issue.

@ramr ramr closed this as completed Jul 28, 2017
@timretout
Copy link
Author

timretout commented Aug 11, 2017

@ramr sorry for my slow response to this.

I think the "-1" in the Wait4 call you linked to means "clean any child process". Even if this constant were 0, I'm not sure that setting the process group id is enough to stop the child process getting cleaned up. I can't see a way forward yet, though.

Here's a test for the issue:

package reaper_test

import (
	"os/exec"
	"testing"
	"time"

	"github.com/ramr/go-reaper"
)

func TestWait(t *testing.T) {
	go reaper.Reap()

	cmd := exec.Command("sleep", "1")
	err := cmd.Start()
	if err != nil {
		t.Fatal("Could not start sleep command")
	}

	time.Sleep(2 * time.Second)

	err = cmd.Wait()
	if err != nil {
		t.Errorf("Got err: %s", err)
	}
}

And here's the output:

$ go test -c    # to get a binary to run inside docker
$ docker run -it -v $(pwd):/app debian:latest /app/go-reaper.test
 - Received signal child exited
 - Grim reaper cleanup: pid=11, wstatus=0
--- FAIL: TestWait (2.00s)
	wait_test.go:24: Got err: waitid: no child processes
FAIL

EDIT: I'm not sure this is a bug in go-reaper, though, so I'm not intending to reopen the issue unless there's something to fix.

@ramr
Copy link
Owner

ramr commented Aug 14, 2017

@diocles sorry my bad - I forgot the pid is the first argument and was set to -1 and not the third argument 0. You are right in that it won't work with -1. So I created a new PR #3
which adds a new export Start which you can use.
There's a test called sleeper_test specifically for your case with code to above. This part checks for expected error https://github.com/ramr/go-reaper/blob/master/test/testpid1.go#L42-L48 and if we set the process attributes we shouldn't get an error.

PTAL and see if that fixes your problem/helps you out.
Usage is as shown in the above mentioned PR #3

If you use the Reap entry point, then behavior is as before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants