syscall: forkAndExecInChild assumes clone doesn't block, but it sometimes does #11155
Comments
I think it's not safe to execute any code concurrently with fork.
It's also not possible to use Syscall after calling runtime_BeforeFork.
As @minux says, we can't change from RawSyscall to Syscall. That said, I think that with some care we might be able to temporarily increase GOMAXPROCS in syscall.forkExec. Does your test case work if GOMAXPROCS > 1?
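A minimal sketch of the distinction under discussion, assuming Linux and using SYS_GETPID as a stand-in for the real call (illustrative only, not the runtime's actual code):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Syscall brackets the kernel call with entersyscall/exitsyscall, so if it
	// blocks, the scheduler can run other goroutines on this thread's slot.
	pid1, _, _ := syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)

	// RawSyscall skips that bookkeeping; a blocked RawSyscall holds on to its
	// scheduler slot. forkAndExecInChild must use the raw form because no Go
	// runtime code may run between fork and exec in the child.
	pid2, _, _ := syscall.RawSyscall(syscall.SYS_GETPID, 0, 0, 0)

	fmt.Println(pid1, pid2)
}
```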
I'm actually a bit confused about that. I see the deadlock even if I set GOMAXPROCS to a very high value (e.g. 128). I have only 4 CPUs on this machine, and the concurrency in my test is higher than that (16). Is it possible that the runtime actually caps the number of scheduler slots to some lower value than what I set GOMAXPROCS to?
No, there is no cap on the number of goroutines less than GOMAXPROCS. If you can get the program to hang at will, send it a SIGILL or something to get the stack dump of all the goroutines.
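One way to capture the same information without a signal, should the process still be responsive, is runtime.Stack with all=true (a sketch; when the process is fully wedged, the signal-based dump is the more reliable route):

```go
package main

import (
	"os"
	"runtime"
)

func main() {
	// Passing true asks for the stacks of all goroutines, not just this one.
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true)
	os.Stderr.Write(buf[:n])
}
```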
Note that I'm asking about the number of OS threads (this might be the wrong terminology; I mean the thing limited by GOMAXPROCS), not the number of goroutines. This gist shows an invocation of my test case that deadlocked and then received SIGQUIT. Though I invoked it with GOMAXPROCS=4 and there are many runnable goroutines, only two are running.
This gist shows what the kernel was doing on behalf of each thread after SIGQUIT. In any case, there is no safe value of GOMAXPROCS -- if I set it to N, then N concurrent calls to clone may freeze the program, right?
Yes, sorry, I misspoke: GOMAXPROCS sets a limit on the number of concurrently running threads of execution, not on the number of goroutines. The Go runtime only permits one thread to execute fork at a time, so the number of concurrent calls to fork should be irrelevant. What matters is that a single thread should not hang in fork, because there should always be another thread available to serve the FUSE request.

I don't know the details of how FUSE works in Go. I assume that some goroutine is sitting around waiting for FUSE requests. Can you find that goroutine in the stack dump? Why is it not ready to handle the request? I can't help but notice that the exec appears to have been invoked by a call to unmount within the FUSE library. Is it possible that the FUSE library is in effect deadlocking against itself?
Background: this is using github.com/jacobsa/fuse, which wraps bazil.org/fuse, which reads requests from /dev/fuse directly. In the SIGQUIT output above, there are several file systems active. Each one is sitting in a call to syscall.Read, like goroutine 2574. Compare that to goroutine 2479, which is also in syscall.Read but whose state is different. In other words, "there should always be another thread available to serve the FUSE request" is violated because while there is such a thread, Go is not running it. My confusion remains: that example shows GOMAXPROCS=4 and many runnable goroutines, but only two running. Any idea what is going on there?
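For context, a stripped-down sketch of the kind of serve loop involved (illustrative only; the real bazil.org/fuse code obtains the file descriptor as part of mounting and decodes each request, which is omitted here):

```go
package main

import (
	"log"
	"syscall"
)

func main() {
	// Illustrative: a real FUSE library gets this fd as part of mounting.
	fd, err := syscall.Open("/dev/fuse", syscall.O_RDWR, 0)
	if err != nil {
		log.Fatal(err)
	}
	buf := make([]byte, 64*1024)
	for {
		// This is the syscall.Read the goroutines in the dump are sitting in:
		// it blocks until the kernel forwards a request for this mount.
		n, err := syscall.Read(fd, buf)
		if err != nil {
			log.Fatal(err)
		}
		_ = n // a real server would decode and dispatch the request here
	}
}
```

If no runnable thread ever re-enters this loop, the kernel request the blocked clone is waiting on is never answered.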
By the way, for the record: the example above is Go 1.4.2, but I still see the same behavior at 7d48237. |
You should get a better stack dump at tip: you shouldn't get the "goroutine running on other thread" messages. I don't know that that will help, though. The list of goroutines that are running is a bit more complex than the ones in [running] state, as it can also include those in raw syscalls. Can you get a stack dump with the environment variable GOTRACEBACK=2? That will show us the runtime frames that are currently being hidden.
Here is one from tip with GOTRACEBACK set. I still get "goroutine running on other thread", but maybe something can be divined from the runtime frames? |
Thanks. Looking more closely, I guess the stack dump for running threads only happens when using GOTRACEBACK=crash. I see now that you have two running goroutines (28 and 69) and you also have two goroutines that are locked to threads (17 and 21). That may account for the four threads permitted by GOMAXPROCS. Unfortunately we don't really know what any of them are doing, except that 21 is running ensureSigM which ensures that signal masks are correct when using the os/signal package. |
Here is one with GOTRACEBACK=crash. Note that it froze at the last line (about to dump trace for goroutine 0) until I aborted all fuse connections, at which point it quit and the shell printed
I see that not all goroutines are represented, in particular 34 is missing. :-/ Here is one with GOMAXPROCS=128, where I also only got partial output. At a glance I don't see any difference. In any case, I think I've lost the thread here: is there a theory you are chasing?
CC @RLH @aclements. See below. The dump with GOTRACEBACK=crash may show us something useful. It shows a goroutine with this stack trace:

goroutine 0 [idle]:

This goroutine is trying to stop all threads in order to run the stop-the-world phase of a garbage collection. It tells all running threads to stop and then loops until they do. A thread waiting for an ordinary syscall to complete is considered stopped by the GC, because it will stop as soon as the syscall completes and the goroutine calls ExitSyscall. However, in this case there is a thread waiting for a raw system call, the call to clone in forkAndExecInChild. The thread is running, and the GC will wait for it to stop. However, the thread is waiting for a FUSE thread to run, but all the FUSE threads have been stopped by the GC. The result is a deadlock.

If that theory is correct, then I think we need to add some sort of marker to beforefork in runtime/proc1.go indicating that the GC can consider the thread to be stopped. Perhaps it would be enough to check for stackguard0 == stackFork in preemptall and stopTheWorldWithSema. We may not need to do anything else other than add comments, as the forking process never allocates memory, and its next step will be to call entersyscall for readlen.
Nice find! I'm not equipped to comment on your proposed fix, but the diagnosis seems sane to me. |
I have my doubts, but you could try applying this patch to go/src/runtime:
I thought more about this and it is VERY complicated. It's not something we can fix for 1.5 at this point. |
@rsc: Thanks for trying with the patch above. Do you still think this is something that could be addressed in 1.6? |
I've thought about this off and on but have not figured out a good solution. There may not be one. We're certainly out of time for Go 1.6. Sorry. The workaround is to set GOMAXPROCS as high as you can (128 I think). |
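For anyone landing here looking for the workaround, a sketch of applying it from code rather than the environment (128 is simply the value mentioned above; this reduces the odds of the hang but does not eliminate it):

```go
package main

import "runtime"

func main() {
	// Equivalent to running the program with GOMAXPROCS=128 set in the
	// environment: more scheduler slots make it less likely that every slot
	// is simultaneously stuck behind a blocked raw clone.
	runtime.GOMAXPROCS(128)

	// ... rest of the program: mount the FUSE file systems, run the execs ...
}
```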
syscall.forkAndExecInChild uses RawSyscall6 to call clone. This means the goroutine doesn't give up its scheduler slot, on the assumption that clone can't block. But in the case of a program with open files on a fuse file system, clone may block indefinitely. So if the program execs GOMAXPROCS times in parallel, it may lock up and fail to schedule any other runnable goroutines. If the fuse file system is unresponsive, or if the fuse server is the same process that has the files open, the program will never make progress.
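To make the failure mode concrete, here is a hedged sketch of the shape of program described above (the FUSE mount path and the command are placeholders; the real test case lives in the FUSE packages mentioned elsewhere in this issue):

```go
package main

import (
	"os"
	"os/exec"
	"runtime"
	"sync"
)

func main() {
	// Placeholder: a file on a FUSE mount. In the worst case the FUSE server
	// is this same process, so a stalled clone can never be unblocked.
	f, err := os.Open("/mnt/fuse/some-file")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Run GOMAXPROCS execs in parallel. Each exec reaches forkAndExecInChild,
	// whose clone is made via RawSyscall6; if clone blocks in the kernel, the
	// thread never yields its scheduler slot.
	n := runtime.GOMAXPROCS(0)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = exec.Command("/bin/true").Run()
		}()
	}
	wg.Wait()
}
```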
Here is an example of the kernel stack for a thread that is blocking in sys_clone on behalf of an os/exec.Cmd on Linux 3.19.0-16-generic (Ubuntu 15.04):
I previously filed #10202 about a similar issue with syscall.Dup2, which Ian fixed in 28074d5. Can a similar fix be made here? I know there are tricky issues with the fork lock, but I don't pretend to be familiar enough to say.