-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: CPU profiling provokes a runtime hang under load #24260
Comments
What happens if you don’t run runtime.Sched in a loop?
… On 6 Mar 2018, at 10:38, Byron Rakitzis ***@***.***> wrote:
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go1.10 (NB: cross-compiling for armv7)
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
Cross compiling for 32 bit arm: GOOS=linux GOARCH=arm go test -c
The host OS is a mac:
; go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/byron/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/byron/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qv/yq0hdcc10vj2y4gpy_ypjj2m0000gn/T/go-build940051274=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
CPU profiling under load seems to hang the process indefinitely. It becomes unresponsive to SIGINT and SIGQUIT, for example.
We discovered this in production code which doesn't do anything as goofy as the minimal example I am submitting, but I did manage to pare it down to a unit test of a few lines:
https://play.golang.org/p/_B_zC9QeQQr
package main
import (
"bytes"
"runtime"
"runtime/pprof"
"testing"
"time"
)
func TestProfileUnderLoad(t *testing.T) {
go func() {
for {
runtime.Gosched()
}
}()
if err := pprof.StartCPUProfile(&bytes.Buffer{}); err != nil {
t.Fatal(err)
}
time.Sleep(30 * time.Second)
pprof.StopCPUProfile()
}
What did you expect to see?
This should not hang.
What did you see instead?
This code example hangs after 2-3 tries on a Marvel Armada SoC (a uniprocessor) with Linux 4.9.37
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
This works ok when the CPU is idle. The loop is there as a placeholder for CPU load. In practice we use ARM SoC's as storage processors in front of SATA disks. So they spend most of their time shuttling bytes from the disk to the network (in both directions). Please let me know if you can suggest a better proxy for CPU load. I have not tried, e.g., an io.Copy of /dev/zero to /dev/null but that may produce the same result. Some more details: this does not occur in go-1.8. It occurs reliably from go-1.9 onwards, and manifests in our distributed system as the storage processors first becoming nonresponsive to RPC calls and then to SIGQUIT. It's holding up our upgrade to go-1.9 in production. |
Do you mean CPU load, as in computing fib(9001), or IO load? I think your
io.Copy suggestion is a more realistic representation of your workload.
…On 6 March 2018 at 11:07, Byron Rakitzis ***@***.***> wrote:
This works ok when the CPU is idle. The loop is there as a placeholder for
CPU load.
In practice we use ARM SoC's as storage processors in front of SATA disks.
So they spend most of their time shuttling bytes from the disk to the
network (in both directions).
Please let me know if you can suggest a better proxy for CPU load. I have
not tried, e.g., an io.Copy of /dev/zero to /dev/null but that may produce
the same result.
Some more details: this does not occur in go-1.8. It occurs reliably from
go-1.9 onwards, and manifests in our distributed system as the storage
processors first becoming nonresponsive to RPC calls and then to SIGQUIT.
It's holding up our upgrade to go-1.9 in production.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#24260 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAcAx__JmxJWkDPAl568XL6pjO-xUyMks5tbdNAgaJpZM4Sd2yz>
.
|
Yes, i/o load, which does end up keeping this underpowered CPU quite busy. |
this underpowered CPU quite busy.
In kernel space, not userspace hopefully
…On 6 March 2018 at 11:18, Byron Rakitzis ***@***.***> wrote:
Yes, i/o load, which does end up keeping this underpowered CPU quite busy.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#24260 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAcAyLW2Xmu_3Y2zscwfH8Z0IoORfUeks5tbdXogaJpZM4Sd2yz>
.
|
Indeed. If it is useful I can try to get a repro with io.Copy. The gosched loop was simply the first thing I tried that reproduced the same symptoms with a minimum of code. |
I have a repro which takes much longer to strike (maybe 30 attempts), using this "load" to approximate I/O plus processing overhead. It is not a proper model (our arms do not md5 every byte) but we do incur some protocol overhead:
|
Sorry to be pendantic but this sample isn't complete. How are you
preventing the main goroutine from exiting?
…On 6 March 2018 at 12:13, Byron Rakitzis ***@***.***> wrote:
I have a repro which takes much longer to strike (maybe 30 attempts),
using this "load" to approximate I/O plus processing overhead. It is not a
proper model (our arms do not md5 every byte) but we do incur some protocol
overhead:
dst, err := os.Create("/dev/null")
if err != nil {
t.Fatal(err)
}
src, err := os.Open("/dev/zero")
if err != nil {
t.Fatal(err)
}
r, w := io.Pipe()
go func() {
buf := make([]byte, 32768)
for {
n, err := src.Read(buf)
if err != nil {
w.CloseWithError(err)
return
}
md5.Sum(buf[:n])
w.Write(buf[:n])
}
}()
go io.Copy(dst, r)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#24260 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAcAwmv3K7CSxfz7dWADuSyVqXsUyhHks5tbeLKgaJpZM4Sd2yz>
.
|
The test code above (with the contrived Gosched example) is complete -- the code exits when time.Sleep runs for 30 sec. Otherwise it takes "forever". Not sure how long forever is -- but minutes to hours at least (my patience runs out first). |
And note that the process becomes unresponsive to SIGQUIT and SIGINT. |
#24260 (comment) is not complete, can you please show the full program so that others can try to reproduce the problem. The full program is important because I want to know how you're avoiding the main goroutine exiting as both the goroutines in #24260 (comment) will not prevent the main goroutine from exiting. |
If your example is structured as a test, be aware that |
Here is the full source: https://play.golang.org/p/lhf4TbbFMqS Here is how it is compiled:
Here is how it is run on the target architecture:
The last time I ran this the loop made it through about 250 iterations before hanging. |
Let me know if it helps to recast the repro as a standalone binary with a |
I can confirm this with a piece of code I'm running: Reading text at ~1gbit/sec from a handful of sockets and updating some memory. Linux/Samsung Exynos5422 |
It looks like this is an oversight in the code that is meant to prevent systems that emulate 64-bit atomics from deadlocking when the signal handler code grabs one of the spinlocks while it's already held. Basically, this is issue #20146 but for arm instead of mips. The solution appears to be straightforward:
I'm going to go confirm that this does indeed fix the issue, and then send a CL. |
Change https://golang.org/cl/117057 mentions this issue: |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go1.10 (NB: cross-compiling for armv7)
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?Cross compiling for 32 bit arm:
GOOS=linux GOARCH=arm go test -c
The host OS is a mac:
What did you do?
CPU profiling under load seems to hang the process indefinitely. It becomes unresponsive to SIGINT and SIGQUIT, for example.
We discovered this in production code which doesn't do anything as goofy as the minimal example I am submitting, but I did manage to pare it down to a unit test of a few lines:
https://play.golang.org/p/_B_zC9QeQQr
What did you expect to see?
This should not hang.
What did you see instead?
This code example hangs after 2-3 tries on a Marvel Armada SoC (a uniprocessor) with Linux 4.9.37
The text was updated successfully, but these errors were encountered: