Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: crash in race detector when execution tracer reads from CPU profile buffer #65607

Closed
nsrip-dd opened this issue Feb 8, 2024 · 6 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@nsrip-dd
Copy link
Contributor

nsrip-dd commented Feb 8, 2024

Go version

go version go1.22.0 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/ec2-user/.cache/go-build'
GOENV='/home/ec2-user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/ec2-user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/ec2-user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.0'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/ec2-user/testrace/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2165696407=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Ran this program, compiled with the race detector enabled:

package main

import (
	"io"
	"runtime"
	"runtime/trace"
)

func main() {
	runtime.SetCPUProfileRate(100)
	trace.Start(io.Discard)
	defer trace.Stop()

	for {
		waste()
	}
}

//go:noinline
func waste() int {
	return 42
}

What did you see happen?

The program crashes with a segfault during a race detector call:

crash output
SIGSEGV: segmentation violation
PC=0x41d920 m=4 sigcode=1 addr=0x4

goroutine 0 gp=0xc0000821c0 m=4 mp=0xc000080008 [idle]:

goroutine 20 gp=0xc000102a80 m=4 mp=0xc000080008 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:474 +0x8 fp=0xc000054e70 sp=0xc000054e60 pc=0x49b028
runtime.traceAdvance(0x0)
        /usr/local/go/src/runtime/trace2.go:519 +0x57f fp=0xc000054fb0 sp=0xc000054e70 pc=0x48a9df
runtime.(*traceAdvancerState).start.func1()
        /usr/local/go/src/runtime/trace2.go:883 +0x2f fp=0xc000054fe0 sp=0xc000054fb0 pc=0x48bcaf
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000054fe8 sp=0xc000054fe0 pc=0x49d001
created by runtime.(*traceAdvancerState).start in goroutine 1
        /usr/local/go/src/runtime/trace2.go:877 +0x165

goroutine 1 gp=0xc0000061c0 m=nil [runnable]:
runtime.asyncPreempt2()
        /usr/local/go/src/runtime/preempt.go:307 +0x39 fp=0xc000057d78 sp=0xc000057d58 pc=0x469779
runtime.asyncPreempt()
        /usr/local/go/src/runtime/preempt_amd64.s:53 +0xdb fp=0xc000057f00 sp=0xc000057d78 pc=0x49e51b
main.main()
        /home/ec2-user/testrace/main.go:15 +0x77 fp=0xc000057f50 sp=0xc000057f00 pc=0x4a78b7
runtime.main()
        /usr/local/go/src/runtime/proc.go:271 +0x29d fp=0xc000057fe0 sp=0xc000057f50 pc=0x46ac1d
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000057fe8 sp=0xc000057fe0 pc=0x49d001

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x5585f0?, 0x564700?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0000427a8 sp=0xc000042788 pc=0x46b06e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:408
runtime.forcegchelper()
        /usr/local/go/src/runtime/proc.go:326 +0xb3 fp=0xc0000427e0 sp=0xc0000427a8 pc=0x46aef3
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000427e8 sp=0xc0000427e0 pc=0x49d001
created by runtime.init.6 in goroutine 1
        /usr/local/go/src/runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000052f80 sp=0xc000052f60 pc=0x46b06e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:408
runtime.bgsweep(0xc00006a000)
        /usr/local/go/src/runtime/mgcsweep.go:278 +0x94 fp=0xc000052fc8 sp=0xc000052f80 pc=0x457374
runtime.gcenable.gowrap1()
        /usr/local/go/src/runtime/mgc.go:203 +0x25 fp=0xc000052fe0 sp=0xc000052fc8 pc=0x44bee5
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000052fe8 sp=0xc000052fe0 pc=0x49d001
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:                                                                                                                              [31/766]
runtime.gopark(0xc00006a000?, 0x4dc0b0?, 0x1?, 0x0?, 0xc000007340?)
        /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000112f78 sp=0xc000112f58 pc=0x46b06e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:408
runtime.(*scavengerState).park(0x5640a0)
        /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000112fa8 sp=0xc000112f78 pc=0x454d89
runtime.bgscavenge(0xc00006a000)
        /usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc000112fc8 sp=0xc000112fa8 pc=0x4552fc
runtime.gcenable.gowrap2()
        /usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc000112fe0 sp=0xc000112fc8 pc=0x44be85
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000112fe8 sp=0xc000112fe0 pc=0x49d001
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:204 +0xa5

goroutine 18 gp=0xc000102700 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000053e20 sp=0xc000053e00 pc=0x46b06e
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:194 +0x145 fp=0xc000053fe0 sp=0xc000053e20 pc=0x44af25
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000053fe8 sp=0xc000053fe0 pc=0x49d001
created by runtime.createfing in goroutine 1
        /usr/local/go/src/runtime/mfinal.go:164 +0x3d

goroutine 19 gp=0xc0001028c0 m=nil [chan receive]:
runtime.gopark(0x471ef6?, 0xc000058f20?, 0xe5?, 0x8b?, 0xc000032508?)
        /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000058ed8 sp=0xc000058eb8 pc=0x46b06e
runtime.chanrecv(0xc0001180c0, 0x0, 0x1)
        /usr/local/go/src/runtime/chan.go:583 +0x36d fp=0xc000058f50 sp=0xc000058ed8 pc=0x43c90d
runtime.chanrecv1(0xc000116050?, 0xeedefa415e?)
        /usr/local/go/src/runtime/chan.go:442 +0x12 fp=0xc000058f78 sp=0xc000058f50 pc=0x43c592
runtime.(*wakeableSleep).sleep(0xc000136000, 0x5f5e100)
        /usr/local/go/src/runtime/trace2.go:943 +0xa7 fp=0xc000058fa8 sp=0xc000058f78 pc=0x48be27
runtime.traceStartReadCPU.func1()
        /usr/local/go/src/runtime/trace2cpu.go:56 +0x45 fp=0xc000058fe0 sp=0xc000058fa8 pc=0x48c9a5
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000058fe8 sp=0xc000058fe0 pc=0x49d001
created by runtime.traceStartReadCPU in goroutine 1
        /usr/local/go/src/runtime/trace2cpu.go:44 +0x156

goroutine 21 gp=0xc000102c40 m=nil [trace reader (blocked)]:
runtime.gopark(0xc00003f778?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc00003f738 sp=0xc00003f718 pc=0x46b06e
runtime.ReadTrace()
        /usr/local/go/src/runtime/trace2.go:678 +0x31 fp=0xc00003f7a0 sp=0xc00003f738 pc=0x48b551
runtime/trace.Start.func1()
        /usr/local/go/src/runtime/trace/trace.go:130 +0x4a fp=0xc00003f7e0 sp=0xc00003f7a0 pc=0x4a768a
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00003f7e8 sp=0xc00003f7e0 pc=0x49d001
created by runtime/trace.Start in goroutine 1
        /usr/local/go/src/runtime/trace/trace.go:128 +0x125

rax    0x41db10
rbx    0x5c3768
rcx    0x0
rdx    0x5c3768
rdi    0x0
rsi    0x0
rbp    0x7f37a7ffe908
rsp    0x7f37a7ffe8c8
r8     0xa
r9     0x52
r10    0xc0000821c0
r11    0xc0003c17a0
r12    0x7f37a7ffe8d0
r13    0xc000080008
r14    0xc0000821c0
r15    0x52
rip    0x41d920
rflags 0x10246
cs     0x33
fs     0x0
gs     0x0

Here's the traceback from a debugger:

Thread 5 "testrace" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffadefc700 (LWP 9980)]
0x000000000041d920 in __tsan::Acquire(__tsan::ThreadState*, unsigned long, unsigned long) ()
(gdb) bt
#0  0x000000000041d920 in __tsan::Acquire(__tsan::ThreadState*, unsigned long, unsigned long) ()
#1  0x000000000049e71a in racecall () at /usr/local/go/src/runtime/race_amd64.s:396
#2  0x000000000047b807 in runtime.raceacquireg (gp=<optimized out>, addr=<optimized out>) at /usr/local/go/src/runtime/race.go:529
#3  0x000000000047aa25 in runtime.raceacquire (addr=<optimized out>) at /usr/local/go/src/runtime/profbuf.go:557
#4  runtime.(*profBuf).read (b=<optimized out>, mode=1, data=..., tags=..., eof=<optimized out>) at /usr/local/go/src/runtime/profbuf.go:557
#5  0x000000000048cba5 in runtime.traceReadCPU (gen=1, ~r0=<optimized out>) at /usr/local/go/src/runtime/trace2cpu.go:118
#6  0x000000000048d1d8 in runtime.traceCPUFlush (gen=4315920) at /usr/local/go/src/runtime/trace2cpu.go:202
#7  0x000000000048aea5 in runtime.traceAdvance.func4 () at /usr/local/go/src/runtime/trace2.go:525
#8  0x000000000049b08a in runtime.systemstack () at /usr/local/go/src/runtime/asm_amd64.s:509
#9  0x01007fffa4000020 in ?? ()
#10 0x0000000000a00000 in __sanitizer::theDepot ()
#11 0x000000c0000821c0 in ?? ()
#12 0x000000c0000821c0 in ?? ()
#13 0x000000000049af85 in runtime.mstart () at /usr/local/go/src/runtime/asm_amd64.s:394
#14 0x0000000000437664 in runtime/cgo(.text) ()
#15 0x00007fffaf2fdb80 in ?? ()
#16 0x00000000014feaa0 in ?? ()
#17 0x00007fffaf2fda8f in ?? ()
#18 0x00007fffaf2fda8e in ?? ()
#19 0x000000c0000821c0 in ?? ()
#20 0x000000000049af80 in ?? ()
#21 0x0000000000437244 in threadentry ()
#22 0x0000000000000000 in ?? ()

It's faulting on the first instruction in __tsan::Acquire, looks like the first argument is 0:

(gdb) disas $rip,$rip+8
Dump of assembler code from 0x41d920 to 0x41d928:
=> 0x000000000041d920 <_ZN6__tsan7AcquireEPNS_11ThreadStateEmm+0>:      mov    0x4(%rdi),%eax
   0x000000000041d923 <_ZN6__tsan7AcquireEPNS_11ThreadStateEmm+3>:      test   %eax,%eax
   0x000000000041d925 <_ZN6__tsan7AcquireEPNS_11ThreadStateEmm+5>:      je     0x41d930 <_ZN6__tsan7AcquireEPNS_11ThreadStateEmm+16>
   0x000000000041d927 <_ZN6__tsan7AcquireEPNS_11ThreadStateEmm+7>:      retq 
(gdb) p $rdi
$1 = 0

What did you expect to see?

No crash.

FWIW, it seems like the combination of race detector + execution tracer + CPU profiler, on Go 1.22, is not working. If I just run the CPU profiler, without the tracer, I don't see the crash. I also tried with Go 1.21.7 and didn't see a crash.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 8, 2024
@prattmic
Copy link
Member

prattmic commented Feb 9, 2024

Oops, I suspect this is a consequence of https://go-review.googlesource.com/c/go/+/555495.

The race annotation in profBuf.read is manual, perhaps we could borrow a racectx, or maybe even safely skip it altogether (not sure that is a great idea)?

cc @mknyszek

@prattmic prattmic self-assigned this Feb 9, 2024
@prattmic prattmic added this to the Go1.23 milestone Feb 9, 2024
@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Feb 9, 2024
@prattmic
Copy link
Member

prattmic commented Feb 9, 2024

@gopherbot Please backport to 1.22. This causes random crashes when using race+trace+cpu profiling.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #65644 (for 1.22).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/562998 mentions this issue: internal/trace: run children in race mode during race testing

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/562996 mentions this issue: runtime: don't call traceReadCPU on the system stack

gopherbot pushed a commit that referenced this issue Feb 9, 2024
This would have caught #65607.

On my machine:

internal/trace/v2:               ~13s
internal/trace/v2: -short:        ~4s
internal/trace/v2: -race -short: ~20s
internal/trace/v2: -race:        ~46s

For #65607.

Change-Id: I03a555ebaf8c761c98a82b6d35fa16d7a99aa9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/562998
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/562559 mentions this issue: [release-branch.go1.22] runtime: don't call traceReadCPU on the system stack

gopherbot pushed a commit that referenced this issue Feb 16, 2024
…m stack

traceReadCPU calls profBuf.read, which does a raceacquire. g0 does not
have a race context, so this crashes when running on the system stack.

We could borrow a race context, but it is simpler to just move
traceReadCPU off of the system stack.

For #65607.
Fixes #65644.

Change-Id: I335155b96d683aebb92b2f4e1eea063dd139f2d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/562996
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
(cherry picked from commit 9fa153b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/562559
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
traceReadCPU calls profBuf.read, which does a raceacquire. g0 does not
have a race context, so this crashes when running on the system stack.

We could borrow a race context, but it is simpler to just move
traceReadCPU off of the system stack.

Fixes golang#65607.

Change-Id: I335155b96d683aebb92b2f4e1eea063dd139f2d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/562996
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
This would have caught golang#65607.

On my machine:

internal/trace/v2:               ~13s
internal/trace/v2: -short:        ~4s
internal/trace/v2: -race -short: ~20s
internal/trace/v2: -race:        ~46s

For golang#65607.

Change-Id: I03a555ebaf8c761c98a82b6d35fa16d7a99aa9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/562998
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
bradfitz pushed a commit to tailscale/go that referenced this issue Mar 5, 2024
…m stack

traceReadCPU calls profBuf.read, which does a raceacquire. g0 does not
have a race context, so this crashes when running on the system stack.

We could borrow a race context, but it is simpler to just move
traceReadCPU off of the system stack.

For golang#65607.
Fixes golang#65644.

Change-Id: I335155b96d683aebb92b2f4e1eea063dd139f2d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/562996
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
(cherry picked from commit 9fa153b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/562559
romaindoumenc pushed a commit to TroutSoftware/go that referenced this issue Mar 6, 2024
…m stack

traceReadCPU calls profBuf.read, which does a raceacquire. g0 does not
have a race context, so this crashes when running on the system stack.

We could borrow a race context, but it is simpler to just move
traceReadCPU off of the system stack.

For golang#65607.
Fixes golang#65644.

Change-Id: I335155b96d683aebb92b2f4e1eea063dd139f2d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/562996
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
(cherry picked from commit 9fa153b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/562559
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

3 participants