Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: panic in appendLocsForStack on fuchsia-arm64 #40823

Open
tamird opened this issue Aug 16, 2020 · 26 comments
Open

runtime/pprof: panic in appendLocsForStack on fuchsia-arm64 #40823

tamird opened this issue Aug 16, 2020 · 26 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Fuchsia
Milestone

Comments

@tamird
Copy link
Contributor

tamird commented Aug 16, 2020

We've been seeing this issue for a while; filing this bug now that we've upgraded to go1.15. (See https://fxbug.dev/52575).

This happens sometimes when running the runtime/pprof tests:

panic: runtime error: slice bounds out of range [3:2]

goroutine 6 [running]:
runtime/pprof.(*profileBuilder).appendLocsForStack(0x3e93c15c160, 0x3e93c130840, 0x0, 0x8, 0x3e93c0560b0, 0x2, 0x2, 0x0, 0x2, 0x8) 
/b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/pprof/proto.go:410 +0x5d4
runtime/pprof.(*profileBuilder).build(0x3e93c15c160)
/b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/pprof/proto.go:365 +0xf4
runtime/pprof.profileWriter(0xb4050418c0, 0x3e93c386030)
/b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/pprof/pprof.go:813 +0xd4
created by runtime/pprof.StartCPUProfile
/b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/pprof/pprof.go:784 +0x114

There's no diff between Fuchsia's fork and go1.15 in src/runtime/pprof. See https://fuchsia.googlesource.com/third_party/go/+diff/go1.15..master.

Possibly a regression of #35538, related to #37446 and its fix in https://golang.org/cl/221577.

@prattmic

@randall77
Copy link
Contributor

@hyangah

@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 17, 2020
@prattmic prattmic modified the milestones: Backlog, Go1.16 Aug 17, 2020
@prattmic
Copy link
Member

I agree that at face value this looks like a regression of #35538, likely in http://golang.org/cl/221577.

http://golang.org/cl/221577 was merged into the Fuchsia tree in https://fuchsia-review.googlesource.com/c/third_party/go/+/374435 (April 4).

However, https://fxbug.dev/52575 was filed on May 13 just a few hours after https://fuchsia-review.googlesource.com/c/third_party/go/+/389033 (Go 1.14.3 merge), which IMO strongly implicates something in that merge.

@tamird do you know if the May 13 bug was really the first cases of this you saw, or was there perhaps some earlier bug that's not attached?

@tamird
Copy link
Contributor Author

tamird commented Aug 17, 2020

Yes, there were earlier bugs; sorry to have mislead you!

https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=44718 was filed on January 24
https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=47563 was filed on March 3

@prattmic
Copy link
Member

Ah, interesting.

https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=44718

I don't recognize this crash (log, search for pprof) (@hyangah maybe you do?), but it does not seem related to this bug.

https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=47563

This crash looks like #38605. The May 13 merge included the fix for this. Though it is not clear to me if the appendLocsForStack panics were really new after that merge or the other allocToCache crash was just so much more frequent that we never saw the appendLocsForStack panic.

@hyangah
Copy link
Contributor

hyangah commented Aug 17, 2020

Ah, interesting.

https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=44718

I don't recognize this crash (log, search for pprof) (@hyangah maybe you do?), but it does not seem related to this bug.

TestBlockProfile failed with debug=1 option. That's not the code path that exercises the proto encoding code for debug>0. Not sure why the block profile does not include the expected channel close event, but it's not a related bug to appendLocsForStack.

https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=47563

This crash looks like #38605. The May 13 merge included the fix for this. Though it is not clear to me if the appendLocsForStack panics were really new after that merge or the other allocToCache crash was just so much more frequent that we never saw the appendLocsForStack panic.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/248728 mentions this issue: runtime/pprof: add logging for debugging issue 40823

@tamird
Copy link
Contributor Author

tamird commented Aug 21, 2020

With the change above applied, here's what we see:

=== RUN   TestCPUProfileMultithreaded
stk:#   0xde474a80eb    runtime.nanotime1+0x10b /b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/os_fuchsia.go:478
#       0xde474b1f83    runtime._System+0x3     /b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/proc.go:3911
l.pcs:# 0xde474a80eb    runtime.nanotime1+0x10b /b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/os_fuchsia.go:478
#       0xde474a80e4    runtime.nanotime+0x104  /b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/time_nofake.go:19
#       0xde474a8050    runtime.suspendG+0x70   /b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/preempt.go:252
panic: runtime error: slice bounds out of range [3:2]

goroutine 6 [running]:
runtime/pprof.(*profileBuilder).appendLocsForStack(0x1fd1c0b3340, 0x1fd1c130840, 0x0, 0x8, 0x1fd1c216130, 0x2, 0x2, 0x0, 0x6, 0x8) 
/b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/pprof/proto.go:415 +0x640
runtime/pprof.(*profileBuilder).build(0x1fd1c0b3340)
/b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/pprof/proto.go:366 +0xf4
runtime/pprof.profileWriter(0xde4768f8c0, 0x1fd1c10a120)
/b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/pprof/pprof.go:813 +0xd4
created by runtime/pprof.StartCPUProfile
/b/s/w/ir/x/w/fuchsia/out/not-default/host-tools/goroot/src/runtime/pprof/pprof.go:784 +0x114

@prattmic
Copy link
Member

Ah, this is very helpful. It looks like this might be a bug in runtime.expandFinalInlineFrame.

From the PCs, I suspect that nanotime1 and nanotime are inlined into suspendG, so expandFinalInlineFrame should have expanded the stack to include nanotime and suspendG, but it seems that it didn't for some reason.

Could you provide a copy of the crashing binary, or tell me how to build it so I can take a closer look?

@prattmic
Copy link
Member

So I think the problem here is that expandFinalInlineFrame doesn't account for the magic _System, _GC, etc frames. It should expand the second frame if one of those appear at the bottom. I'll send out a patch for this.

That said, I don't understand why the traceback would be failing here: https://fuchsia.googlesource.com/third_party/go/+/refs/heads/master/src/runtime/proc.go#4069

@prattmic
Copy link
Member

prattmic commented Aug 21, 2020

Regarding why the traceback is failing in the first place:

Since nanotime1 does call into the VDSO, it is possible there is some interaction with the handling of vdsoPC/vdsoSP or some weirdness with how profiling signals are delivered on Fuchsia (haven't looked at that much), though I'll want to take a closer at the binary.

As an aside, you all will want to port http://golang.org/cl/246763 for Fuchsia. I don't think that is the direct cause of these crashes, but is certainly suspicious.

@tamird
Copy link
Contributor Author

tamird commented Aug 21, 2020

@prattmic
Copy link
Member

If you can reproduce this issue, I think it would be valuable to see if making nanotime1 reentrant fixes the issue, to rule that out. I wrote https://fuchsia-review.googlesource.com/c/third_party/go/+/419640, which I believe is what you need, though it is untested since I don't have a Fuchsia environment.

@tamird
Copy link
Contributor Author

tamird commented Aug 23, 2020

I haven't been able to reproduce locally, but the issue only appears on arm64. The previous file I provided was the amd64 binary; here's the arm64: https://gofile.io/d/xPjFh3.

@prattmic
Copy link
Member

prattmic commented Aug 24, 2020

Reproducing only on ARM64 makes me wonder if this is a memory ordering issue.

VDSO calls set m.vdsoPC followed by m.vdsoSP. This order is important, because if sigprof observes that m.vdsoSP is set, then it assumes m.vdsoPC is valid. On Unix, sigprof is running on the same thread as the m it is profiling, therefore no memory synchronization is necessary.

However, on Fuchsia it is running on a different thread. Thus given that ARM allows stores to be reordered after stores, sigprof may actually observe non-zero m.vdsoSP without a valid m.vdsoPC.

Before calling sigprof, we suspend the target thread with zx_task_suspend_token. Does that call include a complete memory barrier to ensure we see fully-consistent values? The docs don't specify.

On AMD64, stores are ordered, so without a memory barrier we will still see consistent values (though perhaps stale!).

I see that the Windows port has a very similar structure, though there is no Windows ARM64 port, and I don't know if _SuspendThread includes memory barrier either.

cc @aclements @mknyszek to see if this sounds right, plus if you know more about Windows.

@mknyszek
Copy link
Contributor

@prattmic Your theory seems plausible to me, though I'm still tracking down how "bad m.vdsoPC" precisely translates into this crash. I suppose what this means is the gentraceback based on mp.vdsoPC and mp.vdsoSP will fail, we'll fall into the next case (the one you mentioned earlier with funcPC(_System)) and that's why stk gets populated the way it does.

Oh, I see, in an earlier post you posit that nanotime and nanotime1 are inlined into suspendG at the PC we encounter, so after expandFinalInlineFrame the _System is the extra element causing the crash. Have you confirmed yet that they got inlined? If so, I can see this all connecting together. For this inlined case the profiling code implicitly expects not to find a symbol that's marked as inline if it was marked with the pseudo-symbol _System, but because we observe a bad vdsoPC we end up falling into the wrong case (actually, any ordering where they're not both observed to be set could lead us to drop down into that case, making this somewhat likelier).

If this is the problem then we need either (1) synchronization of vdsoPC and vdsoSP or (2) make the profiling code be able to handle this extra _System element in the inlined case (which is what you suggested earlier), though it means some stacks might not be attributed "correctly." Does all this align with your thinking @prattmic?

Windows is the only platform (AFAIK) we officially support that has the "suspend thread at a distance" behavior, everything else uses signals. We had a windows-arm port (not arm64, as you point out) going at some point, but the builder was routinely broken.

@prattmic
Copy link
Member

Yes, nanotime and nanotime1 are inlined in the binary from #40823 (comment). It looks like:

  :0                    0x7b0d8                 390083e1                MOVB R1, 32(RSP)                                
  :0                    0x7b0dc                 b4000284                CBZ R4, 20(PC)                                  
  :0                    0x7b0e0                 f90027e5                MOVD R5, 72(RSP)                                
  :0                    0x7b0e4                 d503201f                NOOP                                            
  :0                    0x7b0e8                 9400d2de                CALL syscall/zx.Sys_clock_get_monotonic(SB)     
  :0                    0x7b0ec                 f94007e0                MOVD 8(RSP), R0                                 
  :0                    0x7b0f0                 f94027e1                MOVD 72(RSP), R1                                
  :0                    0x7b0f4                 eb01001f                CMP R1, R0                                      
  :0                    0x7b0f8                 540000ca                BGE 6(PC)                                       
  :0                    0x7b0fc                 d2800140                MOVD $10, R0                                    
  :0                    0x7b100                 b9000be0                MOVW R0, 8(RSP)                                 
  :0                    0x7b104                 9400ad4f                CALL runtime.procyield(SB)                      
  :0                    0x7b108                 f94027e0                MOVD 72(RSP), R0                                
  :0                    0x7b10c                 17ffffc8                JMP -56(PC)                                     
  :0                    0x7b110                 9400b0dc                CALL runtime.osyield(SB)                        
  :0                    0x7b114                 d503201f                NOOP                                            
  :0                    0x7b118                 9400d2d2                CALL syscall/zx.Sys_clock_get_monotonic(SB)     
  :0                    0x7b11c                 f94007e0                MOVD 8(RSP), R0                                 
  :0                    0x7b120                 d282711b                MOVD $5000, R27                                 
  :0                    0x7b124                 8b1b0000                ADD R27, R0, R0                                 
  :0                    0x7b128                 17ffffc1                JMP -63(PC)                                     
...

Note that this isn't the exact same binary that crashed, but these instructions match up almost perfectly with the stack, so perhaps it is identical.

I think we ultimately need (1) as you suggest. (2) should stop the immediate crashing, but stacks will be misattributed, and I wonder if there are other odd cases we'll hit with odd memory ordering.

We could make these vdso fields explicitly atomics, or perhaps the ARM64 assembly should just manually include DMB/DSB instructions at the right point?

Of course, this is predicated on the assumption that zx_task_suspend_token doesn't include an implicit memory barrier. I briefly looked at the Zircon source and didn't notice any, but I'm far from familiar. @tamird do you know, or know who would be a better authority on this?

@prattmic
Copy link
Member

w.r.t. Windows, I forgot that Windows has no VDSO and thus doesn't use vdsoSP/vdsoPC at all! That said, it has very similar libcallsp/libcallpc, which may have similar issues.

@dhobsd
Copy link
Contributor

dhobsd commented Aug 24, 2020

This makes sense. I believe there are implicit memory barriers in zx_object_wait_one (due to locking) -- which is called to wait for the profiled thread to stop after the zx_task_suspend_token but before sigprof is called. My familiarity with ARM RMO is poor; wouldn't we also need a barrier between the store to vdsoPC and vdsoSP to guarantee remote stores are seen in-order? I think sequential consistency is easier to reason about; my vote would be to make these atomic given that those APIs are generally expected to make that guarantee.

@odeke-em
Copy link
Member

odeke-em commented Feb 7, 2021

Thanks everyone for the investigations, and for the patience. Punting to Go1.17.

@odeke-em odeke-em modified the milestones: Go1.16, Go1.17 Feb 7, 2021
@tamird
Copy link
Contributor Author

tamird commented Feb 9, 2021

We stopped seeing this for a while, but having just upgraded to go1.16rc1, it's back.

test.log

EDIT: correction, we stopped seeing it because we started skipping the previous occurrence. It seems that we now have a new occurrence.

@odeke-em
Copy link
Member

odeke-em commented Feb 9, 2021

Ouch, sorry about that @tamird, and thank you for the report. I shall be reinstating as a release blocker for Go1.16

@odeke-em odeke-em modified the milestones: Go1.17, Go1.16 Feb 9, 2021
@dmitshur
Copy link
Contributor

dmitshur commented Feb 9, 2021

I don't think this needs to block the Go 1.16 release. Given that Go 1.16 RC 1 is already out and the bar for fixes is very high, it doesn't seem that this can make it to Go 1.16. I'll reset the milestone to Backlog, and it can be updated when the timing of the fix is more clear.

@dmitshur dmitshur modified the milestones: Go1.16, Backlog Feb 9, 2021
@dhobsd
Copy link
Contributor

dhobsd commented Mar 18, 2021

The fence was added, and we're still seeing this come up in the same context.

I was never able to repro this on a 128-core, multi-node Cavium system with innumerable iterations over several days, even with cpusets that should've guaranteed at least half of the threads to run on a different node (and thus increase the likelihood of a memory ordering issue).

I'm not happy blaming this on qemu without actually understanding the problem, but I'm wondering if this is a similar issue to #9605. These ARM tests do run in a modern descendant of the qemu-based emulator that was being used in that issue, but that issue suggested this was flaky on real hardware, too. However, I have a hard time reconciling not being able to reproduce the issue once (in who-knows-how-many thousands of runs) on real hardware with it happening quite regularly (~hundreds or thousands of runs) in a qemuish arm64 env.

@tamird
Copy link
Contributor Author

tamird commented Dec 30, 2021

The benchmark that was producing this failure has been disabled until yesterday; having enabled it, we're now seeing this even on fuchsia-x64; e.g. this build.

stk:#   0x179ab65affb   runtime.(*gcBits).bytep+0x13b   /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/mheap.go:1938
#       0x179ab66cbe0   runtime._ExternalCode+0x0       /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/proc.go:4668
l.pcs:# 0x179ab65affb   runtime.(*gcBits).bytep+0x13b   /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/mheap.go:1938
#       0x179ab65afe3   runtime.(*gcBits).bitp+0x123    /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/mheap.go:1944
#       0x179ab65affc   runtime.(*mspan).markBitsForIndex+0x13c /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/mbitmap.go:257
#       0x179ab65afe2   runtime.wbBufFlush1+0x122       /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/mwbbuf.go:266
panic: runtime error: slice bounds out of range [4:2]
goroutine 34 [running]:
runtime/pprof.(*profileBuilder).appendLocsForStack(0x5dac058000, {0x5dac102300, 0x0, 0x20}, {0x5dac49a450, 0x2, 0x2})
        /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/pprof/proto.go:414 +0x893
runtime/pprof.(*profileBuilder).build(0x5dac058000)
        /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/pprof/proto.go:365 +0x2c5
runtime/pprof.profileWriter({0x179aba3c500, 0x5dac108008})
        /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/pprof/pprof.go:813 +0x125
created by runtime/pprof.StartCPUProfile
        /b/s/w/ir/cache/incremental/out/not-default/host-tools/goroot/src/runtime/pprof/pprof.go:784 +0x171

@prattmic
Copy link
Member

The problem here is two-fold (as previously partially discussed in #40823 (comment)):

  1. We have a stack with missing inline frames which expandFinalInlineFrame fails to expand because the truncated stack is followed by _ExternalCode, which it doesn't handle. expandFinalInlineFrame could be adjusted to handle this case.
  2. We really shouldn't be getting _ExternalCode here at all. This is clearly in Go code. _ExternalCode comes from sigprofNonGoPC, which Fuchsia calls when profiling at https://fuchsia.googlesource.com/third_party/go/+/refs/heads/main/src/runtime/os_fuchsia.go#642. AFAICT, the logic here here is wrong: mp.curg is not always set. Fuchsia should almost certainly more closely match the Windows logic, which handles more cases: https://cs.opensource.google/go/go/+/master:src/runtime/os_windows.go;l=1221-1223. I'll send a CL for this.

@prattmic
Copy link
Member

I've mailed (untested!) https://fuchsia-review.googlesource.com/c/third_party/go/+/629462, which I hope fixes this.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@mknyszek mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Fuchsia
Projects
Status: Triage Backlog
Development

No branches or pull requests

9 participants