Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gccgo problems with pprof CPU profile #26595

Closed
thanm opened this issue Jul 25, 2018 · 6 comments
Closed

gccgo problems with pprof CPU profile #26595

thanm opened this issue Jul 25, 2018 · 6 comments
Milestone

Comments

@thanm
Copy link
Contributor

thanm commented Jul 25, 2018

What version of Go are you using (go version)?

Tip for Go:
go version devel +48c79734ff Fri Jul 20 20:08:15 2018 +0000 linux/amd64

Tip for Gccgo
go version go1.10.3 gccgo (GCC) 9.0.0 20180724 (experimental) linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Linux/amd64.

What did you do?

When collecting pprof CPU profiles using a gccgo-based toolchain I've noticed that there are unfriendly artifacts in the profile data, at least relative to what I see with the main Go compiler.

Here is a repro recipe. This uses "dwarf-check", a very small application that inspects the generated DWARF for ELF + MachoO binaries, looking for a specific class of bugs/insanities.

Step 1: download and build the app to profile:

 $ cd `go env GOPATH`
 $ go get github.com/thanm/dwarf-check
 $ cd github.com/thanm/dwarf-check
 $ go build .
 $ file dwarf-check
 dwarf-check: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
   statically linked, not stripped

Step 2: run the program to generate a profile. This invocation runs the program on itself, emitting a CPU profile to cpu.p:

 $ ./dwarf-check -v 1 -iters 3 -cpuprofile cpu.p ./dwarf-check
 initiating CPU profile
 in main
 loading ELF for ./dwarf-check
 loading DWARF for ./dwarf-check
 examining DWARF for ./dwarf-check
 read 142948 DIEs, processed 41499 abstract origin refs
 ...
 leaving main
 finishing CPU profile
 $

Step 3: inspect the profile using "pprof"

 $ pprof ./dwarf-check cpu.p
 File: dwarf-check
 Type: cpu
 ...
 $ top15 > top15.txt
 Generating report in top15.txt
 $ raw > raw.txt
 Generating report in raw.txt
 $ tree > tree.txt
 Generating report in tree.txt
 $ quit

Repeat the steps above with the main Go, then with gccgo.

For gccgo:

  • in the build step I used "go build -gccgoflags -static-libgo .", mainly to create a binary with enough interesting DWARF in it

NB: I am using upstream 'pprof', in case that matters (e.g. "go get -u github.com/google/pprof").

What did you expect to see?

For the profile collected with the main Go compiler, the top functions in the rofile look like:

   Showing top 15 nodes out of 61
      flat  flat%   sum%        cum   cum%
     310ms 13.30% 13.30%      800ms 34.33%  runtime.mallocgc
     230ms  9.87% 23.18%      230ms  9.87%  runtime.heapBitsSetType
     200ms  8.58% 31.76%     1390ms 59.66%  debug/dwarf.(*buf).entry
     200ms  8.58% 40.34%      230ms  9.87%  runtime.mapaccess2_fast32
     120ms  5.15% 45.49%      230ms  9.87%  runtime.mapassign_fast64
     100ms  4.29% 49.79%      100ms  4.29%  runtime.memclrNoHeapPointers
      80ms  3.43% 53.22%      120ms  5.15%  debug/dwarf.(*buf).string
      60ms  2.58% 55.79%     1460ms 62.66%  debug/dwarf.(*Reader).Next
      60ms  2.58% 58.37%       60ms  2.58%  runtime.nextFreeFast (inline)
      ...

What did you see instead?

For gccgo, I see the following top function in the profile:

Showing top 15 nodes out of 86
      flat  flat%   sum%        cum   cum%
     2.73s 98.91% 98.91%      2.73s 98.91%  runtime.sigprof.isra.164
     0.01s  0.36% 99.28%      0.02s  0.72%  [dwarf-check.gccgo]
         0     0% 99.28%      2.73s 98.91%  _IO_funlockfile
         0     0% 99.28%      0.10s  3.62%  __nss_passwd_lookup
         0     0% 99.28%      0.02s  0.72%  debug_dwarf.Data.parseAbbrev
         0     0% 99.28%      0.02s  0.72%  debug_dwarf.Data.parseUnits
         0     0% 99.28%      0.02s  0.72%  debug_dwarf.New
         0     0% 99.28%      1.23s 44.57%  debug_dwarf.Reader.Next

The first problem seems to be that the sigprof handler itself is showing up in the profile... seems to me that this should not be happening, based on what I see in the regular Go CPU profile.

The second problem seems to be that there (apparently) bogus entries in the call stacks that are attributed to routines in libc.so, which I am pretty sure are not really being invoked (e.g. _IO_funlockfile).

Here is an excerpt from the "raw" dump from the gccgo run:

PeriodType: cpu nanoseconds
Period: 10000000
Time: 2018-07-25 09:01:46.066810243 -0400 EDT
Duration: 2.51
Samples:
samples/count cpu/nanoseconds
          1   10000000: 1 2 2 3 4 5 6 7 8 9 10 11 12 
          1   10000000: 1 2 2 3 4 13 14 15 16 17 18 11 12 
          1   10000000: 1 2 2 3 4 19 20 21 16 17 18 11 12 
          1   10000000: 1 2 2 3 4 22 16 17 18 11 12 
          9   90000000: 1 2 2 3 4 23 24 25 26 27 28 29 
          1   10000000: 1 2 2 3 4 30 30 31 32 16 17 18 11 12 

and the corresponding locations ("1", "2" above, etc) from later in the dump:

Locations
     1: 0x4c79ef M=1 runtime.sigprof.isra.164 /ssd/gcc-trunk/build-gcc/x86_64-pc-linux-gnu/libgo/../../../gcc-trunk/libgo/go/runtime/proc.go:3421 s=0
     2: 0x4d95bc M=1 runtime.sighandler /ssd/gcc-trunk/build-gcc/x86_64-pc-linux-gnu/libgo/../../../gcc-trunk/libgo/go/runtime/signal_sighandler.go:35 s=0
             runtime.sigtrampgo /ssd/gcc-trunk/build-gcc/x86_64-pc-linux-gnu/libgo/../../../gcc-trunk/libgo/go/runtime/signal_unix.go:312 s=0
     3: 0x569c06 M=1 runtime.sigtramp /ssd/gcc-trunk/build-gcc/x86_64-pc-linux-gnu/libgo/../../../gcc-trunk/libgo/runtime/go-signal.c:131 s=0
     4: 0x7fa77097e0bf M=2 _IO_funlockfile ??:?:0 s=0
     5: 0x4df1f4 M=1 runtime.mapassign /ssd/gcc-trunk/build-gcc/x86_64-pc-linux-gnu/libgo/../../../gcc-trunk/libgo/go/runtime/hashmap.go:583 s=0
     6: 0x42cdd8 M=1 debug_dwarf.Data.parseAbbrev /ssd/gcc-trunk/build-gcc/x86_64-pc-linux-gnu/libgo/../../../gcc-trunk/libgo/go/debug/dwarf/entry.go:89 s=0
     7: 0x42d860 M=1 debug_dwarf.Data.parseUnits /ssd/gcc-trunk/build-gcc/x86_64-pc-linux-gnu/libgo/../../../gcc-trunk/libgo/go/debug/dwarf/unit.go:77 s=0

All of the entries in the profile start with the same sequence:

        1   10000000: 1 2 2 3 4 <N> ...

where is the actual function executing at the time SIGPROF was delivered. Location 4 is the suspicious one -- the IP is recorded as 0x7fa77097e0bf, which falls into the libc mapping:

1: 0x400000/0x74e000/0x0 ./dwarf-check.gccgo 52734d70535f365f564c596b3853416e645572522f5f56623832316f347a6f75636f6c6e4c6644454d2f346a4170633677515135414a714d724d4876542d2f5239433562436a50584756325f72356c45735a4a [FN][FL][LN][IN]
2: 0x7fa77096d000/0x7fa770985000/0x0 /lib/x86_64-linux-gnu/libpthread-2.24.so 29eacef9300ae49e471349173628703ccfeed6d7 [FN][FL][LN][IN]

Here 0x7fa77097e0bf - 0x7fa77096d000 is 0x110bf; disassembling the library in question I see:

  ...
  0000000000011070 <_IO_funlockfile>:
  __funlockfile():
  /tmp/build-debs.Lfl5zt/glibc-2.24/nptl/../sysdeps/pthread/funlockfile.c:28
  ...
     110b0:	nop
     110b1:	nopl   0x0(%rax,%rax,1)
     110b6:	nopw   %cs:0x0(%rax,%rax,1)

  00000000000110c0 <__restore_rt>:
  __restore_rt():
     110c0:	mov    $0xf,%rax
     110c7:	syscall 
     110c9:	nopl   0x0(%rax)

In other words, the IP value in question doesn't really make sense as far as I can tell.

@gopherbot gopherbot added this to the Gccgo milestone Jul 25, 2018
@thanm thanm self-assigned this Jul 25, 2018
@thanm
Copy link
Contributor Author

thanm commented Jul 25, 2018

Updating the skip count in the call to runtime.callers from runtime.sigprof seems to help:

diff --git a/libgo/go/runtime/proc.go b/libgo/go/runtime/proc.go
index a6746c9f..1cd9e1dd 100644
--- a/libgo/go/runtime/proc.go
+++ b/libgo/go/runtime/proc.go
@@ -3416,7 +3416,7 @@ func sigprof(pc uintptr, gp *g, mp *m) {
        n := 0
        if traceback {
                var stklocs [maxCPUProfStack]location
-               n = callers(0, stklocs[:])
+               n = callers(5, stklocs[:])
 
                for i := 0; i < n; i++ {
                        stk[i] = stklocs[i].pc

Oddly, this doesn't work for gollvm (there it looks like we need a skip count of 6).

@thanm
Copy link
Contributor Author

thanm commented Jul 26, 2018

OK, here is a cleaned up version of the call stack at the point where SIGPROF invokes runtime.callers.

#2  0x00000000004a8499 in runtime.sigprof ... libgo/go/runtime/proc.go:3424
#3  0x00000000004ad2f3 in runtime.sighandler ... libgo/go/runtime/signal_sighandler.go:36
#4  0x000000000052b305 in __morestack ()
#5  0x00000000004ade42 in runtime.sigtrampgo ... libgo/go/runtime/signal_unix.go:313
#6  0x0000000000521467 in runtime.sigtramp ... libgo/runtime/go-signal.c:131
#7  <signal handler called>
#8  main.main () at ... testprogram.go:31

The linking step for the gollvm version of the program is picking up system libgcc (as opposed to gccgo, which has its own copy of libgcc); the system libgcc has no debug info. In the gccgo case, when the libbacktrace callback is invoked, it sees the __morestack function name from the DWARF info and is able to exclude the frame immediately. In the gollvm case, the callback can't see the function name and has to include it into the trace -- this is what leads to the off-by-one error.

@ianlancetaylor
Copy link
Member

Maybe we can explicitly discard the stack frame above runtime.sigtramp.

@thanm
Copy link
Contributor Author

thanm commented Jul 26, 2018

@ianlancetaylor yes, I think that is the way to go. I played around with using a fixed skip count and then trying to detect/repair situations where the backtrace callback can't immediately prune away __morestack, but your approach is much simpler.

One new hiccup is that with the fix in, I am now seeing test failures (sporadic) in the runtime/pprof package test. Representative example:

--- FAIL: TestGoroutineSwitch (2.91s)
	pprof_test.go:357: found profile entry for runtime.gogo:
		 0x420f2c (runtime.gogo:283) 0x5160c4 (runtime.execute:2120) 0x51d294 (runtime.schedule:2495) 0x51d865 (runtime.goschedImpl:2556) 0x51d94c (runtime.gosched_m:2564) 0x421241 (runtime_mstart:527)
FAIL
FAIL: runtime/pprof

I looked at the test and the bugs it was intended to fix -- unless I am mistaken it doesn't make much sense to run this test for gccgo (let me know if you concur).

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/126175 mentions this issue: libgo: prune sighandler frames in runtime.sigprof

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/126316 mentions this issue: test: skip a runtime/pprof testpoint for gccgo

hubot pushed a commit to gcc-mirror/gcc that referenced this issue Jul 27, 2018
    
    When writing stack frames to the pprof CPU profile machinery, it is
    very important to insure that the frames emitted do not contain any
    frames corresponding to artifacts of the profiling process itself
    (signal handlers, sigprof, etc). This patch changes runtime.sigprof to
    strip out those frames from the raw stack generated by
    "runtime.callers".
    
    Fixes golang/go#26595.
    
    Reviewed-on: https://go-review.googlesource.com/126175


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@263035 138bc75d-0d04-0410-961f-82ee72b054a4
gopherbot pushed a commit that referenced this issue Aug 31, 2018
In the CPU profile tests for gccgo, check to make sure that the
runtime's sigprof handler itself doesn't appear in the profile. Add a
"skip if gccgo" guard to one testpoint.

Updates #26595

Change-Id: I92a44161d61f17b9305ce09532134edd229745a7
Reviewed-on: https://go-review.googlesource.com/126316
Run-TryBot: Than McIntosh <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@golang golang locked and limited conversation to collaborators Jul 27, 2019
@rsc rsc unassigned thanm Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants