Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: heap/allocs profile with legacy format yields different results from the profile with proto format #25096

Closed
josharian opened this issue Apr 26, 2018 · 12 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@josharian
Copy link
Contributor

Reproduce:

$ compilebench -alloc -memprofile=m.p -run=Template
BenchmarkTemplate 1 199528196 ns/op 247178000 user-ns/op 34445184 B/op 321957 allocs/op
$ go tool pprof -alloc_space `go tool -n compile` m.p
File: compile
Type: alloc_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top15
Showing nodes accounting for 22332.44kB, 71.95% of 31039.35kB total
Showing top 15 nodes out of 186
      flat  flat%   sum%        cum   cum%
 3584.44kB 11.55% 11.55%  3584.44kB 11.55%  cmd/compile/internal/gc.nodl
 2094.95kB  6.75% 18.30%  2094.95kB  6.75%  runtime.makeBucketArray
 2090.29kB  6.73% 25.03%  2090.29kB  6.73%  cmd/compile/internal/gc.newliveness
 2079.07kB  6.70% 31.73%  3105.51kB 10.01%  cmd/compile/internal/ssa.(*regAllocState).init
 2048.19kB  6.60% 38.33%  2048.19kB  6.60%  cmd/compile/internal/types.New
 2003.58kB  6.45% 44.78%  2529.01kB  8.15%  cmd/compile/internal/gc.init
 1538.56kB  4.96% 49.74%  2052.06kB  6.61%  cmd/compile/internal/ssa.cse
 1536.55kB  4.95% 54.69%  2061.97kB  6.64%  runtime.mapassign
 1056.69kB  3.40% 58.10%  1056.69kB  3.40%  cmd/compile/internal/ssa.(*stackAllocState).stackalloc
 1024.23kB  3.30% 61.39%  1024.23kB  3.30%  cmd/compile/internal/gc.newnamel
 1024.17kB  3.30% 64.69%  4145.67kB 13.36%  cmd/compile/internal/ssa.(*regAllocState).regalloc
  683.75kB  2.20% 66.90%   683.75kB  2.20%  cmd/compile/internal/gc.initssaconfig
  528.17kB  1.70% 68.60%  1040.40kB  3.35%  cmd/compile/internal/ssa.(*stackAllocState).init
  525.43kB  1.69% 70.29%   525.43kB  1.69%  cmd/compile/internal/ssa.schedule
  514.38kB  1.66% 71.95%   514.38kB  1.66%  cmd/compile/internal/ssa.(*Func).newSparseMap

Though the pprof output says "alloc_space" at the top, this looks conspicuously like "inuse_space" to me.

If you change writeLegacyFormat in cmd/compile/internal/gc/util.go from 1 to 0, so that it writes using the pb format, the results look correct again:

$ compilebench -alloc -memprofile=m.p -run=Template
BenchmarkTemplate 1 198046550 ns/op 240827000 user-ns/op 0 B/op 0 allocs/op
$ go tool pprof -alloc_space `go tool -n compile` m.p
File: compile
Type: alloc_space
Time: Apr 25, 2018 at 9:57pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top15
Showing nodes accounting for 18462.70kB, 56.27% of 32809.29kB total
Showing top 15 nodes out of 221
      flat  flat%   sum%        cum   cum%
 3072.41kB  9.36%  9.36%  3072.41kB  9.36%  cmd/compile/internal/gc.nodl
 2592.39kB  7.90% 17.27%  2592.39kB  7.90%  cmd/compile/internal/ssa.(*regAllocState).init
 2048.19kB  6.24% 23.51%  2048.19kB  6.24%  cmd/compile/internal/types.New
 2003.58kB  6.11% 29.62%  2003.58kB  6.11%  cmd/compile/internal/gc.init
 1028.18kB  3.13% 32.75%  1028.18kB  3.13%  cmd/compile/internal/ssa.schedule
    1026kB  3.13% 35.88%  1538.12kB  4.69%  cmd/internal/obj.(*LSym).WriteAddr
 1025.47kB  3.13% 39.00%  1025.47kB  3.13%  cmd/compile/internal/ssa.tighten
    1025kB  3.12% 42.13%     1025kB  3.12%  cmd/internal/obj.(*LSym).Grow
 1024.56kB  3.12% 45.25%  1024.56kB  3.12%  cmd/internal/obj.(*Link).InitTextSym
 1024.02kB  3.12% 48.37%  1024.02kB  3.12%  cmd/compile/internal/ssa.(*stackAllocState).computeLive
  532.26kB  1.62% 49.99%   532.26kB  1.62%  cmd/compile/internal/types.InternString
  516.64kB  1.57% 51.57%   516.64kB  1.57%  cmd/compile/internal/gc.newliveness
  516.01kB  1.57% 53.14%  1540.03kB  4.69%  cmd/compile/internal/ssa.(*stackAllocState).init
     514kB  1.57% 54.71%      514kB  1.57%  cmd/compile/internal/gc.(*EscState).nodeEscState
     514kB  1.57% 56.27%      514kB  1.57%  cmd/compile/internal/gc.iimport

The choice of legacy format is so that compilebench can parse the stats dumped at the top; see e8d5989.

I discovered all this because I wanted to switch the compiler to use the new "allocs" profile and was confused that it appeared not to work. :)

We could presumably change compilebench to work somehow with the new pprof format, but depending on the root cause this bug might also impact others. Is this a runtime/pprof bug? A cmd/pprof bug?

cc @hyangah

@josharian josharian added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 26, 2018
@josharian josharian added this to the Go1.11 milestone Apr 26, 2018
@hyangah
Copy link
Contributor

hyangah commented Apr 26, 2018

What output do you see when using -inuse_space flag? (with the legacy format).

The new 'alloc' profile should be identical with the 'heap' profile for the legacy format.

@josharian
Copy link
Contributor Author

What output do you see when using -inuse_space flag? (with the legacy format).

They are similar but slightly different:

$ compilebench -alloc -memprofile=m.p -run=Template
BenchmarkTemplate 1 180222981 ns/op 222468000 user-ns/op 34440864 B/op 321967 allocs/op
$ go tool pprof -alloc_space `go tool -n compile` m.p
File: compile
Type: alloc_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 16.10MB, 46.94% of 34.30MB total
Showing top 10 nodes out of 213
      flat  flat%   sum%        cum   cum%
    3.07MB  8.95%  8.95%     3.07MB  8.95%  runtime.makeBucketArray
       3MB  8.75% 17.70%        3MB  8.75%  cmd/compile/internal/gc.nodl
    1.96MB  5.70% 23.40%     1.96MB  5.70%  cmd/compile/internal/gc.init
    1.50MB  4.37% 27.78%     1.50MB  4.37%  cmd/compile/internal/ssa.(*regAllocState).computeLive
    1.50MB  4.37% 32.15%     1.50MB  4.37%  runtime.rawstringtmp
    1.04MB  3.02% 35.17%     1.04MB  3.02%  cmd/compile/internal/gc.newliveness
    1.02MB  2.99% 38.16%     2.52MB  7.36%  cmd/compile/internal/ssa.(*regAllocState).init
       1MB  2.93% 41.09%        1MB  2.93%  cmd/compile/internal/gc.scopePCs
       1MB  2.92% 44.01%     1.50MB  4.38%  cmd/compile/internal/ssa.cse
       1MB  2.92% 46.94%        1MB  2.92%  cmd/compile/internal/ssa.postorderWithNumbering
$ go tool pprof -inuse_space `go tool -n compile` m.p
File: compile
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 10946.23kB, 81.04% of 13506.54kB total
Showing top 10 nodes out of 116
      flat  flat%   sum%        cum   cum%
 2003.58kB 14.83% 14.83%  2003.58kB 14.83%  cmd/compile/internal/gc.init
 1601.68kB 11.86% 26.69%  1601.68kB 11.86%  runtime.makeBucketArray
 1536.30kB 11.37% 38.07%  1536.30kB 11.37%  cmd/compile/internal/gc.nodl
 1024.23kB  7.58% 45.65%  1024.23kB  7.58%  cmd/compile/internal/gc.newnamel
 1024.11kB  7.58% 53.23%  1024.11kB  7.58%  cmd/compile/internal/ssa.BuildFuncDebug
 1024.11kB  7.58% 60.82%  1594.15kB 11.80%  cmd/internal/obj.(*Link).LookupInit
 1024.03kB  7.58% 68.40%  1024.03kB  7.58%  runtime.rawstringtmp
  683.75kB  5.06% 73.46%   683.75kB  5.06%  cmd/compile/internal/gc.initssaconfig
  512.28kB  3.79% 77.25%  1536.31kB 11.37%  cmd/internal/obj.(*Link).InitTextSym
  512.16kB  3.79% 81.04%   512.16kB  3.79%  cmd/compile/internal/gc.expandmeth

Neither is an accurate alloc_space output.

The new 'alloc' profile should be identical with the 'heap' profile for the legacy format.

Seems like that should be documented, although that is separate than the issue here.

@hyangah
Copy link
Contributor

hyangah commented Apr 26, 2018

Why made you think alloc_space is not reporting alloc_space?

The protobuf-based alloc profile reports 32809kB total.
The last legacy alloc profile reports 34.30MB total.

What is the heap sampling rate?

@josharian
Copy link
Contributor Author

Why made you think alloc_space is not reporting alloc_space?

Years of looking at this particular profile. :)

The total number might be right but the listed functions definitely are not.

Also the results for those same commands are different if we ask for pb format. And I know from experience the pb format is correct.

I’d paste output here but I’m on my phone.

@hyangah
Copy link
Contributor

hyangah commented Apr 27, 2018

Can you try to collect profile with memprofilerate=1?
I tried with it (and changing "heap" cmd/compile/internal/gc/util.go to "allocs")
and got the similar-looking outputs from legacy "heap", "allocs" profiles, and also pb "allocs" profile.

@hyangah
Copy link
Contributor

hyangah commented May 29, 2018

Attached are the profiles collected with memprofilerate=1. (tip rev ef53de8)

  • legacy.p -> legacy format
  • proto.p -> proto format (with change of writeLegacyFormat in cmd/compile/internal/gc/util.go from 1 to 0)
    profiles.tar.gz

Proto format:

$ compilebench -alloc -memprofile=proto.p -run=Template -memprofilerate=1
BenchmarkTemplate 1 2048325873 ns/op 2412000000 user-ns/op 0 B/op 0 allocs/op
$ go tool pprof -alloc_space `go tool -n compile` proto.p
File: compile
Type: alloc_space
Time: May 29, 2018 at 2:10pm (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top15
Showing nodes accounting for 17077.06kB, 49.01% of 34844.14kB total
Dropped 798 nodes (cum <= 174.22kB)
Showing top 15 nodes out of 268
      flat  flat%   sum%        cum   cum%
 3138.25kB  9.01%  9.01%  3138.25kB  9.01%  cmd/compile/internal/gc.nodl
    1960kB  5.63% 14.63%     1960kB  5.63%  cmd/compile/internal/gc.init
 1718.83kB  4.93% 19.56%  2546.77kB  7.31%  cmd/compile/internal/ssa.(*regAllocState).init
 1565.16kB  4.49% 24.06%  1565.16kB  4.49%  cmd/compile/internal/gc.newnamel
 1443.92kB  4.14% 28.20%  1443.92kB  4.14%  cmd/compile/internal/types.New
 1404.48kB  4.03% 32.23%  1404.48kB  4.03%  cmd/compile/internal/ssa.liveValues
 1204.22kB  3.46% 35.69%  1758.28kB  5.05%  cmd/compile/internal/ssa.cse
 1010.50kB  2.90% 38.59%  1025.57kB  2.94%  cmd/compile/internal/ssa.schedule
  702.23kB  2.02% 40.60%      722kB  2.07%  cmd/compile/internal/ssa.(*regAllocState).computeLive
  586.05kB  1.68% 42.28%   586.05kB  1.68%  cmd/internal/obj.(*Link).LookupInit
  522.08kB  1.50% 43.78%  1714.98kB  4.92%  cmd/compile/internal/ssa.(*regAllocState).regalloc
  486.50kB  1.40% 45.18%   631.91kB  1.81%  cmd/internal/obj.(*LSym).WriteAddr
  483.39kB  1.39% 46.57%   483.39kB  1.39%  cmd/compile/internal/gc.scopePCs
  434.25kB  1.25% 47.81%   434.25kB  1.25%  cmd/internal/obj.(*LSym).Grow
  417.22kB  1.20% 49.01%   615.39kB  1.77%  cmd/compile/internal/gc.(*Liveness).epilogue

Legacy format:

$ compilebench -alloc -memprofile=legacy.p -run=Template -memprofilerate=1
BenchmarkTemplate 1 16227441634 ns/op 16956000000 user-ns/op 36131488 B/op 346778 allocs/op
$ go tool pprof -alloc_space `go tool -n compile` legacy.p
File: compile
Type: alloc_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top15
Showing nodes accounting for 18882.81kB, 54.19% of 34845.70kB total
Dropped 826 nodes (cum <= 174.23kB)
Showing top 15 nodes out of 281
      flat  flat%   sum%        cum   cum%
 3138.25kB  9.01%  9.01%  3138.25kB  9.01%  cmd/compile/internal/gc.nodl
 2214.36kB  6.35% 15.36%  2214.36kB  6.35%  runtime.makeBucketArray
    1960kB  5.62% 20.99%     1960kB  5.62%  cmd/compile/internal/gc.init
 1709.69kB  4.91% 25.89%  2546.75kB  7.31%  cmd/compile/internal/ssa.(*regAllocState).init
 1565.16kB  4.49% 30.38%  1565.16kB  4.49%  cmd/compile/internal/gc.newnamel
 1445.42kB  4.15% 34.53%  1445.42kB  4.15%  cmd/compile/internal/types.New
 1404.48kB  4.03% 38.56%  1404.48kB  4.03%  cmd/compile/internal/ssa.liveValues
 1121.23kB  3.22% 41.78%  1759.83kB  5.05%  cmd/compile/internal/ssa.cse
 1010.50kB  2.90% 44.68%  1025.57kB  2.94%  cmd/compile/internal/ssa.schedule
  702.23kB  2.02% 46.70%   721.98kB  2.07%  cmd/compile/internal/ssa.(*regAllocState).computeLive
  580.56kB  1.67% 48.36%   580.56kB  1.67%  runtime.rawstringtmp
  545.44kB  1.57% 49.93%   886.59kB  2.54%  runtime.mapassign_fast64ptr
  515.61kB  1.48% 51.41%  1715.38kB  4.92%  cmd/compile/internal/ssa.(*regAllocState).regalloc
  486.50kB  1.40% 52.80%   631.91kB  1.81%  cmd/internal/obj.(*LSym).WriteAddr
  483.39kB  1.39% 54.19%   483.39kB  1.39%  cmd/compile/internal/gc.scopePCs

Looked slightly different, but if we let pprof hide 'runtime' functions, the legacy profile's top15 looks same as the output from the proto format.

$ go tool pprof -alloc_space `go tool -n compile` legacy.p
File: compile
Type: alloc_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) hide=runtime.*
(pprof) top15
Active filters:
   hide=runtime.*
Showing nodes accounting for 17078.84kB, 49.01% of 34845.70kB total
Dropped 782 nodes (cum <= 174.23kB)
Showing top 15 nodes out of 267
      flat  flat%   sum%        cum   cum%
 3138.25kB  9.01%  9.01%  3138.25kB  9.01%  cmd/compile/internal/gc.nodl
    1960kB  5.62% 14.63%     1960kB  5.62%  cmd/compile/internal/gc.init
 1718.83kB  4.93% 19.56%  2546.75kB  7.31%  cmd/compile/internal/ssa.(*regAllocState).init
 1565.16kB  4.49% 24.06%  1565.16kB  4.49%  cmd/compile/internal/gc.newnamel
 1445.42kB  4.15% 28.20%  1445.42kB  4.15%  cmd/compile/internal/types.New
 1404.48kB  4.03% 32.23%  1404.48kB  4.03%  cmd/compile/internal/ssa.liveValues
 1205.77kB  3.46% 35.69%  1759.83kB  5.05%  cmd/compile/internal/ssa.cse
 1010.50kB  2.90% 38.59%  1025.57kB  2.94%  cmd/compile/internal/ssa.schedule
  702.23kB  2.02% 40.61%   721.98kB  2.07%  cmd/compile/internal/ssa.(*regAllocState).computeLive
  584.83kB  1.68% 42.29%   584.83kB  1.68%  cmd/internal/obj.(*Link).LookupInit
  522.08kB  1.50% 43.79%  1715.38kB  4.92%  cmd/compile/internal/ssa.(*regAllocState).regalloc
  486.50kB  1.40% 45.18%   631.91kB  1.81%  cmd/internal/obj.(*LSym).WriteAddr
  483.39kB  1.39% 46.57%   483.39kB  1.39%  cmd/compile/internal/gc.scopePCs
  434.22kB  1.25% 47.82%   434.22kB  1.25%  cmd/internal/obj.(*LSym).Grow
  417.20kB  1.20% 49.01%   615.27kB  1.77%  cmd/compile/internal/gc.(*Liveness).epilogue

So, this seems like a bug caused by differences in legacy/proto format profile handling (cc @aalexand)

@aalexand
Copy link
Contributor

@hyangah Is the bug here that the legacy format does not hide the runtime functions like the proto format does or am I misreading the whole thing?

@hyangah
Copy link
Contributor

hyangah commented May 29, 2018

@aalexand I don't know if that's the whole story of the issue Josh sees, but making the legacy profile handler hide the runtime functions will help investigating and verifying the original issue.

FYI, https://github.com/golang/go/blob/master/src/runtime/pprof/protomem.go#L38 is where runtime functions are skipped.

hyangah added a commit to hyangah/pprof that referenced this issue May 29, 2018
This CL fixes a long-lasting bug that prevented pprof from recognizing
Legacy heap profile produced by Go. Go reports four types of samples
at once so the profile includes alloc_objects, alloc_space,
inuse_objects, and inuse_space. The bug caused pprof to misclassify Go
heap profile data and prevent selection of correct filtering/pruning
patterns in analysis.

Update golang/go#25096

Tested with the profile samples included in golang.org/issues/25096
(pprof hides the runtime functions with this change)
aalexand pushed a commit to google/pprof that referenced this issue May 30, 2018
This CL fixes a long-lasting bug that prevented pprof from recognizing
Legacy heap profile produced by Go. Go reports four types of samples
at once so the profile includes alloc_objects, alloc_space,
inuse_objects, and inuse_space. The bug caused pprof to misclassify Go
heap profile data and prevent selection of correct filtering/pruning
patterns in analysis.

Update golang/go#25096

Tested with the profile samples included in golang.org/issues/25096
(pprof hides the runtime functions with this change)
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/115295 mentions this issue: cmd/vendor/.../pprof: sync at rev 1ddc9e2

gopherbot pushed a commit that referenced this issue May 30, 2018
This includes changes in pprof to support
 - the new -diff_base flag
 - fix for a bug in handling of legacy Go heap profiles

Update #25096

Change-Id: I826ac9244f31cc2c4415388c44a0cbe77303e460
Reviewed-on: https://go-review.googlesource.com/115295
Run-TryBot: Hyang-Ah Hana Kim <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
@hyangah
Copy link
Contributor

hyangah commented May 30, 2018

With the newly built pprof, I verified that the profiles in the legacy format and the proto format report almost identical results if we collect profiles using -memprofilerate=1.

With the default memprofile sampling rate, however, the results look different. The results are different from the results with -memprofilerate=1, too. The order in the top15 list varies a lot. This seems to me errors from sampling-based estimation, not the bug the original report suggested.

@josharian, is there anything you want to check before closing this issue? Improving the accuracy of sampling is one thing I can think of but it should be tracked separately (independent of go1.11 release).

@hyangah hyangah changed the title runtime/pprof: heap/allocs profile with legacy format yields inuse_space regardless of pprof flags runtime/pprof: heap/allocs profile with legacy format yields different results from the profile with proto format May 30, 2018
@josharian
Copy link
Contributor Author

Thanks! I’ll take a look as soon as I can.

@josharian
Copy link
Contributor Author

@hyangah the new output seems much more plausible. Thanks! And glancing over the diff, mishandling of the -alloc_space flag strikes me as exactly the sort of thing that would have explained what I was seeing.

Improving the accuracy of sampling is one thing I can think of but it should be tracked separately (independent of go1.11 release).

Indeed. I've filed #25653. I'll close this now. Thanks again!

@golang golang locked and limited conversation to collaborators May 31, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants