runtime/pprof,net/http/pprof: improve delta profiles efficiency and correctness #67942

Open
korniltsev opened this issue Jun 12, 2024 · 9 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

Comments

@korniltsev
Contributor

Proposal Details

The issue

In Go, allocation, mutex and block profiles are cumulative. They only grow over time and show the allocations/blocks that have happened since the program started.
Not only do the values grow, but the size of the profile itself grows as well. It can reach megabytes for long-running processes.

In many cases, it's more useful to see the differences between two points in time.
You can use the delta profile from the net/http/pprof package.
Using the delta profile requires passing the seconds argument in the pprof endpoint query.

go tool pprof http://localhost:6060/debug/pprof/heap?seconds=30

What this does:

  1. Dump profile p0
  2. Sleep
  3. Dump profile p1
  4. Decompress and parse protobuf p0
  5. Decompress and parse protobuf p1
  6. Subtract p0 from p1
  7. Serialize protobuf and compress the result

The resulting profile is usually much smaller (p0 may be megabytes, while the compressed result is usually tens of kilobytes).

There are a number of issues with this approach:

  1. Heap profile contains both allocation values and in-use values. In-use values are not cumulative. In-use values are corrupted by the subtraction.
    Note: It can be fixed if net/http/pprof package would use p0.ScaleN([]float64{-1,-1,0,0}), instead of p0.Scale(-1) for memory profiles - that would subtract allocation values and zero out in-use values in p0.
  2. It requires dumping two big profiles.
  3. It produces a lot of allocations putting pressure on GC.
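
To make the first issue concrete, here is a toy model of the subtraction for a single stack. It is only a sketch: the sample type and the scaleN and merge helpers are hypothetical stand-ins, not the profile package used by net/http/pprof, and the ratios are integers rather than float64 to keep the arithmetic exact.

```go
package main

import "fmt"

// sample is a toy stand-in for one heap profile sample. The four values
// follow the heap sample types in order:
// alloc_objects, alloc_space, inuse_objects, inuse_space.
type sample [4]int64

// scaleN multiplies each value by the matching ratio, loosely mirroring
// the per-sample-type scaling that ScaleN applies to a profile.
func scaleN(s sample, ratios [4]int64) sample {
	var out sample
	for i := range s {
		out[i] = s[i] * ratios[i]
	}
	return out
}

// merge adds the values of two samples that share the same stack.
func merge(a, b sample) sample {
	var out sample
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

func main() {
	p0 := sample{100, 8000, 10, 800} // first dump
	p1 := sample{150, 12000, 4, 320} // second dump: fewer live objects

	// Scale(-1) negates every value, so in-use values are subtracted too.
	naive := merge(scaleN(p0, [4]int64{-1, -1, -1, -1}), p1)
	// ScaleN{-1,-1,0,0} negates allocations and zeroes in-use values in p0.
	fixed := merge(scaleN(p0, [4]int64{-1, -1, 0, 0}), p1)

	fmt.Println(naive) // [50 4000 -6 -480]
	fmt.Println(fixed) // [50 4000 4 320]
}
```

With the per-type ratios, the allocation columns hold the delta while the in-use columns keep the values from the second dump.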

DataDog's fastdelta

DataDog's fastdelta profiler uses another approach.

It improves the runtime/pprof approach by keeping a copy of the previous profile and subtracting it from the current profile.
The fastdelta profiler uses a custom protobuf pprof parser that doesn't allocate as much memory.
This approach is much more efficient, faster, and puts less pressure on the GC. It also doesn't require dumping two profiles.
However, the fastdelta profiler still parses huge profiles up to megabytes, just to discard most of it.

Grafana's godeltaprof

godeltaprof does a similar job but slightly differently.

Delta computation happens before serializing any pprof files, using runtime.MemProfileRecord and runtime.BlockProfileRecord.
This way, huge profiles don't need to be parsed. The delta is computed on raw records, all zeros are rejected, and results are serialized and compressed.

The source code for godeltaprof is based (forked) on the original runtime/pprof package.
godeltaprof is modified to include delta computation before serialization and to expose the new endpoints.
godeltaprof relies on a number of internal Go runtime functions, specifically runtime_FrameStartLine, runtime_FrameSymbolName, runtime_expandFinalInlineFrame and runtime_cyclesPerSecond, and potentially more. Relying on internal functions becomes harder and more dangerous due to #67401.

Proposal

We propose to allow efficient collection of delta memory, mutex and block profiles, both in runtime/pprof for push-based integrations and in net/http/pprof for scraping integrations.

The key points for improvements:

  • It should not require dumping two profiles
  • It should not require gzip decompressing and pprof parsing
  • Delta memory profile should be correct. (Either the inuse values are not corrupted, or alloc_* and inuse_* values are put into separate profiles)

The specifics of the API and the implementation details are left to be determined during discussion of this issue, once there is agreement that the problem needs to be addressed and that a solution could be accepted into the Go runtime.

@korniltsev
Contributor Author

Oh, this is likely some sort of duplicate of #57765

@prattmic
Member

I don't believe this needs to be a proposal (no API changes, unless you think new runtime/pprof APIs are required to implement this?), so removing from the proposal process.

@prattmic changed the title from "proposal: runtime/pprof net/http/pprof: Improve delta profiles efficiency and corectness" to "runtime/pprof,net/http/pprof: improve delta profiles efficiency and correctness" on Jun 12, 2024
@prattmic added the Performance and NeedsInvestigation labels and removed the Proposal label on Jun 12, 2024
@prattmic
Member

Heap profile contains both allocation values and in-use values. In-use values are not cumulative. In-use values are corrupted by the subtraction.
Note: It can be fixed if net/http/pprof package would use p0.ScaleN([]float64{-1,-1,0,0}), instead of p0.Scale(-1) for memory profiles - that would subtract allocation values and zero out in-use values in p0.

I recommend filing a separate issue for this, if one doesn't already exist. The rest of this issue is a performance optimization, but this is a real bug.

@prattmic
Member

cc @golang/runtime

@rsc
Contributor

rsc commented Jun 12, 2024

I don't know, it seems like new API to me. (It's a new URL handler variant but that's still API.) Please do kick it over to the proposal process if it ends up being a non-trivial change.

@prattmic
Member

It's a new URL handler variant but that's still API.

Perhaps I am missing it, but I don't see what the new URL handler variant is. IIUC, the core of this issue is to make the existing /debug/pprof/heap?seconds=30 handler more efficient.

@korniltsev
Contributor Author

Perhaps I am missing it, but I don't see what the new URL handler variant is.

I had multiple ideas:

  1. Improve /debug/pprof/heap?seconds=30 (no new API). This only fixes the issue for the net/http/pprof package, but not for runtime/pprof.
  2. Maybe a new delta implementation could live behind a runtime/pprof.Profile (new API: profile name, new URL). This way it could solve the issues for both the net/http/pprof and runtime/pprof packages. It would be nice to have an option to avoid sleeping for profile collection and just reuse data from previous profile collections.

@korniltsev
Contributor Author

I recommend filing a separate issue for this, if one doesn't already exist. The rest of this issue is a performance optimization, but this is a real bug.

I think there is an existing issue already: #57765. It has no bug label though.
