Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "trim path" option which can be used to relocate sources. #366

Merged
merged 3 commits into from
May 3, 2018

Conversation

aalexand
Copy link
Collaborator

When pprof is asked to show annotated source for a profile collected on
another machine for a Go program, the profile contains absolute paths
which may not exist on the local machine. In that case pprof currently
fails to locate the source with no option to help it. The new option
adds a way to specify one or several source path prefixes that should be
trimmed from the source paths in the profile before applying the search
using search path option.

For example, taking the example from the issue where the source file
path in the profile is
/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
and the local path is /home/marko/lakafka/main.go. The user may
specify
-trim-path=/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo
to make pprof find the source. The source path doesn't need to be
specified if the current working dir is anything at or under
/home/marko/.

When the trim path is not specified, it is guessed heuristically based
on the basename of configured searched paths. In the example above,
setting -search-path=/home/marko/lakafka would be sufficient to
activate the heuristic successfully. Or having the current directory as
/home/marko/lakafka since the search path is by default set to the
current working directory. Note that the heuristic currently
does not attempt to walk the configured search paths up like the search
does. This is to keep it simple, use -trim-path explicitly in more
complicated cases.

Fixes #262.

@codecov-io
Copy link

codecov-io commented Apr 27, 2018

Codecov Report

Merging #366 into master will increase coverage by 0.06%.
The diff coverage is 91.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #366      +/-   ##
==========================================
+ Coverage   66.67%   66.73%   +0.06%     
==========================================
  Files          36       36              
  Lines        7456     7462       +6     
==========================================
+ Hits         4971     4980       +9     
+ Misses       2082     2080       -2     
+ Partials      403      402       -1
Impacted Files Coverage Δ
internal/driver/commands.go 44.5% <ø> (ø) ⬆️
internal/driver/driver.go 70.43% <100%> (+0.15%) ⬆️
internal/report/report.go 30.35% <100%> (ø) ⬆️
internal/report/source.go 83.24% <90.9%> (+1.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 783adb7...ed2ba41. Read the comment docs.

Copy link

@fmstephe fmstephe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Some minor comments. The heuristic in particular looks good to me.

To give a single user data point here is how we are using it

pprof --http :8081 -source_path ${GOPATH}/src:${GOROOT}/src

This picks up all the sources on my laptop, where the binaries are built on an array of servers. I am not using the trim_path flag. I would be curious to see how many people need to use that flag or whether the heuristic covers most cases.

"/proc/self/cwd/",
// found on profiles plus configured prefixes.
func trimPath(path, trimPath, searchPath string) string {
sPath, searchPath := filepath.ToSlash(path), filepath.ToSlash(searchPath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we don't change the name of searchPath but we do change the name of path to sPath. I would prefer to just change path and not create a new variable name.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an extra thought. I think it would make sense to run filepath.ToSlash(...) over each of these inputs before this method is called. For example if we ran this function over all paths introduced here and then we could safely work with a set of uniform file paths internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to changing path.

-1 to exposing the slash change to the API. I think it simpler to leave all the slash-specific code inside the implementation of this function, unless there is a good reason.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, like @rauls5382 says, I think I am more comfortable if the slash machinery stays fairly isolated within this function. I am not super excited about introducing a precondition that the input arguments are slashized by the caller.

For a similar reason we keep the original value of path here and keep sPath separate. The former is used for the return value so that slashes are the same as the caller gave. I added a comment to make it clearer.

for _, dir := range filepath.SplitList(searchPath) {
want := "/" + filepath.Base(dir) + "/"
if found := strings.Index(sPath, want); found != -1 {
return path[found+len(want):]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we use path, but I think we should be using sPath? If we change sPath to path above then this problem goes away.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want path instead of sPath, as we want to return the system path.

However, maybe we should use FromSlash(sPath). Not sure if there is a guarantee that in all platforms the path separator is a single character.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://golang.org/src/path/filepath/path.go?s=4426:4458#L155 and from ToSlash documentation that says "ToSlash returns the result of replacing each separator character in path with a slash ('/') character. Multiple separators are replaced by multiple slashes." I am pretty confident that it's safe to slice the original path using the slashized one as the index.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see FromSlash(sPath[found+len(want):]). Just because the subtle juggling of different path values makes the code harder to read.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the trimPath function could have a policy of converting path and trimPath using ToSlash operating only on these and then explicitly converting any return value using FromSlash. This would avoid having to reason about whether the indexes line up between different versions of each path on different OSes.

I agree with @aalexand that the indexes very likely do line up in every case. But I think it is undesirable to have to think so carefully about the correctness of this function. It would be preferable if it was obviously correct.

This has the downside of increasing allocations inside the trimPath function. I profiled the pprof tool using such an explicit To/FromSlash policy and found that the allocations in trimPath do not appear at all in the profile generated.

For reference my method of profiling is run

pprof --http :8081 -source_path ${GOPATH}/src:${GOROOT}/src profile_output

on an existing server. Then using the web-ui navigate to the source view and then take a heap profile of the running pprof process. The profile is then viewed with --sample_index=alloc_objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I too would prefer to use FromSlash on the return, if only for symmetry.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the documentation for ToSlash and FromSlash clearly says that these functions only replace character for character, so I think there is nothing to worry about. Function name is trimPath so I think it's only job is trimming. I can rename it to trimAndFromSlashPath and make the suggested change if it's everybody's preference (not mine).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that trimPath call is also used to trim the paths before displaying in the graph or source / assembly view, so ToSlash'ing the path here would make Unix paths display with backslash if someone opens on a Windows machine a profile collected on Linux. This is an edge case but I think the behavior as coded now would be more reasonable.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Since running FromSlash(ToSlash(somePath)) won't always produce a string identical to somePath then I am happy with the function as it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. Thanks

Copy link
Contributor

@rauls5382 rauls5382 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! A few comments below.

for _, dir := range filepath.SplitList(searchPath) {
want := "/" + filepath.Base(dir) + "/"
if found := strings.Index(sPath, want); found != -1 {
return path[found+len(want):]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want path instead of sPath, as we want to return the system path.

However, maybe we should use FromSlash(sPath). Not sure if there is a guarantee that in all platforms the path separator is a single character.

"/proc/self/cwd/",
// found on profiles plus configured prefixes.
func trimPath(path, trimPath, searchPath string) string {
sPath, searchPath := filepath.ToSlash(path), filepath.ToSlash(searchPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to changing path.

-1 to exposing the slash change to the API. I think it simpler to leave all the slash-specific code inside the implementation of this function, unless there is a good reason.

// found on profiles plus configured prefixes.
func trimPath(path, trimPath, searchPath string) string {
sPath, searchPath := filepath.ToSlash(path), filepath.ToSlash(searchPath)
if trimPath == "" && searchPath != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The searchPath != "" is redundant, as filePath.SplitList("") == nil

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// example, given original path "/some/remote/path/my-project/foo/bar.c"
// and search path "/my/local/path/my-project" the heuristic will return
// "/my/local/path/my-project/foo/bar.c".
for _, dir := range filepath.SplitList(searchPath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little painful doing this search for every filename,

We could remember the heuristic trimPath we've used so far and try to apply them to the next filename before trying the search. I assume in the general case we will only have one.

Up to you, though. We could leave as a TODO.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into whether it's worth it, will ping the PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I collected a couple of profiles over pprof itself while generating a graph for some larger profiles passing it a number of paths in -source_path and didn't see trimPath to show up substantially (saw something like 0.3% and the number of samples was low). I am also a bit hesitant to implement simple caching just remembering last trimPath since in interactive sessions user could change source path and trim path so those need to be part of the hashing key probably. Just leaving a TODO for now.

@@ -239,7 +240,7 @@ func (rpt *Report) newGraph(nodes graph.NodeSet) *graph.Graph {
// Clean up file paths using heuristics.
prof := rpt.prof
for _, f := range prof.Function {
f.Filename = trimPath(f.Filename)
f.Filename = trimPath(f.Filename, o.TrimPath, "" /* only trim if the trim path is explicitly specified */)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should pass o.SourcePath here.
Why don't we? Comment isn't very helpful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. My main motivation was to keep the behavior here as is since I was a little uncertain whether trimming the paths to display on the graph based on the search_path-based heuristics is a good idea or not. But passing o.SourcePath here makes the trimming here less special which is a good thing so done.

When pprof is asked to show annotated source for a profile collected on
another machine for a Go program, the profile contains absolute paths
which may not exist on the local machine. In that case pprof currently
fails to locate the source with no option to help it. The new option
adds a way to specify one or several source path prefixes that should be
trimmed from the source paths in the profile before applying the search
using search path option.

For example, taking the example from the issue where the source file
path in the profile is
/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
and the local path is /home/marko/lakafka/main.go. The user may
specify
`-trim-path=/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo`
to make pprof find the source. The source path doesn't need to be
specified if the current working dir is anything at or under
`/home/marko/`.

When the trim path is not specified, it is guessed heuristically based
on the basename of configured searched paths. In the example above,
setting `-search-path=/home/marko/lakafka` would be sufficient to
activate the heuristic successfully. Or having the current directory as
`/home/marko/lakafka` since the search path is by default set to the
current working directory. Note that the heuristic currently
does not attempt to walk the configured search paths up like the search
does. This is to keep it simple, use `-trim-path` explicitly in more
complicated cases.

Fixes google#262.
Copy link
Contributor

@rauls5382 rauls5382 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Feel free to submit as-is.

@aalexand aalexand merged commit 630d32d into google:master May 3, 2018
@chancez
Copy link

chancez commented May 10, 2018

Thanks so much for this, it's working great for me. Particularlly when the binary was compiled on a Linux host with it's stdlib somewhere different from where it is on my mac. I can now use pprof as mentioned above like this (doesn't work via go tool pprof yet until this makes it to a Go release):

pprof -source_path ${GOPATH}/src:${GOROOT}/src $binary $args

Adding this in case anyone has similar issues. I only found golang/go#13231 when searching for this and someone on gopher slack pointed me here.

@aalexand aalexand deleted the trim-path branch May 10, 2018 20:08
@aalexand
Copy link
Collaborator Author

@chancez Glad to hear it's useful!

KISSMonX pushed a commit to KISSMonX/pprof that referenced this pull request Jun 14, 2018
* 'master' of github.com:google/pprof: (32 commits)
  record diff base profile label key/value constant (google#384)
  apply additional command overrides based on report format (google#381)
  profile: fix legacy format Go heap profile parsing (google#382)
  add -diff flag for better profile comparision (google#369)
  Use -u flag in pprof installation command so that it updates if needed. (google#376)
  Add GetBase support for ASLR kernel mappings (google#371)
  Add show_from profile filter. (google#372)
  Update README.md due to 8dff45d (google#375)
  Remove stale docs, move useful ones. (google#374)
  internal/driver: skip tests requiring tcp on js (google#373)
  Add "trim path" option which can be used to relocate sources. (google#366)
  Move update_d3flamegraph.sh script from the root directory. (google#370)
  Skip unsymbolizable mapping during symbolz pass. (google#368)
  remove -positive_percentages flag (google#365)
  Render icicle graph in "Flame Graph" view (google#367)
  Add command-line editing support for interactive pprof (google#362)
  Improve profile filter unit tests, fix show filtering of mappings (google#354)
  Fake mappings should be merged into a single one during merging (google#357)
  Hack the code so that both existing Go versions and current Go master format it the same (google#358)
  moved filter tests into their own test file which matches the implementation file, filter.go. (google#356)
  ...
gmarin13 pushed a commit to gmarin13/pprof that referenced this pull request Dec 17, 2020
…#366)

Add "trim path" option which can be used to relocate sources.

When pprof is asked to show annotated source for a profile collected on
another machine for a Go program, the profile contains absolute paths
which may not exist on the local machine. In that case pprof currently
fails to locate the source with no option to help it. The new option
adds a way to specify one or several source path prefixes that should be
trimmed from the source paths in the profile before applying the search
using search path option.

For example, taking the example from the issue where the source file
path in the profile is
/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
and the local path is /home/marko/lakafka/main.go. The user may
specify
`-trim-path=/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo`
to make pprof find the source. The source path doesn't need to be
specified if the current working dir is anything at or under
`/home/marko/`.

When the trim path is not specified, it is guessed heuristically based
on the basename of configured searched paths. In the example above,
setting `-search-path=/home/marko/lakafka` would be sufficient to
activate the heuristic successfully. Or having the current directory as
`/home/marko/lakafka` since the search path is by default set to the
current working directory. Note that the heuristic currently
does not attempt to walk the configured search paths up like the search
does. This is to keep it simple, use `-trim-path` explicitly in more
complicated cases.

Fixes google#262.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Option to select source files directory
6 participants