-
Notifications
You must be signed in to change notification settings - Fork 612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "trim path" option which can be used to relocate sources. #366
Conversation
Codecov Report
@@ Coverage Diff @@
## master #366 +/- ##
==========================================
+ Coverage 66.67% 66.73% +0.06%
==========================================
Files 36 36
Lines 7456 7462 +6
==========================================
+ Hits 4971 4980 +9
+ Misses 2082 2080 -2
+ Partials 403 402 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Some minor comments. The heuristic in particular looks good to me.
To give a single user data point here is how we are using it
pprof --http :8081 -source_path ${GOPATH}/src:${GOROOT}/src
This picks up all the sources on my laptop, where the binaries are built on an array of servers. I am not using the trim_path
flag. I would be curious to see how many people need to use that flag or whether the heuristic covers most cases.
internal/report/source.go
Outdated
"/proc/self/cwd/", | ||
// found on profiles plus configured prefixes. | ||
func trimPath(path, trimPath, searchPath string) string { | ||
sPath, searchPath := filepath.ToSlash(path), filepath.ToSlash(searchPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we don't change the name of searchPath
but we do change the name of path
to sPath
. I would prefer to just change path
and not create a new variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an extra thought. I think it would make sense to run filepath.ToSlash(...)
over each of these inputs before this method is called. For example if we ran this function over all paths introduced here and then we could safely work with a set of uniform file paths internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to changing path.
-1 to exposing the slash change to the API. I think it simpler to leave all the slash-specific code inside the implementation of this function, unless there is a good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, like @rauls5382 says, I think I am more comfortable if the slash machinery stays fairly isolated within this function. I am not super excited about introducing a precondition that the input arguments are slashized by the caller.
For a similar reason we keep the original value of path
here and keep sPath
separate. The former is used for the return value so that slashes are the same as the caller gave. I added a comment to make it clearer.
for _, dir := range filepath.SplitList(searchPath) { | ||
want := "/" + filepath.Base(dir) + "/" | ||
if found := strings.Index(sPath, want); found != -1 { | ||
return path[found+len(want):] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we use path
, but I think we should be using sPath
? If we change sPath
to path
above then this problem goes away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want path instead of sPath, as we want to return the system path.
However, maybe we should use FromSlash(sPath). Not sure if there is a guarantee that in all platforms the path separator is a single character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://golang.org/src/path/filepath/path.go?s=4426:4458#L155 and from ToSlash documentation that says "ToSlash returns the result of replacing each separator character in path with a slash ('/') character. Multiple separators are replaced by multiple slashes." I am pretty confident that it's safe to slice the original path using the slashized one as the index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see FromSlash(sPath[found+len(want):])
. Just because the subtle juggling of different path values makes the code harder to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the trimPath
function could have a policy of converting path
and trimPath
using ToSlash
operating only on these and then explicitly converting any return value using FromSlash
. This would avoid having to reason about whether the indexes line up between different versions of each path on different OSes.
I agree with @aalexand that the indexes very likely do line up in every case. But I think it is undesirable to have to think so carefully about the correctness of this function. It would be preferable if it was obviously correct.
This has the downside of increasing allocations inside the trimPath function. I profiled the pprof tool using such an explicit To/FromSlash
policy and found that the allocations in trimPath
do not appear at all in the profile generated.
For reference my method of profiling is run
pprof --http :8081 -source_path ${GOPATH}/src:${GOROOT}/src profile_output
on an existing server. Then using the web-ui navigate to the source
view and then take a heap profile of the running pprof
process. The profile is then viewed with --sample_index=alloc_objects
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too would prefer to use FromSlash on the return, if only for symmetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the documentation for ToSlash and FromSlash clearly says that these functions only replace character for character, so I think there is nothing to worry about. Function name is trimPath
so I think it's only job is trimming. I can rename it to trimAndFromSlashPath and make the suggested change if it's everybody's preference (not mine).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that trimPath call is also used to trim the paths before displaying in the graph or source / assembly view, so ToSlash'ing the path here would make Unix paths display with backslash if someone opens on a Windows machine a profile collected on Linux. This is an edge case but I think the behavior as coded now would be more reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Since running FromSlash(ToSlash(somePath))
won't always produce a string identical to somePath
then I am happy with the function as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! A few comments below.
for _, dir := range filepath.SplitList(searchPath) { | ||
want := "/" + filepath.Base(dir) + "/" | ||
if found := strings.Index(sPath, want); found != -1 { | ||
return path[found+len(want):] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want path instead of sPath, as we want to return the system path.
However, maybe we should use FromSlash(sPath). Not sure if there is a guarantee that in all platforms the path separator is a single character.
internal/report/source.go
Outdated
"/proc/self/cwd/", | ||
// found on profiles plus configured prefixes. | ||
func trimPath(path, trimPath, searchPath string) string { | ||
sPath, searchPath := filepath.ToSlash(path), filepath.ToSlash(searchPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to changing path.
-1 to exposing the slash change to the API. I think it simpler to leave all the slash-specific code inside the implementation of this function, unless there is a good reason.
internal/report/source.go
Outdated
// found on profiles plus configured prefixes. | ||
func trimPath(path, trimPath, searchPath string) string { | ||
sPath, searchPath := filepath.ToSlash(path), filepath.ToSlash(searchPath) | ||
if trimPath == "" && searchPath != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The searchPath != "" is redundant, as filePath.SplitList("") == nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// example, given original path "/some/remote/path/my-project/foo/bar.c" | ||
// and search path "/my/local/path/my-project" the heuristic will return | ||
// "/my/local/path/my-project/foo/bar.c". | ||
for _, dir := range filepath.SplitList(searchPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a little painful doing this search for every filename,
We could remember the heuristic trimPath we've used so far and try to apply them to the next filename before trying the search. I assume in the general case we will only have one.
Up to you, though. We could leave as a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into whether it's worth it, will ping the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I collected a couple of profiles over pprof itself while generating a graph for some larger profiles passing it a number of paths in -source_path and didn't see trimPath to show up substantially (saw something like 0.3% and the number of samples was low). I am also a bit hesitant to implement simple caching just remembering last trimPath since in interactive sessions user could change source path and trim path so those need to be part of the hashing key probably. Just leaving a TODO for now.
internal/report/report.go
Outdated
@@ -239,7 +240,7 @@ func (rpt *Report) newGraph(nodes graph.NodeSet) *graph.Graph { | |||
// Clean up file paths using heuristics. | |||
prof := rpt.prof | |||
for _, f := range prof.Function { | |||
f.Filename = trimPath(f.Filename) | |||
f.Filename = trimPath(f.Filename, o.TrimPath, "" /* only trim if the trim path is explicitly specified */) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should pass o.SourcePath here.
Why don't we? Comment isn't very helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. My main motivation was to keep the behavior here as is since I was a little uncertain whether trimming the paths to display on the graph based on the search_path-based heuristics is a good idea or not. But passing o.SourcePath here makes the trimming here less special which is a good thing so done.
When pprof is asked to show annotated source for a profile collected on another machine for a Go program, the profile contains absolute paths which may not exist on the local machine. In that case pprof currently fails to locate the source with no option to help it. The new option adds a way to specify one or several source path prefixes that should be trimmed from the source paths in the profile before applying the search using search path option. For example, taking the example from the issue where the source file path in the profile is /home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go and the local path is /home/marko/lakafka/main.go. The user may specify `-trim-path=/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo` to make pprof find the source. The source path doesn't need to be specified if the current working dir is anything at or under `/home/marko/`. When the trim path is not specified, it is guessed heuristically based on the basename of configured searched paths. In the example above, setting `-search-path=/home/marko/lakafka` would be sufficient to activate the heuristic successfully. Or having the current directory as `/home/marko/lakafka` since the search path is by default set to the current working directory. Note that the heuristic currently does not attempt to walk the configured search paths up like the search does. This is to keep it simple, use `-trim-path` explicitly in more complicated cases. Fixes google#262.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Feel free to submit as-is.
Thanks so much for this, it's working great for me. Particularlly when the binary was compiled on a Linux host with it's stdlib somewhere different from where it is on my mac. I can now use pprof as mentioned above like this (doesn't work via
Adding this in case anyone has similar issues. I only found golang/go#13231 when searching for this and someone on gopher slack pointed me here. |
@chancez Glad to hear it's useful! |
* 'master' of github.com:google/pprof: (32 commits) record diff base profile label key/value constant (google#384) apply additional command overrides based on report format (google#381) profile: fix legacy format Go heap profile parsing (google#382) add -diff flag for better profile comparision (google#369) Use -u flag in pprof installation command so that it updates if needed. (google#376) Add GetBase support for ASLR kernel mappings (google#371) Add show_from profile filter. (google#372) Update README.md due to 8dff45d (google#375) Remove stale docs, move useful ones. (google#374) internal/driver: skip tests requiring tcp on js (google#373) Add "trim path" option which can be used to relocate sources. (google#366) Move update_d3flamegraph.sh script from the root directory. (google#370) Skip unsymbolizable mapping during symbolz pass. (google#368) remove -positive_percentages flag (google#365) Render icicle graph in "Flame Graph" view (google#367) Add command-line editing support for interactive pprof (google#362) Improve profile filter unit tests, fix show filtering of mappings (google#354) Fake mappings should be merged into a single one during merging (google#357) Hack the code so that both existing Go versions and current Go master format it the same (google#358) moved filter tests into their own test file which matches the implementation file, filter.go. (google#356) ...
…#366) Add "trim path" option which can be used to relocate sources. When pprof is asked to show annotated source for a profile collected on another machine for a Go program, the profile contains absolute paths which may not exist on the local machine. In that case pprof currently fails to locate the source with no option to help it. The new option adds a way to specify one or several source path prefixes that should be trimmed from the source paths in the profile before applying the search using search path option. For example, taking the example from the issue where the source file path in the profile is /home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go and the local path is /home/marko/lakafka/main.go. The user may specify `-trim-path=/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo` to make pprof find the source. The source path doesn't need to be specified if the current working dir is anything at or under `/home/marko/`. When the trim path is not specified, it is guessed heuristically based on the basename of configured searched paths. In the example above, setting `-search-path=/home/marko/lakafka` would be sufficient to activate the heuristic successfully. Or having the current directory as `/home/marko/lakafka` since the search path is by default set to the current working directory. Note that the heuristic currently does not attempt to walk the configured search paths up like the search does. This is to keep it simple, use `-trim-path` explicitly in more complicated cases. Fixes google#262.
When pprof is asked to show annotated source for a profile collected on
another machine for a Go program, the profile contains absolute paths
which may not exist on the local machine. In that case pprof currently
fails to locate the source with no option to help it. The new option
adds a way to specify one or several source path prefixes that should be
trimmed from the source paths in the profile before applying the search
using search path option.
For example, taking the example from the issue where the source file
path in the profile is
/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
and the local path is /home/marko/lakafka/main.go. The user may
specify
-trim-path=/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo
to make pprof find the source. The source path doesn't need to be
specified if the current working dir is anything at or under
/home/marko/
.When the trim path is not specified, it is guessed heuristically based
on the basename of configured searched paths. In the example above,
setting
-search-path=/home/marko/lakafka
would be sufficient toactivate the heuristic successfully. Or having the current directory as
/home/marko/lakafka
since the search path is by default set to thecurrent working directory. Note that the heuristic currently
does not attempt to walk the configured search paths up like the search
does. This is to keep it simple, use
-trim-path
explicitly in morecomplicated cases.
Fixes #262.