-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachprod: add pprof command #62796
roachprod: add pprof command #62796
Conversation
ff99b82
to
0a5f420
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, thank you!
Note that I've skipped the pprofurl command in the original issue.
Probably fine, we can add it later if anyone really needs it. I just had to dig around a bit to figure it out, but people are likely to just use the pprof command anyway.
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @stevendanna)
pkg/cmd/roachprod/main.go, line 1517 at r1 (raw file):
Use: "pprof <cluster>", Args: cobra.ExactArgs(1), Aliases: []string{"pprof-cpu", "pprof-profile", "pprof-heap"},
If we're using --heap
for the heap profile then maybe that's sufficient instead of cluttering up the command namespace?
pkg/cmd/roachprod/main.go, line 1551 at r1 (raw file):
mu := &syncutil.Mutex{} pprofPath := fmt.Sprintf("debug/pprof/%s?seconds=%d", profType, int(pprofOptions.duration.Seconds())) httpClient := httputil.NewClientWithTimeout(5 * pprofOptions.duration)
5x seems a bit excessive, maybe just add on a fixed 10 seconds or something?
pkg/cmd/roachprod/main.go, line 1560 at r1 (raw file):
scheme = "https" } outputFile := fmt.Sprintf("pprof-%s-%s-%04d-%d.out", profType, c.Name, i, timeutil.Now().Unix())
We should probably generate the timestamp once outside of the loop, to ensure we don't get files with different timestamps if they happen right as the clock ticks. Also, I'd probably use the time before the index in the filename so that all profiles for the same run are grouped together.
pkg/cmd/roachprod/main.go, line 1562 at r1 (raw file):
outputFile := fmt.Sprintf("pprof-%s-%s-%04d-%d.out", profType, c.Name, i, timeutil.Now().Unix()) outputDir := filepath.Dir(outputFile) file, err := ioutil.TempFile(outputDir, ".pprof")
I don't think we end up closing this file. Not that it matters that much since we'll just be exiting the process anyway, but it's the polite thing to do and the linter will probably yell at you for it anyway.
pkg/cmd/roachprod/main.go, line 1602 at r1 (raw file):
for _, s := range outputFiles { fmt.Printf("Created %s\n", s)
Maybe just move this into the function above, and output as the files are ready?
pkg/cmd/roachprod/main.go, line 1617 at r1 (raw file):
for i, file := range outputFiles { port := pprofOptions.startingPort + i cmd := exec.Command("go", "tool", "pprof",
Will ending the main process (e.g. with ctrl-C) also reap the child processes? If not, we may have to do something along these lines:
cmd.SysProcAttr = &syscall.SysProcAttr{
Pdeathsig: syscall.SIGKILL,
}
Just to make sure we don't get background processes hogging the ports.
ea9efbc
to
d348ad1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
pkg/cmd/roachprod/main.go, line 1517 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
If we're using
--heap
for the heap profile then maybe that's sufficient instead of cluttering up the command namespace?
I removed pprof-cpu
and pprof-profile
, but I left pprof-heap
just because it kinda amuses me to allow that. Perhaps in the morning I'll remove it to, I'll think about it. :D
pkg/cmd/roachprod/main.go, line 1551 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
5x seems a bit excessive, maybe just add on a fixed 10 seconds or something?
The worry I had with that is that I believe that timeout includes downloading the response, so if you've generated a very large profile, it might take a long time to download, especially if you have slower home internet.
pkg/cmd/roachprod/main.go, line 1560 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
We should probably generate the timestamp once outside of the loop, to ensure we don't get files with different timestamps if they happen right as the clock ticks. Also, I'd probably use the time before the index in the filename so that all profiles for the same run are grouped together.
Good idea. Done.
pkg/cmd/roachprod/main.go, line 1562 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
I don't think we end up closing this file. Not that it matters that much since we'll just be exiting the process anyway, but it's the polite thing to do and the linter will probably yell at you for it anyway.
Good catch. Interestingly the linter doesn't seem to catch this. Perhaps we should add a linter for this. We now close in the defer and we also close before the rename in the happy path.
pkg/cmd/roachprod/main.go, line 1602 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Maybe just move this into the function above, and output as the files are ready?
Unfortunately ParallelE write progress to the terminal by updating the same line repeatedly which means that if we write it while that is still progressing, we either need to write them to stderr despite them not being errors or we lose the output line in most cases.
pkg/cmd/roachprod/main.go, line 1617 at r1 (raw file):
Will ending the main process (e.g. with ctrl-C) also reap the child processes?
Ending the main process in general will not reap the children, but ctrl-C at the terminal will work. Control-C at the terminal sends SIGINT to all process in the foreground process group which will include all the go pprof
processes. Since this is an "interactive" type feature I thought this would probably be sufficient, but happy to go further and handle signals more gracefully in general.
Pdeathsig
doesn't work on macOS AFAIK. So I think there our best option might be a signal handler for the common signals we expect to kill us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, 1 of 1 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @stevendanna)
pkg/cmd/roachprod/main.go, line 1551 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
The worry I had with that is that I believe that timeout includes downloading the response, so if you've generated a very large profile, it might take a long time to download, especially if you have slower home internet.
Right, makes sense. I suppose we'd really want to use a timeout until the first byte of the body or something, but let's not complicate it, this is fine.
pkg/cmd/roachprod/main.go, line 1562 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Good catch. Interestingly the linter doesn't seem to catch this. Perhaps we should add a linter for this. We now close in the defer and we also close after the rename in the happy path.
Huh, that's odd, would've expected it to pick up on that.
pkg/cmd/roachprod/main.go, line 1602 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Unfortunately ParallelE write progress to the terminal by updating the same line repeatedly which means that if we write it while that is still progressing, we either need to write them to stderr despite them not being errors or we lose the output line in most cases.
Ah right, make sense.
pkg/cmd/roachprod/main.go, line 1617 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Will ending the main process (e.g. with ctrl-C) also reap the child processes?
Ending the main process will not reap the children, but ctrl-C at the terminal will work. Control-C at the terminal sends SIGINT to all process in the foreground process group which will include all the
go pprof
processes. Since this is an "interactive" type feature I thought this would probably be sufficient, but happy to go further and handle signals more gracefully in general.Because
Pdeathsig
doesn't work on macOS AFAIK so I think there our best option might be a signal handler for the common signals we expect to kill us.
As long as Control-C does the right thing that's good enough for me, thanks!
The new pprof command allows to user to collect CPU and heap profiles from nodes in a roachprod cluster using pprof's HTTP endpoints. Profiles are collected in parallel and stored in the current working directory of the machine running pprof. The command can also optionally open the profiles using `go tool pprof`. Some usage examples: # Capture CPU profile for all nodes in the cluster roachprod pprof CLUSTERNAME # Capture CPU profile for the first node in the cluster for 60 seconds roachprod pprof CLUSTERNAME:1 --duration 60s # Capture a Heap profile for the first node in the cluster roachprod pprof CLUSTERNAME:1 --heap # Same as above roachprod pprof-heap CLUSTERNAME:1 Fixes cockroachdb#62309 Release note: None
d348ad1
to
1d2b57e
Compare
bors r=erikgrinaker |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
The new pprof command allows to user to collect CPU and heap profiles
from nodes in a roachprod cluster using pprof's HTTP endpoints.
Profiles are collected in parallel and stored in the current working
directory of the machine running pprof. The command can also
optionally open the profiles using
go tool pprof
.Some usage examples:
Fixes #62309
Note that I've skipped the
pprofurl
command in the original issue. I noticed that theadminurl
command has an option to append any string you'd like to the URL, which is probably most of what one
needs from that command in most cases. But, I can add it if others feel differently.
Release note: None