sql: add pprof labels based on the statement type #30930
Conversation
Here's a sample profile created with the new tags:
https://rakyll.org/profiler-labels/ has more details on how the labels work, btw.
Just a drive-by question: what's the cost of a label? Are they cheap enough to throw in basically as we please?
They seem very cheap, but I don't know about using them more than on the order of once per query. They do a …
Same question as Tobi. This is really neat, but I'd like to experimentally confirm that the overhead is completely negligible before making the change.
pkg/sql/conn_executor_exec.go, line 26 at r1 (raw file):
"github.com/pkg/errors" pprof "runtime/pprof"
Weird import.
Pretty disappointing. Not sure this is worth it. Though the …
pprof profiles will now be enhanced with the type of the statement that caused the creation of each stack. Statement types are things like 'SELECT', 'INSERT', 'ALTER TABLE', etc.

Release note: None
Force-pushed from 533025c to 7f5e563.
Replacing the …
Can we statically allocate the labels for all statement types to avoid the increase in memory usage?
We'll still have to allocate a map in …
Nevermind, there's no way we can reduce allocations here. However, I think the increase in allocations is negligible. The query benchmarked here is a worst case - it's …
Can't we at least avoid the …
That doesn't get allocated.
Really? Aren't there varargs and a slice in …
I ran this change through pprof and didn't see any allocations on those lines. Maybe I did something wrong?
Code LGTM, so up to you whether you think the added debuggability from this change is worth the minor increase in allocations.
I'm not really sure. I'm going to let this sit a while longer and revisit when I have more headspace for it. Thanks for the reviews so far.
Per our discussion yesterday, I think that if you gated this behind a cluster setting and used the anonymized statement string (on top of a tag label), this would be a very useful addition that we could use in the 2.2 cycle to find out whether we'll want to make larger investments here, like the ones you're dreaming of (such as adding general byte counter profiles that help us profile I/O). WDYT? PS: for the cluster setting, perhaps the right pattern is to make the setting a deadline at which the labels deactivate -- intended usage would be something like …
35147: sql: add profiler labels during CPU profiling r=jordanlewis a=tbg

When pprofui CPU profiling is active, add the statement tag and anonymized statement string to the goroutine labels. For example, this is what you can see when running `./bin/workload run kv --read-percent 50 --init`:

```
$ pprof -seconds 10 http://localhost:8080/debug/pprof/ui/profile
[...]
(pprof) tags
 stmt.anonymized: Total 7.9s
                  4.0s (50.57%): UPSERT INTO kv(k, v) VALUES ($1, $2)
                  3.9s (49.43%): SELECT k, v FROM kv WHERE k IN ($1,)
 stmt.tag: Total 7.9s
           4.0s (50.57%): INSERT
           3.9s (49.43%): SELECT
```

The dot graphs are similarly annotated, though they require `dot` to be installed on the machine and thus won't be as useful on the pprofui itself.

Profile tags are not propagated across RPC boundaries. That is, a node may have high CPU as a result of SQL queries not originating at the node itself, and no labels will be available. But perusing this diff, you may notice that any moving part in the system can sniff whether profiling is active and can add labels itself, so in principle we could add the application name or any other information that is propagated along with the transaction on the recipient node and track down problems that way. We may also be able to add tags based on RangeIDs to identify ranges which cause high CPU load. The possibilities are endless, and with this infra in place, it's trivial to quickly iterate on what's useful.

Closes #30930.

Release note (admin ui change): Running nodes can now be CPU profiled in a way that breaks down CPU usage by query (some restrictions apply).

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
pprof profiles will now be enhanced with the type of the statement that
caused the creation of each stack. Statement types are things like
'SELECT', 'INSERT', 'ALTER TABLE', etc.
This example output is produced based on a kv95 workload:
Release note: None