kvserver: use pprof label for Replica.Send (replace sendWithRangeID) #85948
Comments
Should we pre-condition adding the pprof labels with the server/cluster setting, like here? cockroach/pkg/sql/conn_executor_exec.go Lines 124 to 143 in e09ce91
If we do so, can we avoid adding the complex code with the pool etc? The fast path described above is sensitive to having other pprof labels up the stack. As we adopt this labelling technique more widely, it's going to be more likely that this is no longer the fast path (btw, are you sure it is now? are there no labels up the stack?). Conditioning it by a global setting seems more straightforward and predictable.
Also, maybe it could be a per-request/tenant/etc setting. Or maybe another way is optimizing the pprof labels code itself :) It has a TODO in it to be more efficient. IMO, it can, similarly to
We discussed on Slack. Pavel brought up the good idea of not doing any perf improvements and wrapping the ctx only if currently profiling. That's also the approach taken for the other labels, and since this label is a lot cheaper than the ones on the SQL side, it should be workable. We will do that instead.
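A minimal sketch of the "wrap only if currently profiling" approach, using only the standard runtime/pprof API. The profilingActive flag, sendLabeled, and labelSeen are hypothetical names for illustration; how CockroachDB actually tracks whether a profile is in progress is not shown here.

```go
package main

import (
	"context"
	"fmt"
	"runtime/pprof"
	"strconv"
	"sync/atomic"
)

// profilingActive is a hypothetical process-wide flag. A real server would
// flip it when a profile is being collected (assumption, not CockroachDB code).
var profilingActive atomic.Bool

// sendLabeled runs fn and attaches a range_id label only while profiling is
// active. Otherwise the incoming context is passed through untouched, so the
// common path does not wrap the context.
func sendLabeled(ctx context.Context, rangeID int64, fn func(context.Context)) {
	if !profilingActive.Load() {
		fn(ctx)
		return
	}
	labels := pprof.Labels("range_id", strconv.FormatInt(rangeID, 10))
	pprof.Do(ctx, labels, fn)
}

// labelSeen reports the range_id label visible to the callback, if any.
func labelSeen(rangeID int64) (string, bool) {
	var v string
	var ok bool
	sendLabeled(context.Background(), rangeID, func(ctx context.Context) {
		v, ok = pprof.Label(ctx, "range_id")
	})
	return v, ok
}

func main() {
	fmt.Println(labelSeen(7)) // profiling off: no label attached
	profilingActive.Store(true)
	fmt.Println(labelSeen(7)) // profiling on: label attached
}
```

The trade-off is that a goroutine already blocked when profiling starts carries no label, since the check happens once at the top of the call.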
cc @cockroachdb/replication |
Is your feature request related to a problem? Please describe.
We used to pass a faux parameter to Replica.sendWithoutRangeID, but a) this broke in a recent Go update. We should just do this properly and use a profiler label, which will then show up in the ?debug=1 goroutines profile.
Describe the solution you'd like
It's a bit tricky because the code path is allocation-sensitive (it's a hot path) and the pprof API forces us to use a context. We don't want to take the incoming context and wrap it due to the allocations, but if there is a label in there already we probably want to respect it and need to allocate. What we can do is allocate a "background" context along with the replica for use in the fast path, i.e. we'd allocate only once per Replica, and do the expensive wrapping only when there are labels already.
To do this, we'd add a new field pprofLabelCtx to Replica and populate it in newUnloadedReplica, add some alloc avoidance infra, and then apply the label at the top of Replica.Send (after inlining sendWithoutRangeID).
Describe alternatives you've considered
Nothing? Or see if sendWithoutRangeID can be resurrected properly.
Additional context
When a request hangs on a replica, it's important to be able to see on which range. In theory, tracing infra can help here too, but I think it's still spotty.
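To illustrate how a label helps locate a hung request, here is a small standalone sketch that parks a goroutine under a range_id label and captures the debug=1 goroutine profile while it is blocked. The function names and the blocking channel are hypothetical, and the substring check assumes the label rendering used by current Go releases in the profile's "# labels:" lines.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime/pprof"
	"strings"
)

// goroutineProfileWithLabel blocks a goroutine under a range_id label and
// returns the debug=1 goroutine profile taken while it is still blocked.
func goroutineProfileWithLabel(rangeID string) string {
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		labels := pprof.Labels("range_id", rangeID)
		pprof.Do(context.Background(), labels, func(context.Context) {
			close(started)
			<-release // stands in for a request that hangs on a replica
		})
	}()
	<-started
	var buf bytes.Buffer
	// debug=1 is the text form served at ?debug=1 on the pprof HTTP endpoint.
	_ = pprof.Lookup("goroutine").WriteTo(&buf, 1)
	close(release)
	<-done
	return buf.String()
}

// hasRangeLabel checks whether the profile text mentions the given range.
func hasRangeLabel(profile, rangeID string) bool {
	return strings.Contains(profile, `"range_id":"`+rangeID+`"`)
}

func main() {
	p := goroutineProfileWithLabel("42")
	fmt.Println("label found:", hasRangeLabel(p, "42"))
}
```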
Jira issue: CRDB-18497