
kvserver: use pprof label for Replica.Send (replace sendWithRangeID) #85948

Closed
tbg opened this issue Aug 11, 2022 · 4 comments · Fixed by #86130
Labels: A-kv-observability, A-kv-replication (Relating to Raft, consensus, and coordination), C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception)

Comments

tbg (Member) commented Aug 11, 2022

Is your feature request related to a problem? Please describe.

We used to pass a faux range ID parameter to Replica.sendWithoutRangeID so that the range would show up in stack traces, but this broke in a recent Go update. We should do this properly and use a profiler label instead, which will then show up in the ?debug=1 goroutines profile.

Describe the solution you'd like

It's tricky because the code path is allocation-sensitive (it's a hot path) and the pprof API forces us to use a context. We don't want to take the incoming context and wrap it, due to the allocations, but if there is a label in there already we probably want to respect it and then need to allocate. What we can do is allocate a "background" context along with the Replica for use in the fast path, i.e. we'd allocate only once per Replica, and do the expensive wrapping only when there are incoming labels already.

To do this, we'd add a new field pprofLabelCtx to Replica and populate it in newUnloadedReplica:

  pprofLabelCtx: pprof.WithLabels(context.Background(), pprof.Labels("range_id", desc.RangeID.String()))

As well as some allocation-avoidance infra:

// emptyChecker records whether the incoming context already carries any pprof labels.
type emptyChecker bool

// check is passed to pprof.ForLabels; it flips the flag on the first label it
// sees and returns false to stop iterating.
func (c *emptyChecker) check(k, v string) bool {
	*c = true
	return false // seen enough
}

var emptyCheckerPool = sync.Pool{New: func() interface{} { return new(emptyChecker) }}

and then at the top of Replica.Send (after inlining sendWithoutRangeID):

	{
		const rangeIDLabel = "range_id"
		c := emptyCheckerPool.Get().(*emptyChecker)
		pprof.ForLabels(ctx, c.check)
		hasLabel := *c
		*c = false
		emptyCheckerPool.Put(c)
		if !hasLabel {
			// Fast path, saving allocations: reuse the per-Replica context
			// that already carries the range ID label.
			pprof.SetGoroutineLabels(r.pprofLabelCtx)
		} else {
			// There are some incoming labels, so we want to add our label to
			// the incoming context, which is more allocation intensive.
			pprof.SetGoroutineLabels(pprof.WithLabels(ctx, pprof.Labels(rangeIDLabel, r.RangeID.String())))
		}
		// Restore the incoming labels when Send returns.
		defer pprof.SetGoroutineLabels(ctx)
	}

Describe alternatives you've considered

Nothing? Or see if sendWithoutRangeID can be resurrected properly.

Additional context

When a request hangs on a replica, it's important to be able to see which range it is on. In theory, tracing infra can help here too, but I think it's still spotty.

Jira issue: CRDB-18497

tbg added the C-enhancement and T-kv (KV Team) labels on Aug 11, 2022
pav-kv self-assigned this on Aug 15, 2022
pav-kv (Collaborator) commented Aug 15, 2022

Should we gate adding the pprof labels on the server/cluster setting, like here?

if ex.server.cfg.Settings.CPUProfileType() == cluster.CPUProfileWithLabels {
	remoteAddr := "internal"
	if rAddr := ex.sessionData().RemoteAddr; rAddr != nil {
		remoteAddr = rAddr.String()
	}
	var stmtNoConstants string
	if prepared != nil {
		stmtNoConstants = prepared.StatementNoConstants
	} else {
		stmtNoConstants = formatStatementHideConstants(ast)
	}
	labels := pprof.Labels(
		"appname", ex.sessionData().ApplicationName,
		"addr", remoteAddr,
		"stmt.tag", ast.StatementTag(),
		"stmt.no.constants", stmtNoConstants,
	)
	pprof.Do(ctx, labels, func(ctx context.Context) {
		ev, payload, err = ex.execStmtInOpenState(ctx, parserStmt, prepared, pinfo, res, canAutoCommit)
	})
}

If we do so, can we avoid adding the complex code with the pool, etc.?

The fast path described above is sensitive to having other pprof labels up the stack. As we adopt this labelling technique more widely, it's going to become more likely that this is no longer the fast path (btw, are you sure it is now? are there no labels up the stack?). Conditioning it on a global setting seems more straightforward and predictable.

pav-kv (Collaborator) commented Aug 15, 2022

Also, maybe it could be a per-request/tenant/etc setting.

Or maybe another way is to optimize the pprof labels code itself :) It has a TODO in it to be more efficient. IMO it could, similarly to Context.WithValue, maintain an incremental linked list of key/value pairs (or bags of them) rather than rebuilding the whole label set anew each time.
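For illustration, a minimal sketch of that idea with hypothetical types (this is not the actual runtime/pprof implementation): each label addition prepends to an immutable chain, much like context.WithValue, so adding a label is O(1) and leaves the existing set untouched.

// Hypothetical sketch of an incremental label chain, analogous to
// context.WithValue; not the real runtime/pprof internals.
package main

import "fmt"

// labelNode is one key/value pair plus a link to the labels added before it.
type labelNode struct {
	key, value string
	parent     *labelNode
}

// withLabel adds a label in O(1) by prepending to the chain, rather than
// rebuilding the whole label set.
func withLabel(parent *labelNode, key, value string) *labelNode {
	return &labelNode{key: key, value: value, parent: parent}
}

// forLabels walks the chain, newest label first, stopping when f returns false.
func forLabels(n *labelNode, f func(key, value string) bool) {
	for ; n != nil; n = n.parent {
		if !f(n.key, n.value) {
			return
		}
	}
}

func main() {
	labels := withLabel(nil, "appname", "myapp")
	labels = withLabel(labels, "range_id", "42")
	forLabels(labels, func(k, v string) bool {
		fmt.Printf("%s=%s\n", k, v)
		return true
	})
	// Prints:
	// range_id=42
	// appname=42... no: appname=myapp
}

A real implementation would additionally have to handle duplicate keys shadowing older values when the profiler snapshots the labels.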

tbg (Member, Author) commented Aug 15, 2022

We discussed on Slack. Pavel brought up the good idea of not doing any perf improvements and instead wrapping the ctx only if a CPU profile is currently being taken. That's also the approach taken for the other labels, and since this label is a lot cheaper than the ones on the SQL side, it should be workable. We will do that instead.
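Roughly, a sketch of what that could look like at the top of Replica.Send, reusing the CPUProfileType() / cluster.CPUProfileWithLabels gate quoted above from the SQL side; the exact accessor path (r.store.cfg.Settings) is an assumption here, not necessarily what #86130 ended up merging:

// Sketch only: gate the range ID label on the same profiling cluster setting
// used on the SQL side. The r.store.cfg.Settings accessor path is an assumption.
if r.store.cfg.Settings.CPUProfileType() == cluster.CPUProfileWithLabels {
	// Restore the caller's labels when Send returns.
	defer pprof.SetGoroutineLabels(ctx)
	// Wrapping the context allocates, but only while a labeled CPU profile
	// is being taken, so the common path stays allocation-free.
	ctx = pprof.WithLabels(ctx, pprof.Labels("range_id", r.RangeID.String()))
	pprof.SetGoroutineLabels(ctx)
}

Since the label set contains just the range ID, the wrapped context is cheap compared to the SQL-side labels, and the allocation happens only while a labeled profile is in flight.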

blathers-crl bot commented Aug 17, 2022

cc @cockroachdb/replication

blathers-crl bot added the A-kv-replication label on Aug 17, 2022
craig bot closed this as completed in 39a86d3 on Aug 17, 2022