rangecache: improve tracing of range descriptor lookups #70741
Conversation
It would be nifty if we could attach the actual lookup trace (i.e. the forked span for the RPC) to the waiters' spans once the lookup completes, such that it is visible for all waiters, but I'm not sure if our tracing supports that.
I don't think there's a good reason to fork the span instead of creating a child span (EnsureChildSpan). The forking was introduced in e05477c; I think it should have been a child all along.
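For illustration, a minimal sketch assuming the pkg/util/tracing API of the time (the exact EnsureChildSpan signature may differ slightly):

```go
package example

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/util/tracing"
)

// lookupWithChildSpan runs the range descriptor lookup under a child span of
// the caller's span (or a root span if the context has none), so the RPC is
// visible inline in the caller's trace rather than in a forked span.
func lookupWithChildSpan(ctx context.Context, tr *tracing.Tracer) {
	ctx, sp := tracing.EnsureChildSpan(ctx, tr, "range lookup")
	defer sp.Finish()
	// ... perform the range descriptor lookup using ctx ...
	_ = ctx
}
```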
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
pkg/kv/kvclient/rangecache/range_cache.go, line 637 at r1 (raw file):
}
log.VEventf(ctx, 2, "looking up range descriptor: key=%s (reverse: %t)", key, useReverseScan)
pls take the opportunity to check if useReverseScan escapes just because of this logging and if it does, let's exclude it from the log. You know about scripts/goescape, right?
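For context, a toy, self-contained illustration (not the actual rangecache code) of how a logged local can be reported as escaping, and how to check it with the standard toolchain; scripts/goescape automates this kind of check:

```go
package main

import "fmt"

// lookup mimics the logging call under discussion. Passing useReverseScan
// through fmt's ...interface{} parameter boxes it, so escape analysis may
// report "useReverseScan escapes to heap".
func lookup(key string, useReverseScan bool) {
	fmt.Printf("looking up range descriptor: key=%s (reverse: %t)\n", key, useReverseScan)
}

func main() {
	// Check with: go build -gcflags='-m' .
	lookup("/Table/52", true)
}
```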
(force-pushed from 2e10c36 to b9e1cc2)
Previously, andreimatei (Andrei Matei) wrote…
I don't think there's a good reason to fork the span instead of creating a child span (EnsureChildSpan). The forking was introduced in e05477c; I think it should have been a child all along.
Cool, done. I was thinking it might be useful if we could attach the trace to any waiters too, not just the leader, in case we're looking at a random slow query that ends up waiting for a slow, in-flight range lookup. But I'm not sure if that's possible or even if the trace would make sense?
Note that the caller may return early due to a context cancellation; I'm assuming it's still safe to use its trace during the lookup.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)
pkg/kv/kvclient/rangecache/range_cache.go, line 637 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls take the opportunity to check if useReverseScan escapes just because of this logging and if it does, let's exclude it from the log. You know about scripts/goescape, right?
Good call, it does -- removed. Thanks for the tip on goescape, that's useful.
(force-pushed from b9e1cc2 to 5281f79)
LGTM if you get rid of the AmbientCtx.
I was thinking it might be useful if we could attach the trace to any waiters too, not just the leader, in case we're looking at a random slow query that ends up waiting for a slow, in-flight range lookup. But I'm not sure if that's possible or even if the trace would make sense?
It would be cool in principle, and it has come up elsewhere too, but currently our tracing infrastructure doesn't support this kind of reference between spans. OpenTelemetry does have a general concept of a link between spans (which we don't expose), so there is precedent, but on the other hand the OpenTelemetry story on span relationships is very much still evolving so we'll see what sticks.
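As a rough illustration of the span "link" concept mentioned here, using the upstream OpenTelemetry Go API (go.opentelemetry.io/otel/trace) rather than CockroachDB's tracing package, which doesn't expose links:

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

// waiterSpanWithLink starts the waiter's span with a link to the shared lookup
// span instead of a parent/child relationship, so every waiter can point at
// the single in-flight lookup trace.
func waiterSpanWithLink(ctx context.Context, lookupSpan trace.Span) (context.Context, trace.Span) {
	tr := otel.Tracer("rangecache")
	return tr.Start(ctx, "wait-for-range-lookup",
		trace.WithLinks(trace.Link{SpanContext: lookupSpan.SpanContext()}))
}
```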
Note that the caller may return early due to a context cancellation; I'm assuming it's still safe to use its trace during the lookup.
Yes; child spans can outlive the parent (although that's not the common intention).
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)
pkg/kv/kvclient/kvcoord/dist_sender.go, line 399 at r3 (raw file):
}
ds.rangeCache = rangecache.NewRangeCache(ds.st, rdb, getRangeDescCacheSize, cfg.RPCContext.Stopper, cfg.AmbientCtx)
I don't think you want to pass this AmbientCtx here. It's already bad that the DistSenderConfig has an AmbientCtx. The AmbientCtx is not meant to be passed around - at least not usually. The point of the AmbientCtx is to store log tags specific to a particular component. So whoever has tags to add should create an AmbientCtx and hold on to it, not pass it to other people.
More on it below.
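A minimal sketch of the pattern described above, assuming the pkg/util/log AmbientContext API; the component names here (distSenderLike, "dist-sender") are hypothetical:

```go
package example

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/tracing"
)

// distSenderLike stands in for a component that owns its own AmbientContext
// instead of receiving one from a caller.
type distSenderLike struct {
	ambientCtx log.AmbientContext
}

func newDistSenderLike(tracer *tracing.Tracer) *distSenderLike {
	ac := log.AmbientContext{Tracer: tracer}
	ac.AddLogTag("dist-sender", nil) // tag specific to this component
	return &distSenderLike{ambientCtx: ac}
}

func (d *distSenderLike) doWork(ctx context.Context) {
	// Annotate the incoming context with this component's tags; the tags are
	// not pushed onto other components.
	ctx = d.ambientCtx.AnnotateCtx(ctx)
	log.VEventf(ctx, 2, "doing work")
}
```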
pkg/kv/kvclient/rangecache/range_cache.go, line 80 at r3 (raw file):
st         *cluster.Settings
stopper    *stop.Stopper
ambientCtx log.AmbientContext
I don't think the RangeCache wants to have an AmbientCtx (let alone one that's passed in, as I was saying above). It's not using it to annotate any contexts. The fact it also holds a Tracer is incidental; that Tracer should prolly not be exported. Just take in a Tracer.
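For illustration, a simplified sketch of the constructor shape being suggested (not the exact CockroachDB signature): the cache takes and stores only the Tracer it actually uses.

```go
package example

import (
	"github.com/cockroachdb/cockroach/pkg/settings/cluster"
	"github.com/cockroachdb/cockroach/pkg/util/stop"
	"github.com/cockroachdb/cockroach/pkg/util/tracing"
)

// rangeCacheLike depends only on the Tracer it needs for lookup spans, rather
// than holding an AmbientContext it never annotates contexts with.
type rangeCacheLike struct {
	st      *cluster.Settings
	stopper *stop.Stopper
	tracer  *tracing.Tracer
}

func newRangeCacheLike(
	st *cluster.Settings, stopper *stop.Stopper, tracer *tracing.Tracer,
) *rangeCacheLike {
	return &rangeCacheLike{st: st, stopper: stopper, tracer: tracer}
}
```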
pkg/sql/rowexec/tablereader_test.go, line 130 at r3 (raw file):
ts.Table = *td.TableDesc()
st := s.ClusterSettings()
let's hide s.ClusterSettings().Tracer into an s.Tracer(). It has always bothered me that the Tracer is intermingled with the ClusterSettings; I don't remember how/why we've ended up like that. Regardless, let's not encourage this coupling.
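A sketch of the accessor being asked for, written here as a free helper over the test server interface; per the follow-up below, the change adds it as a TestServer.Tracer() method where possible.

```go
package example

import (
	"github.com/cockroachdb/cockroach/pkg/testutils/serverutils"
	"github.com/cockroachdb/cockroach/pkg/util/tracing"
)

// tracerOf hides the ClusterSettings/Tracer coupling from test callers.
func tracerOf(s serverutils.TestServerInterface) *tracing.Tracer {
	return s.ClusterSettings().Tracer
}
```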
(force-pushed from 5281f79 to 39f3341)
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)
pkg/sql/rowexec/tablereader_test.go, line 130 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
let's hide s.ClusterSettings().Tracer into an s.Tracer(). It has always bothered me that the Tracer is intermingled with the ClusterSettings; I don't remember how/why we've ended up like that. Regardless, let's not encourage this coupling.
Yeah, I think the rationale is that Settings is this grab-bag of random stuff that we're reasonably likely to have access to in most contexts. Changed to TestServer.Tracer() where possible, but most tests still end up using Settings.Tracer as that's the only thing they have access to (via cluster.MakeTestingClusterSettings).
(force-pushed from 39f3341 to 50a491d)
TFTR! CI failure is an unrelated flake. bors r=andreimatei
Build succeeded.