
rangecache: improve tracing of range descriptor lookups #70741

Merged · 1 commit · Sep 28, 2021

Conversation

erikgrinaker (Contributor)

Release note: None


It would be nifty if we could attach the actual lookup trace (i.e. the forked span for the RPC) to the waiters' spans once the lookup completes, such that it is visible for all waiters, but I'm not sure if our tracing supports that.

@erikgrinaker erikgrinaker requested a review from a team September 25, 2021 17:51
@erikgrinaker erikgrinaker self-assigned this Sep 25, 2021
cockroach-teamcity (Member)

This change is Reviewable

andreimatei (Contributor) left a comment


> It would be nifty if we could attach the actual lookup trace (i.e. the forked span for the RPC) to the waiters' spans once the lookup completes, such that it is visible for all waiters, but I'm not sure if our tracing supports that.

I don't think there's a good reason to fork the span instead of creating a child span (EnsureChildSpan). The forking was introduced in e05477c; I think it should have been a child all along.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
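To build intuition for the child-vs-fork distinction, here is a toy, self-contained model of span recordings. This is emphatically not CockroachDB's actual tracing API (EnsureChildSpan and the real Tracer/Span types are far richer); it only illustrates the observable consequence: a child span is collected into its parent's recording, while a forked (follows-from) span is not.

```go
package main

import "fmt"

// span is a toy span for illustrating the difference between a "child"
// span and a "forked" (follows-from style) span. Real tracing libraries
// are much richer than this sketch.
type span struct {
	op       string
	parent   *span
	children []*span
}

// child creates a span that the parent collects, so it shows up in the
// parent's recording.
func (s *span) child(op string) *span {
	c := &span{op: op, parent: s}
	s.children = append(s.children, c)
	return c
}

// fork creates a span that only remembers where it came from; the parent
// does not collect it, so its events are invisible in the parent's
// recording.
func (s *span) fork(op string) *span {
	return &span{op: op, parent: s}
}

// recording walks a span and its children, the way a trace collector
// would assemble the parent's view.
func recording(s *span, depth int) []string {
	out := []string{fmt.Sprintf("%*s%s", depth*2, "", s.op)}
	for _, c := range s.children {
		out = append(out, recording(c, depth+1)...)
	}
	return out
}

func main() {
	root := &span{op: "dist-sender-send"}
	root.child("range-lookup")       // visible in root's recording
	root.fork("range-lookup-forked") // invisible in root's recording

	for _, line := range recording(root, 0) {
		fmt.Println(line)
	}
}
```

In this model, switching the lookup from fork to child is exactly what makes its events appear in the caller's trace.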


pkg/kv/kvclient/rangecache/range_cache.go, line 637 at r1 (raw file):

	}

	log.VEventf(ctx, 2, "looking up range descriptor: key=%s (reverse: %t)", key, useReverseScan)

please take the opportunity to check whether useReverseScan escapes just because of this logging, and if it does, let's exclude it from the log. You know about scripts/goescape, right?
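For context on why this matters, here is a hedged sketch (vlogf below is a stand-in, not the real log.VEventf): arguments to a variadic ...interface{} logging call are boxed into interfaces, and the compiler's escape analysis can therefore move otherwise stack-allocated values to the heap, even when the log level is disabled at runtime. Tools like `go build -gcflags=-m` (which scripts/goescape wraps, per the comment above) report such escapes.

```go
package main

import "fmt"

// vlogf stands in for a leveled-logging call: its variadic ...interface{}
// parameter boxes each argument, which can make otherwise stack-allocated
// values escape to the heap (visible via `go build -gcflags=-m`).
func vlogf(format string, args ...interface{}) string {
	return fmt.Sprintf(format, args...)
}

func main() {
	key := "/Table/52"
	useReverseScan := true

	// Boxing useReverseScan into interface{} is what can cause the escape:
	fmt.Println(vlogf("looking up range descriptor: key=%s (reverse: %t)", key, useReverseScan))

	// Dropping it from the log arguments avoids boxing at this call site,
	// which is the fix adopted in this PR:
	fmt.Println(vlogf("looking up range descriptor: key=%s", key))
}
```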

erikgrinaker (Contributor, author) left a comment


> I don't think there's a good reason to fork the span instead of creating a child span (EnsureChildSpan). The forking was introduced in e05477c; I think it should have been a child all along.

Cool, done. I was thinking it might be useful if we could attach the trace to any waiters too, not just the leader, in case we're looking at a random slow query that ends up waiting for a slow, in-flight range lookup. But I'm not sure if that's possible or even if the trace would make sense?

Note that the caller may return early due to a context cancellation; I'm assuming it's still safe to use its trace during the lookup.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)


pkg/kv/kvclient/rangecache/range_cache.go, line 637 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please take the opportunity to check whether useReverseScan escapes just because of this logging, and if it does, let's exclude it from the log. You know about scripts/goescape, right?

Good call, it does -- removed. Thanks for the tip on goescape, that's useful.

andreimatei (Contributor) left a comment


:lgtm: if you get rid of the AmbientCtx.

> I was thinking it might be useful if we could attach the trace to any waiters too, not just the leader, in case we're looking at a random slow query that ends up waiting for a slow, in-flight range lookup. But I'm not sure if that's possible or even if the trace would make sense?

It would be cool in principle, and it has come up elsewhere too, but currently our tracing infrastructure doesn't support this kind of reference between spans. OpenTelemetry does have a general concept of a link between spans (which we don't expose), so there is precedent; on the other hand, the OpenTelemetry story on span relationships is still very much evolving, so we'll see what sticks.

> Note that the caller may return early due to a context cancellation; I'm assuming it's still safe to use its trace during the lookup.

Yes; child spans can outlive the parent (although that's not the common intention).

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)


pkg/kv/kvclient/kvcoord/dist_sender.go, line 399 at r3 (raw file):

	}
	ds.rangeCache = rangecache.NewRangeCache(ds.st, rdb, getRangeDescCacheSize,
		cfg.RPCContext.Stopper, cfg.AmbientCtx)

I don't think you want to pass this AmbientCtx here. It's already bad that DistSenderConfig has an AmbientCtx. The AmbientCtx is not meant to be passed around, at least not usually. The point of the AmbientCtx is to store log tags specific to a particular component, so whoever has tags to add should create an AmbientCtx and hold on to it, not pass it to others.
More on this below.


pkg/kv/kvclient/rangecache/range_cache.go, line 80 at r3 (raw file):

	st         *cluster.Settings
	stopper    *stop.Stopper
	ambientCtx log.AmbientContext

I don't think the RangeCache wants to have an AmbientCtx (let alone one that's passed in, as I was saying above). It's not using it to annotate any contexts. The fact that it also holds a Tracer is incidental; that Tracer should probably not be exported. Just take in a Tracer.


pkg/sql/rowexec/tablereader_test.go, line 130 at r3 (raw file):

				ts.Table = *td.TableDesc()

				st := s.ClusterSettings()

let's hide s.ClusterSettings().Tracer behind an s.Tracer(). It has always bothered me that the Tracer is intermingled with the ClusterSettings; I don't remember how or why we ended up like that. Regardless, let's not encourage this coupling.

erikgrinaker (Contributor, author) left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)


pkg/sql/rowexec/tablereader_test.go, line 130 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

let's hide s.ClusterSettings().Tracer behind an s.Tracer(). It has always bothered me that the Tracer is intermingled with the ClusterSettings; I don't remember how or why we ended up like that. Regardless, let's not encourage this coupling.

Yeah, I think the rationale is that Settings is this grab-bag of random stuff that we're reasonably likely to have access to in most contexts. Changed to TestServer.Tracer() where possible, but most tests still end up using Settings.Tracer as that's the only thing they have access to (via cluster.MakeTestingClusterSettings).

erikgrinaker (Contributor, author)

TFTR! The CI failure is an unrelated flake.

bors r=andreimatei

craig (bot) commented Sep 28, 2021

Build succeeded:
