Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiregionccl,server: use cached sqlliveness.Reader, deflake ColdStartLatencyTest #98368

Merged

Conversation

ajwerner
Copy link
Contributor

multitenantccl: deflake ColdStartLatencyTest

This test was flakey due to the closed timestamp sometimes not leading far
for global tables due to overload, and due to a cached liveness reader
not being used in distsql. The former was fixed in previous commits. The
latter is fixed here.

Fixes: #96334

sql: use CachedReader for uses with sqlinstance and the sql builtins

The CachedReader won't block, which in multi-region clusters is good. It will
mean that in some cases, it'll state that a sessions is alive when it most
certainly is not. Currently, nobody needs synchronous semantics.

This is a major part of fixing the TestColdStartLatency as sometimes
distsql planning would block. That's not acceptable -- the idea that
query physical planning can need to wait for a cross-region RPC is
unacceptable.

sqlliveness: re-arrange APIs to clarify when the API was blocking

By "default" the implementation of Reader was blocking and there was a method
to get a handle to a non-blocking CachedReader(). This asymmetry does not aid
understandability. The non-blocking reader came later, but it is generally the
more desirable interface.

Release note: None

@ajwerner ajwerner requested a review from a team March 10, 2023 12:10
@ajwerner ajwerner requested review from a team as code owners March 10, 2023 12:10
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/fix-cold-start-latency-test branch from 674ca6a to 299babf Compare March 10, 2023 22:02
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 6 files at r1, 2 of 2 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @jeffswenson)


pkg/sql/planner.go line 117 at r3 (raw file):

	evalCtx.Codec = execCfg.Codec
	evalCtx.Tracer = execCfg.AmbientCtx.Tracer
	if execCfg.SQLLiveness != nil { // nil in some tests

Why did this change?

Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r1, 2 of 2 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @jeffswenson)

By "default" the implementation of Reader was blocking and there was a method
to get a handle to a non-blocking CachedReader(). This asymmetry does not aid
understandability. The non-blocking reader came later, but it is generally the
more desirable interface.

Release note: None
The CachedReader won't block, which in multi-region clusters is good. It will
mean that in some cases, it'll state that a sessions is alive when it most
certainly is not. Currently, nobody needs synchronous semantics.

This is a major part of fixing the TestColdStartLatency as sometimes
distsql planning would block. That's not acceptable -- the idea that
query physical planning can need to wait for a cross-region RPC is
unacceptable.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/fix-cold-start-latency-test branch from 299babf to d6ca5ee Compare March 14, 2023 02:32
This test was flakey due to the closed timestamp sometimes not leading far
for global tables due to overload, and due to a cached liveness reader
not being used in distsql. The former was fixed in previous commits. The
latter is fixed here.

Fixes: cockroachdb#96334

Release note: None
`(*stop.Stopper).AddCloser` registers a closer to be called after all
async tasks have exited. The cache was running, waiting to be closed.
We instead need to hook up its context to exit when the stopper is
quiescing.

Epic: none

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/fix-cold-start-latency-test branch from d6ca5ee to 358f86d Compare March 14, 2023 04:37
@ajwerner
Copy link
Contributor Author

Latest commit is #98551

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @fqazi, and @jeffswenson)


pkg/sql/planner.go line 117 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Why did this change?

Because now we're calling a method on the object. In those tests, nothing used the nil value. The evalCtx and its rules about what needs a concrete implementation are a disaster.

@craig
Copy link
Contributor

craig bot commented Mar 14, 2023

Build succeeded:

@craig craig bot merged commit ca5ae38 into cockroachdb:master Mar 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

multiregionccl: flake in TestColdStartLatency due to timeout during user login
4 participants