Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: use channel for DisableAutomaticVersionUpgrade #76598

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

RaduBerinde
Copy link
Member

DisableAutomaticVersionUpgrade is an atomic integer which is rechecked
in a retry loop. This is not a very clean mechanism, and can lead to
issues where you're unknowingly dealing with a copy of the knobs and
setting the wrong atomic. The retry loop can also add unnecessary
delays in tests.

This commit changes DisableAutomaticVersionUpgrade from an atomic
integer to a channel. If the channel is set, auto-upgrade waits until
the channel is closed.

Release note: None

DisableAutomaticVersionUpgrade is an atomic integer which is rechecked
in a retry loop. This is not a very clean mechanism, and can lead to
issues where you're unknowingly dealing with a copy of the knobs and
setting the wrong atomic. The retry loop can also add unnecessary
delays in tests.

This commit changes DisableAutomaticVersionUpgrade from an atomic
integer to a channel. If the channel is set, auto-upgrade waits until
the channel is closed.

Release note: None
@RaduBerinde RaduBerinde requested review from ajwerner and a team February 15, 2022 17:53
@RaduBerinde RaduBerinde requested review from a team as code owners February 15, 2022 17:53
@RaduBerinde RaduBerinde requested review from a team and shermanCRL and removed request for a team February 15, 2022 17:53
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @shermanCRL)

@shermanCRL shermanCRL requested review from adityamaru and removed request for shermanCRL February 15, 2022 18:00
@RaduBerinde
Copy link
Member Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Feb 15, 2022
73586: rfc: optimize the draining process with connection_wait and relevant reporting systems r=ZhouXing19 a=ZhouXing19

This is a proposal for optimizing the draining process to be more legible for
customers, and introduce a new step, connection_wait, to the draining process,
which allows the server to early exit when all connections are closed.

Release note: None

76410: kv: disallow GC requests that bump GC threshold and GC expired versions r=nvanbenschoten a=nvanbenschoten

Related to #55293.

This commit adds a safeguard to GC requests that prevents them from
bumping the GC threshold at the same time that they GC individual MVCC
versions. This was found to be unsafe in #55293 because performing both
of these actions at the same time could lead to a race where a read
request is allowed to evaluate without error while also failing to see
MVCC versions that are concurrently GCed.

This race is possible because foreground traffic consults the in-memory
version of the GC threshold (`r.mu.state.GCThreshold`), which is updated
after (in `handleGCThresholdResult`), not atomically with, the application
of the GC request's WriteBatch to the LSM (in `ApplyToStateMachine`). This
allows a read request to see the effect of a GC on MVCC state without seeing
its effect on the in-memory GC threshold.

The latches acquired by GC quests look like it will help with this race,
but in practice they do not for two reasons:
1. the latches do not protect timestamps below the GC request's batch
   timestamp. This means that they only conflict with concurrent writes,
   but not all concurrent reads.
2. the read could be served off a follower, which could be applying the
   GC request's effect from the raft log. Latches held on the leaseholder
   would have no impact on a follower read.

Thankfully, the GC queue has split these two steps for the past few
releases, at least since 87e85eb, so we do not have a bug today.

The commit also adds a test that reliably exercises the bug with a few
well-placed calls to `time.Sleep`. The test contains a variant where the
read is performed on the leaseholder and a variant where it is performed
on a follower. Both fail by default. If we switch the GC request to
acquire non-MVCC latches then the leaseholder variant passes, but the
follower read variant still fails.

76417: ccl/sqlproxyccl: add connector component and support for session revival token r=JeffSwenson a=jaylim-crl

Informs #76000.

Previously, all the connection establishment logic is coupled with the handler
function within proxy_handler.go. This makes connecting to a new SQL pod during
connection migration difficult. This commit refactors all of those connection
logic out of the proxy handler into a connector component, as described in the
connection migration RFC. At the same time, we also add support for the session
revival token within this connector component.

Note that the overall behavior of the SQL proxy should be unchanged with this
commit.

Release note: None

76545: cmd/reduce: add -tlp option r=yuzefovich a=yuzefovich

**cmd/reduce: remove stdin option and require -file argument**

We tend to not use the option of passing input SQL via stdin, so this
commit removes it. An additional argument in favor of doing that is that
the follow-up commit will introduce another mode of behavior that
requires `-file` argument to be specified, so it's just cleaner to
always require it now.

Release note: None

**cmd/reduce: add -tlp option**

This commit adds `-tlp` boolean flag that changes the behavior of
`reduce`. It is required that `-file` is specified whenever the `-tlp`
flag is used. The behavior is such that the last two queries (delimited
by empty lines) in the file contain unpartitioned and partitioned queries
that return different results although they are equivalent.

If TLP check is requested, then we remove the last two queries from the
input which we use then to construct a special TLP check query that
results in an error if two removed queries return different results.

We do not just include the TLP check query into the input string because
the reducer would then reduce the check query itself, making the
reduction meaningless.

Release note: None

76598: server: use channel for DisableAutomaticVersionUpgrade r=RaduBerinde a=RaduBerinde

DisableAutomaticVersionUpgrade is an atomic integer which is rechecked
in a retry loop. This is not a very clean mechanism, and can lead to
issues where you're unknowingly dealing with a copy of the knobs and
setting the wrong atomic. The retry loop can also add unnecessary
delays in tests.

This commit changes DisableAutomaticVersionUpgrade from an atomic
integer to a channel. If the channel is set, auto-upgrade waits until
the channel is closed.

Release note: None

Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Jay <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
@craig
Copy link
Contributor

craig bot commented Feb 15, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 15, 2022

Build succeeded:

@craig craig bot merged commit 1452123 into cockroachdb:master Feb 15, 2022
@RaduBerinde RaduBerinde deleted the disable-upgrade-ch branch February 16, 2022 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants