kvserver: synchronize admin commands on the leaseholder #41392

ajwerner · 2019-10-07T18:18:33Z

Is your feature request related to a problem? Please describe.

In 19.2 and beyond, with the addition of learners and atomic rebalancing, our admin commands generally now step through multiple range descriptor changes, all of which are backed by some form of CPut. When issued concurrently, these commands can step on each other, leading to the need for more retries. We recently (#41385) increased the maximum number of retries for splits from 10 [1] to infinity; previously a split would boil up to the client once those retries were exhausted.

[1] cockroach/pkg/storage/replica_command.go, line 519 in 1696e57:

    retryOpts.MaxRetries = 10

Not all admin commands can be retried, as many run with a client-provided range descriptor. When these commands interfere, they always boil back up to the caller, which is often a queue. These queues then act as another retry loop above the admin command. The simple retry loop worked when a command was just a single round (once you laid down an intent on the descriptor, you would be able to succeed), but now that commands take multiple rounds, there's no reason to allow admin commands on the same range (which are all issued by the same leaseholder) to race against each other.

Describe the solution you'd like

Admin commands always attempt to run on the leaseholder. We could eliminate the vast majority of races by using in-memory synchronization on the leaseholder. A simple solution would be to add an adminMu to the Replica, which is acquired when processing AdminSplit, AdminUnsplit, and AdminChangeReplicas commands. Note that we would not acquire this mutex in AdminRelocateRange or AdminMerge, as those commands issue AdminChangeReplicas underneath and acquiring it in both places would self-deadlock.

It will be important to unblock these waiters when the lease changes.

Relates to #41028.

Jira issue: CRDB-5453


github-actions · 2023-09-19T11:10:59Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

ajwerner added the A-kv-distribution Relating to rebalancing and leasing. label Oct 7, 2019

ajwerner mentioned this issue Oct 28, 2019

roachtest: tpccbench/nodes=9/cpu=4/multi-region failed #41876

Closed

nvanbenschoten added this to the Later milestone Mar 12, 2020

nvanbenschoten mentioned this issue Mar 26, 2020

kv: kvnemesis can thrash and livelock on multiple concurrent Range merges #46639

Closed

ajwerner mentioned this issue Dec 4, 2020

kvserver: properly sequence replication changes #57563

Closed

ajwerner mentioned this issue Apr 14, 2021

multiregionccl: deflake TestIndexCleanupAfterAlterFromRegionalByRow #63635

Merged

jlinder added the T-kv KV Team label Jun 16, 2021

ajwerner changed the title ~~storage: synchronize admin commands on the leaseholder~~ kvserver: synchronize admin commands on the leaseholder Nov 8, 2021

ajwerner mentioned this issue Nov 8, 2021

kvserver: make config change failures less scary #72546

Closed

github-actions bot added the no-issue-activity label Sep 19, 2023

github-actions bot added the X-stale label Oct 2, 2023

github-actions bot closed this as not planned Oct 2, 2023

exalate-issue-sync bot closed this as completed Oct 2, 2023

github-project-automation bot added this to KV Aug 28, 2024

github-project-automation bot moved this to Closed in KV Aug 28, 2024
