
kv: Don't automatically rerun replicate_queue #85219

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

andrewbaptist
Collaborator

@andrewbaptist andrewbaptist commented Jul 28, 2022

Previously, after the replicate queue completed processing a
change, it would immediately check whether there were other
changes that could be made to the range. This bypassed the normal
priority order and meant that a replica might process a low
priority, expensive job before a high priority one. The intended
behavior is to check whether there is additional work for the
replica and requeue it at the correct priority.

Release note: None
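
The intended pattern, illustrated as a hedged, toy Go sketch (the queue type, item names, and priority values below are invented for illustration and are not the actual replicate queue code): after a successful change, the replica is re-added to the priority queue at a freshly computed priority, so higher-priority repairs on other ranges run before its lower-priority follow-up work.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is a queued replica with a processing priority.
type item struct {
	name     string
	priority float64
}

// prioQueue is a max-heap of items ordered by priority.
type prioQueue []item

func (q prioQueue) Len() int            { return len(q) }
func (q prioQueue) Less(i, j int) bool  { return q[i].priority > q[j].priority }
func (q prioQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *prioQueue) Push(x interface{}) { *q = append(*q, x.(item)) }
func (q *prioQueue) Pop() interface{} {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// processOne pops the highest-priority item, performs one change, and, if
// more work remains for that replica, requeues it at a freshly computed
// priority instead of immediately running the next action inline.
func processOne(q *prioQueue, moreWork func(name string) (bool, float64)) {
	it := heap.Pop(q).(item)
	fmt.Printf("processing %s at priority %.1f\n", it.name, it.priority)
	if again, prio := moreWork(it.name); again {
		heap.Push(q, item{name: it.name, priority: prio}) // requeue, don't rerun
	}
}

func main() {
	q := &prioQueue{
		{name: "r1 (replace dead voter)", priority: 10},
		{name: "r2 (replace dead voter)", priority: 10},
	}
	heap.Init(q)
	// r1's follow-up rebalance is requeued at low priority, so r2's pending
	// high-priority repair runs before it rather than after r1's rebalance.
	processOne(q, func(string) (bool, float64) { return true, 1 })
	processOne(q, func(string) (bool, float64) { return false, 0 })
	processOne(q, func(string) (bool, float64) { return false, 0 })
}
```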

@cockroach-teamcity
Member

This change is Reviewable

Collaborator Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replicate_queue.go line 550 at r1 (raw file):

Previously, kvoli (Austen) wrote…

I think this should go within the if statement below.

Done

@andrewbaptist andrewbaptist force-pushed the requeue_allocator branch 2 times, most recently from 0215bd5 to 6d38e46 Compare July 28, 2022 19:09
@andrewbaptist
Collaborator Author

This changes the replicate_queue so that it does not automatically run the next action, but instead queues it up to run at the appropriate priority.

Doing this created two unit test failures, which I dealt with in different ways, but I wanted to see if anyone has concerns about this:

TestLeasePreferencesDuringOutage - In this case there was an expectation that after a voter was added we would check lease preferences immediately and move the lease to the correct node if possible. This seems like a reasonable expectation, but the way the test got there was a little convoluted. It expected an AllocatorReplaceDeadVoter call followed immediately by an AllocatorConsiderRebalance call to move the lease. With the new structure, the second call is low priority, and in a real system all the dead voters would be replaced before any of the rebalances happen. Adding a lease preference check after the add-voter call fixed this.

TestReplicateQueueSeesLearnerOrJointConfig - In this case, the test creates a situation where voter additions are partially complete, and it expects the replicate queue to “fix everything”. I don’t understand all the transitions in this test, especially the second one where the replicate queue has to call AllocatorFinalizeAtomicReplicationChange twice. I didn’t change the behavior of this test other than requiring enqueue to be called multiple times; it might be that the withStopAfterLearnerAtomic parameter is the reason the AddVoter is not successful the first time.

Collaborator

@kvoli kvoli left a comment


Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)


-- commits line 6 at r2:
nit: done to a range, rather than replica.


-- commits line 7 at r2:
nit: replic -> replica.


pkg/kv/kvserver/replicate_queue.go line 925 at r2 (raw file):

		// We require the lease in order to process replicas, so
		// repl.store.StoreID() corresponds to the lease-holder's store ID.
		_, err := rq.shedLease(

If we did shed the lease here, then there shouldn't be a requeue for this replica because it won't have a lease. Can we set !lhBeingRemoved=true if rq.ShedLease returns true.

The replica in the queue will try to acquire the lease and (likely) fail the raft leader check when it's processed again; by not requeuing here we avoid that added processing.
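
A minimal sketch of that suggestion, with hypothetical names (afterProcess, leaseShed) standing in for the real replicate_queue.go logic: when the lease is shed during processing, suppress the requeue, since the replica no longer holds the lease.

```go
package main

import "fmt"

// afterProcess is a hypothetical stand-in for the post-processing decision:
// it returns whether the replica should be requeued. If the lease was shed,
// requeueing is suppressed, because without the lease the replica would only
// fail the leaseholder/raft-leader check when processed again.
func afterProcess(leaseShed, moreWork bool) (requeue bool) {
	if leaseShed {
		return false // the new leaseholder's store will pick up remaining work
	}
	return moreWork
}

func main() {
	fmt.Println(afterProcess(true, true))  // false: lease moved away, skip requeue
	fmt.Println(afterProcess(false, true)) // true: requeue at computed priority
}
```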


pkg/kv/kvserver/replica_learner_test.go line 751 at r2 (raw file):

	tc := testcluster.StartTestCluster(t, 3, base.TestClusterArgs{
		ServerArgs:      base.TestServerArgs{Knobs: knobs},
		ReplicationMode: base.ReplicationManual,

since it’s in replication manual mode, the requeues in the process loop are being ignored. So it won't be testing your patch completely here.

bq.mu.Lock()
stopped := bq.mu.stopped || bq.mu.disabled
bq.mu.Unlock()


pkg/kv/kvserver/replica_learner_test.go line 773 at r2 (raw file):

		}
		for _, message := range expectedMessages {
			trace, processErr, err := store.Enqueue(

See below on manually requeuing.


pkg/kv/kvserver/replica_learner_test.go line 797 at r2 (raw file):

			allocatorimpl.AllocatorFinalizeAtomicReplicationChange,
			allocatorimpl.AllocatorAddVoter,
			allocatorimpl.AllocatorFinalizeAtomicReplicationChange,

This is in scope for the second stop, ltk.withStopAfterJointConfig; it will never finalize the atomic change itself until another replicate queue run does it. So there are two AllocatorFinalizeAtomicReplicationChange calls here.


pkg/kv/kvserver/replica_learner_test.go line 801 at r2 (raw file):

		}
		for _, message := range expectedMessages {
			trace, processErr, err := store.Enqueue(

It would be good to test that the requeuing functionality does what is expected for this test, rather than manually requeuing.

You could try disabling the replica scanner:

Then, after ltk.withStopAfterLearnerAtomic, enable the replicate queue.

Then assert on the action in processOneChange:

log.VEventf(ctx, 1, "next replica action: %s", action)

Likewise for the other stop.
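
A rough sketch of the kind of assertion being suggested here, under the simplifying assumption that the trace recording is flattened to a string (the real test would inspect the recording returned by store.Enqueue; the helper name is invented):

```go
package main

import (
	"fmt"
	"strings"
)

// traceHasAction reports whether a trace recording, flattened to a string
// here for simplicity, contains the "next replica action" event for the
// expected allocator action logged by processOneChange.
func traceHasAction(trace, action string) bool {
	return strings.Contains(trace, fmt.Sprintf("next replica action: %s", action))
}

func main() {
	trace := "... next replica action: add voter ..."
	fmt.Println(traceHasAction(trace, "add voter"))          // true
	fmt.Println(traceHasAction(trace, "consider rebalance")) // false
}
```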

@andrewbaptist andrewbaptist force-pushed the requeue_allocator branch 2 times, most recently from c7c6bc5 to 8713e58 Compare July 29, 2022 16:08
Collaborator Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replicate_queue.go line 925 at r2 (raw file):

Previously, kvoli (Austen) wrote…

If we did shed the lease here, then there shouldn't be a requeue for this replica because it won't have a lease. Can we set !lhBeingRemoved=true if rq.ShedLease returns true.

The replica in the queue will try to acquire the lease and (likely) fail the raft leader check when it's processed again; by not requeuing here we avoid that added processing.

OK, good call on this.


pkg/kv/kvserver/replica_learner_test.go line 751 at r2 (raw file):

Previously, kvoli (Austen) wrote…

since it’s in replication manual mode, the requeues in the process loop are being ignored. So it won't be testing your patch completely here.

bq.mu.Lock()
stopped := bq.mu.stopped || bq.mu.disabled
bq.mu.Unlock()

OK, changed the test


pkg/kv/kvserver/replica_learner_test.go line 773 at r2 (raw file):

Previously, kvoli (Austen) wrote…

See below on manually requeuing.

OK, done


pkg/kv/kvserver/replica_learner_test.go line 797 at r2 (raw file):

Previously, kvoli (Austen) wrote…

This is in scope for the second stop, ltk.withStopAfterJointConfig; it will never finalize the atomic change itself until another replicate queue run does it. So there are two AllocatorFinalizeAtomicReplicationChange calls here.

OK, makes sense


pkg/kv/kvserver/replica_learner_test.go line 801 at r2 (raw file):

Previously, kvoli (Austen) wrote…

It would be good to test that the requeuing functionality does what is expected for this test, rather than manually requeuing.

You could try disabling the replica scanner:

Then, after ltk.withStopAfterLearnerAtomic, enable the replicate queue.

Then assert on the action in processOneChange:

log.VEventf(ctx, 1, "next replica action: %s", action)

Likewise for the other stop.

Done, this idea worked well

@andrewbaptist andrewbaptist force-pushed the requeue_allocator branch 2 times, most recently from 13d1e6c to 6fe2a3c Compare August 1, 2022 15:22
@kvoli kvoli requested a review from AlexTalks August 1, 2022 19:05
Collaborator

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 628 at r3 (raw file):

	}

	return a.computeAction(ctx, conf, desc.Replicas().VoterDescriptors(),

Why did you drop the internal method computeAction here?

Collaborator Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 628 at r3 (raw file):

Previously, kvoli (Austen) wrote…

Why did you drop the internal method computeAction here?

The method was only called in one place, by the method right above it, and the split between what was checked in the first vs. the second was artificial. I probably should not have included this change in this PR (or should have made it a separate commit), as it is unrelated to the other changes. It threw me off at first when I was looking at ComputeAction vs computeAction and trying to understand why we needed some checks in the first method and some in the second. When I realized there was no technical reason for the split, merging them seemed cleaner.

Collaborator

@kvoli kvoli left a comment


This change seems significant enough to warrant some small roachprod testing to see the effects "overall" and make sure everything works as expected.

You mentioned @AlexTalks has a good test. You could run the largest existing roachtest (compare to a control).

./build/builder.sh mkrelease
dev build roachtest
roachtest run decommissionBench/nodes=8/cpu=16/warehouses=3000
roachprod list
# find your cluster to get the $cluster_name - it will be your username-xxxx
roachprod adminui $cluster_name
# monitor the test.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 628 at r3 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

The method was only called in one place, by the method right above it, and the split between what was checked in the first vs. the second was artificial. I probably should not have included this change in this PR (or should have made it a separate commit), as it is unrelated to the other changes. It threw me off at first when I was looking at ComputeAction vs computeAction and trying to understand why we needed some checks in the first method and some in the second. When I realized there was no technical reason for the split, merging them seemed cleaner.

Given this pattern is used for most of the allocator methods, where a mostly empty exported method calls an unexported method, it would be better to leave it out of this patch and then reconsider all of them at once rather than ad hoc.

Contributor

@AlexTalks AlexTalks left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, and @kvoli)


pkg/kv/kvserver/replicate_queue.go line 925 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

OK, good call on this.

Discussed in person - want to take a look to see if this is the correct place to do this (or what else needs to/should change).

Contributor

@AlexTalks AlexTalks left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, and @kvoli)


pkg/kv/kvserver/replicate_queue.go line 532 at r4 (raw file):

			// few seconds. By retrying immediately we will choose the "second best"
			// node to send to.
			// TODO: This is probably suboptimal behavior. In the case where there is

nit: TODO(andrewbaptist) (or, for that matter, put TODO(sarkesian) if needed).

I think the linter would catch this in any case.

Collaborator Author

@andrewbaptist andrewbaptist left a comment


Tested with Alex. Still not "ideal" behavior because stores finish at different times, but better. Alex is analyzing the results.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @kvoli)


pkg/kv/kvserver/replicate_queue.go line 925 at r2 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Discussed in person - want to take a look to see if this is the correct place to do this (or what else needs to/should change).

OK, the test TestLeasePreferencesDuringOutage fails without this change. Specifically, it expects a lease change to happen after addOrReplaceVoters is called. This test passed before because, after we re-ran the replicate queue, we would end up calling considerRebalance, which would then rebalance and, as a fortunate side effect, also move the lease. The test was a little broken in doing what it was trying to do, since if there wasn't a rebalance this wouldn't have worked. Now the test is a little more accurate. It might make sense long term to always call maybeTransferLeaseAway after any change is made, since normally it won't do anything if the lease preference is already correct, but that could be overkill.


pkg/kv/kvserver/replicate_queue.go line 532 at r4 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: TODO(andrewbaptist) (or, for that matter, put TODO(sarkesian) if needed).

I think the linter would catch this in any case.

Done


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 628 at r3 (raw file):

Previously, kvoli (Austen) wrote…

Given this pattern is used for most of the allocator methods, where a mostly empty exported method calls an unexported method, it would be better to leave it out of this patch and then reconsider all of them at once rather than ad hoc.

Done, reverted in this PR

@andrewbaptist andrewbaptist force-pushed the requeue_allocator branch 2 times, most recently from 9c249ec to b1e2680 Compare August 3, 2022 22:35
kvoli added a commit to kvoli/cockroach that referenced this pull request Jul 21, 2023
This naming was incorrect after cockroachdb#85219. The replicate queue will
re-queue, rather than immediately re-process, a replica after a
successful process.

Epic: none

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Jul 25, 2023
This naming was incorrect after cockroachdb#85219. The replicate queue will
re-queue, rather than immediately re-process, a replica after a
successful process.

Epic: none

Release note: None
blathers-crl bot pushed a commit that referenced this pull request Jul 26, 2023
This naming was incorrect after #85219. The replicate queue will
re-queue, rather than immediately re-process, a replica after a
successful process.

Epic: none

Release note: None
blathers-crl bot pushed a commit that referenced this pull request Jul 26, 2023
This naming was incorrect after #85219. The replicate queue will
re-queue, rather than immediately re-process, a replica after a
successful process.

Epic: none

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Aug 11, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Aug 29, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Sep 7, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Sep 11, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: cockroachdb#88769
Release note: None
craig bot pushed a commit that referenced this pull request Sep 11, 2023
108175: kvserver: unskip lease preferences during outage  r=andrewbaptist a=kvoli

Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After #85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After #94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with #85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: #88769
Release note: None

109432: cluster-ui: handle partial response errors on the database details page r=THardy98 a=THardy98

Part of: #102386

**Demos** (Note: these demos show this same logic applied to both the databases and database table pages as well):
DB-Console
- https://www.loom.com/share/5108dd655ad342f28323e72eaf68219c
- https://www.loom.com/share/1973383dacd7494a84e10bf39e5b85a3

This change applies the same error handling ideas from #109245 to the
database details page, enabling non-admin users to use the database
details page and providing better transparency to data fetching issues.

Errors encountered while fetching table details can be viewed via the
tooltip provided by the `Caution` icon at the table's name.
`unavailable` cells also provide a tooltip that displays the error
impacting that exact cell.

Release note (ui change): Non-admin users are able to use the database
details page.

110292: c2c: use seperate spanConfigEventStreamSpec in the span config event stream r=stevendanna a=msbutler

Previously, the spanConfigEventStream used a streamPartitionSpec, which contained a bunch of fields unnecessary for span config streaming. This patch creates a new spanConfigEventStreamSpec which contains only the fields necessary for span config event streaming.

Informs #109059

Release note: None

110309: teamcity-trigger: ensure that `race` tag is only passed once r=healthy-pod a=healthy-pod

By running under `-race`, the go command defines the `race` build tag for us [1].

Previously, we defined it under `TAGS` to let the issue poster know that this is a failure under `race` and indicate that in the issue.

At the time, defining the tag twice didn't cause issues but after #109773, it led to build failures [2].

To reproduce locally:
```
bazel test -s --config=race pkg/util/ctxgroup:all --test_env=GOTRACEBACK=all --define gotags=bazel,gss,race
```

As a follow-up, we should find another way to let the issue poster know that a failure was running under `race`.

[1] https://go.dev/doc/articles/race_detector#Excluding_Tests
[2] #109994 (comment)

Epic: none
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: healthy-pod <[email protected]>
@andrewbaptist andrewbaptist deleted the requeue_allocator branch December 15, 2023 21:33