
kv: Don't automatically rerun replicate_queue #85219

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

andrewbaptist
Collaborator

@andrewbaptist andrewbaptist commented Jul 28, 2022

Previously, after the replicate queue completed processing a
change, it would immediately check whether there were other
changes that could be made to the range. This bypassed the normal
priority order and meant that a replica might process a low
priority, expensive job before a high priority one. The intended
behavior is to check whether there is additional work for the
replica and requeue it at the correct priority.

Release note: None
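
The intended pattern, illustrated as a hedged, toy Go sketch (the queue type, item names, and priority values below are invented for illustration and are not the actual replicate queue code): after a successful change, the replica is re-added to the priority queue at a freshly computed priority, so higher-priority repairs on other ranges run before its lower-priority follow-up work.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is a queued replica with a processing priority.
type item struct {
	name     string
	priority float64
}

// prioQueue is a max-heap of items ordered by priority.
type prioQueue []item

func (q prioQueue) Len() int            { return len(q) }
func (q prioQueue) Less(i, j int) bool  { return q[i].priority > q[j].priority }
func (q prioQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *prioQueue) Push(x interface{}) { *q = append(*q, x.(item)) }
func (q *prioQueue) Pop() interface{} {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// processOne pops the highest-priority item, performs one change, and, if
// more work remains for that replica, requeues it at a freshly computed
// priority instead of immediately running the next action inline.
func processOne(q *prioQueue, moreWork func(name string) (bool, float64)) {
	it := heap.Pop(q).(item)
	fmt.Printf("processing %s at priority %.1f\n", it.name, it.priority)
	if again, prio := moreWork(it.name); again {
		heap.Push(q, item{name: it.name, priority: prio}) // requeue, don't rerun
	}
}

func main() {
	q := &prioQueue{
		{name: "r1 (replace dead voter)", priority: 10},
		{name: "r2 (replace dead voter)", priority: 10},
	}
	heap.Init(q)
	// r1's follow-up rebalance is requeued at low priority, so r2's pending
	// high-priority repair runs before it rather than after r1's rebalance.
	processOne(q, func(string) (bool, float64) { return true, 1 })
	processOne(q, func(string) (bool, float64) { return false, 0 })
	processOne(q, func(string) (bool, float64) { return false, 0 })
}
```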

@cockroach-teamcity
Member

This change is Reviewable

Collaborator Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replicate_queue.go line 550 at r1 (raw file):

Previously, kvoli (Austen) wrote…

I think this should go within the if statement below.

Done

@andrewbaptist andrewbaptist force-pushed the requeue_allocator branch 2 times, most recently from 0215bd5 to 6d38e46 Compare July 28, 2022 19:09
@andrewbaptist
Collaborator Author

This changes the replicate_queue so that it does not automatically run the next action, but instead queues it up to run at the appropriate priority.

Doing this created two unit test failures, which I dealt with in different ways, but I wanted to see if anyone has concerns about this:

TestLeasePreferencesDuringOutage - In this case there was an expectation that after a voter was added we would check lease preferences immediately and move the lease to the correct node if possible. This seems like a reasonable expectation, but the way the test got there was a little convoluted. It expected an AllocatorReplaceDeadVoter call followed immediately by an AllocatorConsiderRebalance call to move the lease. With the new structure, the second call is low priority, and in a real system all the dead voters would be replaced before any of the rebalances happen. Adding a lease preference check after the add-voter call fixed this.

TestReplicateQueueSeesLearnerOrJointConfig - In this case, the test creates a situation where voter additions are partially complete, and it expects the replicate queue to “fix everything”. I don’t understand all the transitions in this test, especially the second one where the replicate queue has to call AllocatorFinalizeAtomicReplicationChange twice. I didn’t change the behavior of this test other than requiring enqueue to be called multiple times; it might be that the withStopAfterLearnerAtomic parameter is the reason the AddVoter is not successful the first time.

Collaborator

@kvoli kvoli left a comment


Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)


-- commits line 6 at r2:
nit: done to a range, rather than replica.


-- commits line 7 at r2:
nit: replic -> replica.


pkg/kv/kvserver/replicate_queue.go line 925 at r2 (raw file):

		// We require the lease in order to process replicas, so
		// repl.store.StoreID() corresponds to the lease-holder's store ID.
		_, err := rq.shedLease(

If we did shed the lease here, then there shouldn't be a requeue for this replica because it won't have a lease. Can we set !lhBeingRemoved=true if rq.ShedLease returns true.

The replica in the queue will try to acquire the lease and (likely) fail the raft leader check when it's processed again; by not requeuing here we avoid that added processing.
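
A minimal sketch of that suggestion, with hypothetical names (afterProcess, leaseShed) standing in for the real replicate_queue.go logic: when the lease is shed during processing, suppress the requeue, since the replica no longer holds the lease.

```go
package main

import "fmt"

// afterProcess is a hypothetical stand-in for the post-processing decision:
// it returns whether the replica should be requeued. If the lease was shed,
// requeueing is suppressed, because without the lease the replica would only
// fail the leaseholder/raft-leader check when processed again.
func afterProcess(leaseShed, moreWork bool) (requeue bool) {
	if leaseShed {
		return false // the new leaseholder's store will pick up remaining work
	}
	return moreWork
}

func main() {
	fmt.Println(afterProcess(true, true))  // false: lease moved away, skip requeue
	fmt.Println(afterProcess(false, true)) // true: requeue at computed priority
}
```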


pkg/kv/kvserver/replica_learner_test.go line 751 at r2 (raw file):

	tc := testcluster.StartTestCluster(t, 3, base.TestClusterArgs{
		ServerArgs:      base.TestServerArgs{Knobs: knobs},
		ReplicationMode: base.ReplicationManual,

since it’s in replication manual mode, the requeues in the process loop are being ignored. So it won't be testing your patch completely here.

bq.mu.Lock()
stopped := bq.mu.stopped || bq.mu.disabled
bq.mu.Unlock()


pkg/kv/kvserver/replica_learner_test.go line 773 at r2 (raw file):

		}
		for _, message := range expectedMessages {
			trace, processErr, err := store.Enqueue(

See below on manually requeuing.


pkg/kv/kvserver/replica_learner_test.go line 797 at r2 (raw file):

			allocatorimpl.AllocatorFinalizeAtomicReplicationChange,
			allocatorimpl.AllocatorAddVoter,
			allocatorimpl.AllocatorFinalizeAtomicReplicationChange,

This is in scope for the second stop, ltk.withStopAfterJointConfig; it will never finalize the atomic change itself until another replicate queue run does it. So there are two AllocatorFinalizeAtomicReplicationChange calls here.


pkg/kv/kvserver/replica_learner_test.go line 801 at r2 (raw file):

		}
		for _, message := range expectedMessages {
			trace, processErr, err := store.Enqueue(

It would be good to test that the requeuing functionality does what is expected for this test, rather than manually requeuing.

You could try disabling the replica scanner:

Then, after ltk.withStopAfterLearnerAtomic, enable the replicate queue.

Then assert on the action in processOneChange:

log.VEventf(ctx, 1, "next replica action: %s", action)

Likewise for the other stop.
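
A rough sketch of the kind of assertion being suggested here, under the simplifying assumption that the trace recording is flattened to a string (the real test would inspect the recording returned by store.Enqueue; the helper name is invented):

```go
package main

import (
	"fmt"
	"strings"
)

// traceHasAction reports whether a trace recording, flattened to a string
// here for simplicity, contains the "next replica action" event for the
// expected allocator action logged by processOneChange.
func traceHasAction(trace, action string) bool {
	return strings.Contains(trace, fmt.Sprintf("next replica action: %s", action))
}

func main() {
	trace := "... next replica action: add voter ..."
	fmt.Println(traceHasAction(trace, "add voter"))          // true
	fmt.Println(traceHasAction(trace, "consider rebalance")) // false
}
```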

@andrewbaptist andrewbaptist force-pushed the requeue_allocator branch 2 times, most recently from c7c6bc5 to 8713e58 Compare July 29, 2022 16:08
Collaborator Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replicate_queue.go line 925 at r2 (raw file):

Previously, kvoli (Austen) wrote…

If we did shed the lease here, then there shouldn't be a requeue for this replica because it won't have a lease. Can we set !lhBeingRemoved=true if rq.ShedLease returns true.

The replica in the queue will try to acquire the lease and (likely) fail the raft leader check when it's processed again; by not requeuing here we avoid that added processing.

OK, good call on this.


pkg/kv/kvserver/replica_learner_test.go line 751 at r2 (raw file):

Previously, kvoli (Austen) wrote…

since it’s in replication manual mode, the requeues in the process loop are being ignored. So it won't be testing your patch completely here.

bq.mu.Lock()
stopped := bq.mu.stopped || bq.mu.disabled
bq.mu.Unlock()

OK, changed the test


pkg/kv/kvserver/replica_learner_test.go line 773 at r2 (raw file):

Previously, kvoli (Austen) wrote…

See below on manually requeuing.

OK, done


pkg/kv/kvserver/replica_learner_test.go line 797 at r2 (raw file):

Previously, kvoli (Austen) wrote…

This is in scope for the second stop, ltk.withStopAfterJointConfig; it will never finalize the atomic change itself until another replicate queue run does it. So there are two AllocatorFinalizeAtomicReplicationChange calls here.

OK, makes sense


pkg/kv/kvserver/replica_learner_test.go line 801 at r2 (raw file):

Previously, kvoli (Austen) wrote…

It would be good to test that the requeuing functionality does what is expected for this test, rather than manually requeuing.

You could try disabling the replica scanner:

Then, after ltk.withStopAfterLearnerAtomic, enable the replicate queue.

Then assert on the action in processOneChange:

log.VEventf(ctx, 1, "next replica action: %s", action)

Likewise for the other stop.

Done, this idea worked well

@andrewbaptist andrewbaptist force-pushed the requeue_allocator branch 2 times, most recently from 13d1e6c to 6fe2a3c Compare August 1, 2022 15:22
@kvoli kvoli requested a review from AlexTalks August 1, 2022 19:05
Collaborator

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 628 at r3 (raw file):

	}

	return a.computeAction(ctx, conf, desc.Replicas().VoterDescriptors(),

Why did you drop the internal method computeAction here?

Collaborator Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 628 at r3 (raw file):

Previously, kvoli (Austen) wrote…

Why did you drop the internal method computeAction here?

The method was only called in one place, by the method right above it, and the split between what was checked in the first vs. the second was artificial. I probably should not have included this change in this PR (or should have made it a separate commit), as it is unrelated to the other changes. It threw me off at first when I was looking at ComputeAction vs computeAction and trying to understand why we needed some checks in the first method and some in the second. When I realized there was no technical reason for the split, merging them seemed cleaner.

Collaborator

@kvoli kvoli left a comment


This change seems significant enough to warrant some small roachprod testing to see the effects "overall" and make sure everything works as expected.

You mentioned @AlexTalks has a good test. You could run the largest existing roachtest (compare to a control).

./build/builder.sh mkrelease
dev build roachtest
roachtest run decommissionBench/nodes=8/cpu=16/warehouses=3000
roachprod list
# find your cluster to get the $cluster_name - it will be your username-xxxx
roachprod adminui $cluster_name
# monitor the test.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 628 at r3 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

The method was only called in one place, by the method right above it, and the split between what was checked in the first vs. the second was artificial. I probably should not have included this change in this PR (or should have made it a separate commit), as it is unrelated to the other changes. It threw me off at first when I was looking at ComputeAction vs computeAction and trying to understand why we needed some checks in the first method and some in the second. When I realized there was no technical reason for the split, merging them seemed cleaner.

Given this pattern is used for most of the allocator methods, where a mostly empty exported method calls an unexported method, it would be better to leave it out of this patch and then reconsider all of them at once rather than ad hoc.

Contributor

@AlexTalks AlexTalks left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, and @kvoli)


pkg/kv/kvserver/replicate_queue.go line 925 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

OK, good call on this.

Discussed in person - want to take a look to see if this is the correct place to do this (or what else needs to/should change).

Contributor

@AlexTalks AlexTalks left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, and @kvoli)


pkg/kv/kvserver/replicate_queue.go line 532 at r4 (raw file):

			// few seconds. By retrying immediately we will choose the "second best"
			// node to send to.
			// TODO: This is probably suboptimal behavior. In the case where there is

nit: TODO(andrewbaptist) (or, for that matter, put TODO(sarkesian) if needed).

I think the linter would catch this in any case.

Collaborator Author

@andrewbaptist andrewbaptist left a comment


Tested with Alex. Still not "ideal" behavior because stores finish at different times, but better. Alex is analyzing the results.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @kvoli)


pkg/kv/kvserver/replicate_queue.go line 925 at r2 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Discussed in person - want to take a look to see if this is the correct place to do this (or what else needs to/should change).

OK, the test TestLeasePreferencesDuringOutage fails without this change. Specifically, it expects a lease change to happen after addOrReplaceVoters is called. This test passed before because, after we re-ran the replicate queue, we would end up calling considerRebalance, which would then rebalance and, as a fortunate side effect, also move the lease. The test was a little broken in doing what it was trying to do, since if there wasn't a rebalance this wouldn't have worked. Now the test is a little more accurate. It might make sense long term to always call maybeTransferLeaseAway after any change is made, since normally it won't do anything if the lease preference is already correct, but that could be overkill.


pkg/kv/kvserver/replicate_queue.go line 532 at r4 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: TODO(andrewbaptist) (or, for that matter, put TODO(sarkesian) if needed).

I think the linter would catch this in any case.

Done


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 628 at r3 (raw file):

Previously, kvoli (Austen) wrote…

Given this pattern is used for most of the allocator methods, where a mostly empty exported method calls an unexported method, it would be better to leave it out of this patch and then reconsider all of them at once rather than ad hoc.

Done, reverted in this PR

@andrewbaptist andrewbaptist force-pushed the requeue_allocator branch 2 times, most recently from 9c249ec to b1e2680 Compare August 3, 2022 22:35
kvoli added a commit to kvoli/cockroach that referenced this pull request Jul 21, 2023
This naming was incorrect after cockroachdb#85219. The replicate queue will
re-queue, rather than immediately re-process, a replica after a
successful process.

Epic: none

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Jul 25, 2023
This naming was incorrect after cockroachdb#85219. The replicate queue will
re-queue, rather than immediately re-process, a replica after a
successful process.

Epic: none

Release note: None
blathers-crl bot pushed a commit that referenced this pull request Jul 26, 2023
This naming was incorrect after #85219. The replicate queue will
re-queue, rather than immediately re-process, a replica after a
successful process.

Epic: none

Release note: None
blathers-crl bot pushed a commit that referenced this pull request Jul 26, 2023
This naming was incorrect after #85219. The replicate queue will
re-queue, rather than immediately re-process, a replica after a
successful process.

Epic: none

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Aug 11, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Aug 29, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Sep 7, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: cockroachdb#88769
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this pull request Sep 11, 2023
Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After cockroachdb#85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After cockroachdb#94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with cockroachdb#85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: cockroachdb#88769
Release note: None
craig bot pushed a commit that referenced this pull request Sep 11, 2023
108175: kvserver: unskip lease preferences during outage  r=andrewbaptist a=kvoli

Previously, `TestLeasePreferenceDuringOutage` would force replication
queue processing of the test range, then assert that the range
up-replicated and lease transferred to a preferred locality.

This test was skipped, and two of the assumptions it relied on to pass
were no longer true.

After #85219, the replicate queue no longer
re-processes replicas. Instead, the queue requeues replicas after
processing, at the appropriate priority. This broke the test due to the
replicate queue being disabled, making the re-queue a no-op.

After #94023, the replicate queue no longer looked for lease transfers,
after processing a replication action. Combined with #85219, the queue
would now be guaranteed to not process both up-replication and lease
transfers from a single enqueue.

Update the test to not require a manual process, instead using a queue
range filter, which allows tests that disable automatic replication to
still process filtered ranges via the various replica queues. Also,
ensure that the non-stopped stores are considered live targets after
simulating an outage (bumping manual clocks, stopping servers), so that
the expected up-replication, then lease transfer, can proceed.

Fixes: #88769
Release note: None

109432: cluster-ui: handle partial response errors on the database details page r=THardy98 a=THardy98

Part of: #102386

**Demos** (Note: these demos show this same logic applied to both the databases and database table pages as well):
DB-Console
- https://www.loom.com/share/5108dd655ad342f28323e72eaf68219c
- https://www.loom.com/share/1973383dacd7494a84e10bf39e5b85a3

This change applies the same error handling ideas from #109245 to the
database details page, enabling non-admin users to use the database
details page and providing better transparency to data fetching issues.

Errors encountered while fetching table details can be viewed via the
tooltip provided by the `Caution` icon at the table's name.
`unavailable` cells also provide a tooltip that displays the error
impacting that exact cell.

Release note (ui change): Non-admin users are able to use the database
details page.

110292: c2c: use seperate spanConfigEventStreamSpec in the span config event stream r=stevendanna a=msbutler

Previously, the spanConfigEventStream used a streamPartitionSpec, which contained a bunch of fields unnecessary for span config streaming. This patch creates a new spanConfigEventStreamSpec which contains only the fields necessary for span config event streaming.

Informs #109059

Release note: None

110309: teamcity-trigger: ensure that `race` tag is only passed once r=healthy-pod a=healthy-pod

By running under `-race`, the go command defines the `race` build tag for us [1].

Previously, we defined it under `TAGS` to let the issue poster know that this is a failure under `race` and indicate that in the issue.

At the time, defining the tag twice didn't cause issues but after #109773, it led to build failures [2].

To reproduce locally:
```
bazel test -s --config=race pkg/util/ctxgroup:all --test_env=GOTRACEBACK=all --define gotags=bazel,gss,race
```

As a follow-up, we should find another way to let the issue poster know that a failure was running under `race`.

[1] https://go.dev/doc/articles/race_detector#Excluding_Tests
[2] #109994 (comment)

Epic: none
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: healthy-pod <[email protected]>
@andrewbaptist andrewbaptist deleted the requeue_allocator branch December 15, 2023 21:33