Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: honor priority on send and receive snapshots #86701

Merged
merged 1 commit into from
Sep 2, 2022

Conversation

AlexTalks
Copy link
Contributor

@AlexTalks AlexTalks commented Aug 23, 2022

Previously on both the send and receive snapshot side, the
various places that sent snapshots were uncoordinated, so
the choice of which snapshot was sent was somewhat arbitrary.
This PR uses the new multiqueue to prioritize them correctly.

Release note (performance improvement): Snapshots use a fair
round-robin approach for choosing which one to send next. This
allows decommissioning to complete much faster.

Release Justification: A number of customer support cases
are caused by incorrect snapshot prioritization.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andrewbaptist andrewbaptist force-pushed the mq_snapshot_prio branch 4 times, most recently from 9fedf4c to 1d259f7 Compare August 25, 2022 01:14
@andrewbaptist andrewbaptist force-pushed the mq_snapshot_prio branch 4 times, most recently from 636782a to e0e3799 Compare August 25, 2022 21:36
@andrewbaptist
Copy link
Collaborator

Screenshot of a run of decommissionBench/nodes=8/cpu=16/warehouses=3000/drain-first/while-upreplicating after this change:

screencapture-104-196-61-150-3000-d-J-yAVzkVk2-decommissions-full-2022-08-25-17_38_57

@AlexTalks AlexTalks changed the title kvserver: prioritize snapshot sending and application kv: Honor priority on send and receive snapshots Aug 26, 2022
@AlexTalks AlexTalks changed the title kv: Honor priority on send and receive snapshots kvserver: honor priority on send and receive snapshots Aug 26, 2022
@AlexTalks
Copy link
Contributor Author

This can be rebased onto master now, removing the first two commits (or we can switch to a new branch - either way is fine with me!)

@andrewbaptist
Copy link
Collaborator

Rebased onto master.

@andrewbaptist andrewbaptist force-pushed the mq_snapshot_prio branch 2 times, most recently from 2646918 to 1ec7bbe Compare August 26, 2022 20:42
@andrewbaptist andrewbaptist marked this pull request as ready for review August 26, 2022 20:46
@andrewbaptist andrewbaptist requested a review from a team as a code owner August 26, 2022 20:46
@andrewbaptist
Copy link
Collaborator

From a testing perspective, the decommissioning tests were all run. With a combination of this and Lidor's changes, the time to decommission has dropped from ~55 minutes to ~24 minutes. This will be even faster if drain is called before decommission.
We also tested version compatibility (v22.1 and this branch) and the results were positive (no errors and similar behavior in the mixed cluster to v22.1).
Finally the moving hotkey (DD) test https://docs.google.com/document/d/1AMvxf7uXve8rV21DXEeKYx--bswDy2OSSw_aKenG9c8/edit#heading=h.8ssuvtfzpdk7 was run and worked "well" even during a decommission of a node. Ranges continued to move as expected.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were asking about adding a cluster version flag in https://cockroachlabs.slack.com/archives/C02H8JCC7DM/p1661545679399539. That seems apt to me, as there is a possibility that without such a flag, mixed version clusters will prioritize snapshots from v22.2 nodes. You could avoid that behavior by adding a sender-side cluster version check that strips the source and priority fields before sending a SnapshotRequest if the cluster is not yet all running v22.2 processes.

We also talked about adding a cluster setting to disable this. Did you still want to do that? If so, the logic could live in the same place as the cluster version check.

Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @AlexTalks)


pkg/kv/kvserver/replica_learner_test.go line 246 at r1 (raw file):

	// This test assumes that the snapshot send limit is set to 1 so the second
	// snapshot send will block waiting for the first to complete.
	cleanup := envutil.TestSetEnv(t, "COCKROACH_CONCURRENT_SNAPSHOT_SEND_LIMIT", "1")

The use of env vars to control a TestCluster feels like a bit of an anti-pattern. Can we copy concurrentSnapshotApplyLimit and concurrentSnapshotSendLimit to KVConfig and TestServerArgs and then configure it directly in this test? See ScanMinIdleTime for an example of how that plumbing would work.


pkg/kv/kvserver/store.go line 1161 at r1 (raw file):

	if sc.concurrentSnapshotSendLimit == 0 {
		sc.concurrentSnapshotSendLimit =
			envutil.EnvOrDefaultInt("COCKROACH_CONCURRENT_SNAPSHOT_SEND_LIMIT", 2)

Let's comment here (or server/config.go if we do my suggestion from above) about why we want an apply-size limit of 1 but a send-side limit of 2.


pkg/kv/kvserver/store_snapshot.go line 691 at r1 (raw file):

		fn()
	}
	requestSource := int32(req.SenderQueueName)

nit: there seems to be a good amount of repetition between this function and the previous one. That's not something to solve here, but then let's make the exact same changes to each so it doesn't look like there's a difference. For instance, define these variables in both or in neither.


pkg/kv/kvserver/store_snapshot.go line 715 at r1 (raw file):

	waitingSnapshotMetric, inProgressSnapshotMetric, totalInProgressSnapshotMetric *metric.Gauge,
) (cleanup func(), _err error) {
	// TODO: Is this necessary

Yes, it is necessary if you want the caller to be able to call the function on error. However, this contract that the cleanup function must be called even on error is not standard. Usually, functions will handle this themselves on error paths and not hand the responsibility to their caller. I think here that would just mean calling semaphore.Cancel(task) directly in the case <-queueCtx.Done(): case. Consider switching to that approach.


pkg/kv/kvserver/store_test.go line 2929 at r1 (raw file):

}

func TestSendSnapshotConcurrency(t *testing.T) {

nit: Could you give this test a comment explaining what it is testing.

@andrewbaptist andrewbaptist requested review from a team as code owners August 29, 2022 22:09
Copy link
Collaborator

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added enable/disable flag.

Reviewed 1 of 10 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andrewbaptist, and @nvanbenschoten)


pkg/kv/kvserver/store_snapshot.go line 693 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Don't we want this in Replica.sendSnapshot, so that the name and priority won't make it to the receiver (who might be a v22.2 node)?

Done, moved


pkg/kv/kvserver/store_snapshot.go line 709 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think this comment is now stale, as are the comments on the preceding two methods.

Done

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r5, 2 of 2 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/replica_command.go line 2671 at r7 (raw file):

	// queue name or priority if the receiver may not understand them or the
	// setting is disabled.
	if !r.store.ClusterSettings().Version.IsActive(ctx, clusterversion.PrioritizeSnapshots) ||

nit: Might be worth clarifying that the only thing we for sure want to remove in 23.1 is the cluster version.


pkg/kv/kvserver/store.go line 259 at r7 (raw file):

		ProtectedTimestampReader:    spanconfig.EmptyProtectedTSReader(clock),
		SnapshotSendLimit:           2,
		SnapshotApplyLimit:          1,

I think (as we discussed) this should not use hard-coded values. Apologies for making you keep moving these around, but maybe it would make sense to move the defaults into kvserver and that way you can reference them both here and in the server pkg. You'll just have to make the defaults public however.

Code quote:

		SnapshotSendLimit:           2,
		SnapshotApplyLimit:          1,

pkg/kv/kvserver/store_snapshot.go line 67 at r7 (raw file):

var snapshotPrioritizationEnabled = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kv.snapshot.prioritization_enabled",

nit: kv.snapshot_prioritization.enabled


pkg/kv/kvserver/store_snapshot.go line 69 at r7 (raw file):

	"kv.snapshot.prioritization_enabled",
	"if true, then prioritize enqueued snapshots on both the send or receive sides",
	true)

nit: does this pass lint? I'd assume we'd need:

  true,
)

pkg/kv/kvserver/store_snapshot.go line 676 at r7 (raw file):

	return s.throttleSnapshot(ctx, s.snapshotApplySem,
		int(header.SenderQueueName), header.SenderQueuePriority,

nit: I still don't love the casting to ints, it feels like we should at least have had a multiqueue.QueueID type or something. This isn't something I'd worry about changing now, but just wanted to note it.


pkg/kv/kvserver/store_snapshot.go line 723 at r7 (raw file):

	// getting stuck behind large snapshots managed by the replicate queue.
	if rangeSize != 0 || s.cfg.TestingKnobs.ThrottleEmptySnapshots {
		task := semaphore.Add(requestSource, requestPriority)

should we still call this semaphore? It seems inaccurate - can we rename to something like snapshotQueue?


pkg/kv/kvserver/store_test.go line 2933 at r7 (raw file):

// prioritization of the snapshots as that is covered by the multi-queue
// testing.
func TestSendSnapshotConcurrency(t *testing.T) {

This test is certainly useful, though I think if at all possible it would be useful for us to have a test where we validate the entire snapshot flow (on sender/receiver) running concurrently w/ multiple senders.


pkg/kv/kvserver/kvserverpb/raft.proto line 207 at r7 (raw file):

    // The sending queue's name, to be utilized to ensure fairness across
    // different snapshot sending sources. OTHER is a reserved name(0) and all
The default queue name, OTHER, is reserved for any uncategorized
and unprioritized snapshots, and requests with sender queue name OTHER
may not specify a non-zero sender_queue_priority.  To prioritize snapshots
categorized as OTHER, first...

pkg/server/config.go line 66 at r7 (raw file):

	defaultScanMaxIdleTime    = 1 * time.Second
	defaultSnapshotSendLimit  = 2
	defaultSnapshotApplyLimit = 1

These are the ones I was thinking would be good to move. I think you may have added a comment elsewhere, it might make sense to add a comment here that says

// ... See server.KVConfig for more info.

(or wherever else the comment lives).


pkg/server/config.go line 791 at r7 (raw file):

	cfg.ScanMaxIdleTime = envutil.EnvOrDefaultDuration("COCKROACH_SCAN_MAX_IDLE_TIME", cfg.ScanMaxIdleTime)
	cfg.SnapshotSendLimit = envutil.EnvOrDefaultInt64("COCKROACH_SNAPSHOT_SEND_LIMIT", cfg.SnapshotSendLimit)
	cfg.SnapshotApplyLimit = envutil.EnvOrDefaultInt64("COCKROACH_SNAPSHOT_APPLY_LIMIT", cfg.SnapshotApplyLimit)

Let's make sure these environment variables match the existing ones we had (I doubt anything relies on this, but I don't see much benefit in changing it).

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the testing discussion, in replica_learner_test.go something like the operations within TestRaftSnapshotQueueSeesLearner concurrently with some of the operations of TestAddReplicaWithReceiverThrottling or even TestLearnerAdminChangeReplicasRace could be useful - particularly if we can inject testing knobs for some snapshots to fail while others succeed. testRaftSnapshotsToNonVoters seems particularly useful.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

@andrewbaptist andrewbaptist force-pushed the mq_snapshot_prio branch 4 times, most recently from e086547 to 9a70dff Compare August 31, 2022 13:59
Copy link
Collaborator

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working, looking at options for how to do this best.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @AlexTalks, and @nvanbenschoten)


pkg/kv/kvserver/replica_command.go line 2671 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: Might be worth clarifying that the only thing we for sure want to remove in 23.1 is the cluster version.

Done


pkg/kv/kvserver/store.go line 259 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

I think (as we discussed) this should not use hard-coded values. Apologies for making you keep moving these around, but maybe it would make sense to move the defaults into kvserver and that way you can reference them both here and in the server pkg. You'll just have to make the defaults public however.

Done


pkg/kv/kvserver/store_snapshot.go line 67 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: kv.snapshot_prioritization.enabled

Done


pkg/kv/kvserver/store_snapshot.go line 69 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: does this pass lint? I'd assume we'd need:

  true,
)

Done (it did pass lint, but probably shouldn't have)


pkg/kv/kvserver/store_snapshot.go line 676 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: I still don't love the casting to ints, it feels like we should at least have had a multiqueue.QueueID type or something. This isn't something I'd worry about changing now, but just wanted to note it.

I don't love this either, but it does help in the case of version compatibility (it should definitely be strings over the wire, so it needs to cast to "something" - ints are just as good as strings from that perspective?


pkg/kv/kvserver/store_snapshot.go line 723 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

should we still call this semaphore? It seems inaccurate - can we rename to something like snapshotQueue?

Done, Renamed everywhere


pkg/kv/kvserver/kvserverpb/raft.proto line 207 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…
The default queue name, OTHER, is reserved for any uncategorized
and unprioritized snapshots, and requests with sender queue name OTHER
may not specify a non-zero sender_queue_priority.  To prioritize snapshots
categorized as OTHER, first...

Done


pkg/server/config.go line 66 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

These are the ones I was thinking would be good to move. I think you may have added a comment elsewhere, it might make sense to add a comment here that says

// ... See server.KVConfig for more info.

(or wherever else the comment lives).

Done


pkg/server/config.go line 791 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Let's make sure these environment variables match the existing ones we had (I doubt anything relies on this, but I don't see much benefit in changing it).

Done, I hadn't thought about the compatibility aspect here.

@andrewbaptist andrewbaptist requested a review from a team as a code owner August 31, 2022 20:48
@andrewbaptist andrewbaptist requested review from herkolategan and removed request for a team August 31, 2022 20:48
@andrewbaptist andrewbaptist linked an issue Sep 1, 2022 that may be closed by this pull request
Previously on both the send and receive snapshot side, the
various places that sent snapshots were uncoordinated, so
the choice of which snapshot was sent was somewhat arbitrary.
This PR uses the new multiqueue to prioritize them correctly.

Release note (performance improvement): Snapshots use a fair
round-robin approach for choosing which one to send next. This
allows decommissioning to complete much faster.

Release justification: A number of customer support cases
are caused by incorrect snapshot prioritization.
Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @herkolategan, and @nvanbenschoten)


pkg/kv/kvserver/store_test.go line 2933 at r7 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

This test is certainly useful, though I think if at all possible it would be useful for us to have a test where we validate the entire snapshot flow (on sender/receiver) running concurrently w/ multiple senders.

Part of #87319

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @herkolategan, and @nvanbenschoten)

@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 2, 2022

👎 Rejected by code reviews

@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 2, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver: fairly prioritize snapshot application
4 participants