-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: honor priority on send and receive snapshots #86701
Conversation
9fedf4c
to
1d259f7
Compare
636782a
to
e0e3799
Compare
e0e3799
to
e586410
Compare
This can be rebased onto master now, removing the first two commits (or we can switch to a new branch - either way is fine with me!) |
e586410
to
9738028
Compare
Rebased onto master. |
2646918
to
1ec7bbe
Compare
1ec7bbe
to
2db3573
Compare
From a testing perspective, the decommissioning tests were all run. With a combination of this and Lidor's changes, the time to decommission has dropped from ~55 minutes to ~24 minutes. This will be even faster if drain is called before decommission. |
2db3573
to
666ca01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were asking about adding a cluster version flag in https://cockroachlabs.slack.com/archives/C02H8JCC7DM/p1661545679399539. That seems apt to me, as there is a possibility that without such a flag, mixed version clusters will prioritize snapshots from v22.2 nodes. You could avoid that behavior by adding a sender-side cluster version check that strips the source and priority fields before sending a SnapshotRequest
if the cluster is not yet all running v22.2 processes.
We also talked about adding a cluster setting to disable this. Did you still want to do that? If so, the logic could live in the same place as the cluster version check.
Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @AlexTalks)
pkg/kv/kvserver/replica_learner_test.go
line 246 at r1 (raw file):
// This test assumes that the snapshot send limit is set to 1 so the second // snapshot send will block waiting for the first to complete. cleanup := envutil.TestSetEnv(t, "COCKROACH_CONCURRENT_SNAPSHOT_SEND_LIMIT", "1")
The use of env vars to control a TestCluster feels like a bit of an anti-pattern. Can we copy concurrentSnapshotApplyLimit
and concurrentSnapshotSendLimit
to KVConfig
and TestServerArgs
and then configure it directly in this test? See ScanMinIdleTime
for an example of how that plumbing would work.
pkg/kv/kvserver/store.go
line 1161 at r1 (raw file):
if sc.concurrentSnapshotSendLimit == 0 { sc.concurrentSnapshotSendLimit = envutil.EnvOrDefaultInt("COCKROACH_CONCURRENT_SNAPSHOT_SEND_LIMIT", 2)
Let's comment here (or server/config.go
if we do my suggestion from above) about why we want an apply-size limit of 1 but a send-side limit of 2.
pkg/kv/kvserver/store_snapshot.go
line 691 at r1 (raw file):
fn() } requestSource := int32(req.SenderQueueName)
nit: there seems to be a good amount of repetition between this function and the previous one. That's not something to solve here, but then let's make the exact same changes to each so it doesn't look like there's a difference. For instance, define these variables in both or in neither.
pkg/kv/kvserver/store_snapshot.go
line 715 at r1 (raw file):
waitingSnapshotMetric, inProgressSnapshotMetric, totalInProgressSnapshotMetric *metric.Gauge, ) (cleanup func(), _err error) { // TODO: Is this necessary
Yes, it is necessary if you want the caller to be able to call the function on error. However, this contract that the cleanup function must be called even on error is not standard. Usually, functions will handle this themselves on error paths and not hand the responsibility to their caller. I think here that would just mean calling semaphore.Cancel(task)
directly in the case <-queueCtx.Done():
case. Consider switching to that approach.
pkg/kv/kvserver/store_test.go
line 2929 at r1 (raw file):
} func TestSendSnapshotConcurrency(t *testing.T) {
nit: Could you give this test a comment explaining what it is testing.
666ca01
to
3b68364
Compare
1367ae5
to
81a270e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added enable/disable flag.
Reviewed 1 of 10 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andrewbaptist, and @nvanbenschoten)
pkg/kv/kvserver/store_snapshot.go
line 693 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Don't we want this in
Replica.sendSnapshot
, so that the name and priority won't make it to the receiver (who might be a v22.2 node)?
Done, moved
pkg/kv/kvserver/store_snapshot.go
line 709 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think this comment is now stale, as are the comments on the preceding two methods.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r5, 2 of 2 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/replica_command.go
line 2671 at r7 (raw file):
// queue name or priority if the receiver may not understand them or the // setting is disabled. if !r.store.ClusterSettings().Version.IsActive(ctx, clusterversion.PrioritizeSnapshots) ||
nit: Might be worth clarifying that the only thing we for sure want to remove in 23.1 is the cluster version.
pkg/kv/kvserver/store.go
line 259 at r7 (raw file):
ProtectedTimestampReader: spanconfig.EmptyProtectedTSReader(clock), SnapshotSendLimit: 2, SnapshotApplyLimit: 1,
I think (as we discussed) this should not use hard-coded values. Apologies for making you keep moving these around, but maybe it would make sense to move the defaults into kvserver
and that way you can reference them both here and in the server
pkg. You'll just have to make the defaults public however.
Code quote:
SnapshotSendLimit: 2,
SnapshotApplyLimit: 1,
pkg/kv/kvserver/store_snapshot.go
line 67 at r7 (raw file):
var snapshotPrioritizationEnabled = settings.RegisterBoolSetting( settings.SystemOnly, "kv.snapshot.prioritization_enabled",
nit: kv.snapshot_prioritization.enabled
pkg/kv/kvserver/store_snapshot.go
line 69 at r7 (raw file):
"kv.snapshot.prioritization_enabled", "if true, then prioritize enqueued snapshots on both the send or receive sides", true)
nit: does this pass lint? I'd assume we'd need:
true,
)
pkg/kv/kvserver/store_snapshot.go
line 676 at r7 (raw file):
return s.throttleSnapshot(ctx, s.snapshotApplySem, int(header.SenderQueueName), header.SenderQueuePriority,
nit: I still don't love the casting to int
s, it feels like we should at least have had a multiqueue.QueueID
type or something. This isn't something I'd worry about changing now, but just wanted to note it.
pkg/kv/kvserver/store_snapshot.go
line 723 at r7 (raw file):
// getting stuck behind large snapshots managed by the replicate queue. if rangeSize != 0 || s.cfg.TestingKnobs.ThrottleEmptySnapshots { task := semaphore.Add(requestSource, requestPriority)
should we still call this semaphore
? It seems inaccurate - can we rename to something like snapshotQueue
?
pkg/kv/kvserver/store_test.go
line 2933 at r7 (raw file):
// prioritization of the snapshots as that is covered by the multi-queue // testing. func TestSendSnapshotConcurrency(t *testing.T) {
This test is certainly useful, though I think if at all possible it would be useful for us to have a test where we validate the entire snapshot flow (on sender/receiver) running concurrently w/ multiple senders.
pkg/kv/kvserver/kvserverpb/raft.proto
line 207 at r7 (raw file):
// The sending queue's name, to be utilized to ensure fairness across // different snapshot sending sources. OTHER is a reserved name(0) and all
The default queue name, OTHER, is reserved for any uncategorized
and unprioritized snapshots, and requests with sender queue name OTHER
may not specify a non-zero sender_queue_priority. To prioritize snapshots
categorized as OTHER, first...
pkg/server/config.go
line 66 at r7 (raw file):
defaultScanMaxIdleTime = 1 * time.Second defaultSnapshotSendLimit = 2 defaultSnapshotApplyLimit = 1
These are the ones I was thinking would be good to move. I think you may have added a comment elsewhere, it might make sense to add a comment here that says
// ... See server.KVConfig for more info.
(or wherever else the comment lives).
pkg/server/config.go
line 791 at r7 (raw file):
cfg.ScanMaxIdleTime = envutil.EnvOrDefaultDuration("COCKROACH_SCAN_MAX_IDLE_TIME", cfg.ScanMaxIdleTime) cfg.SnapshotSendLimit = envutil.EnvOrDefaultInt64("COCKROACH_SNAPSHOT_SEND_LIMIT", cfg.SnapshotSendLimit) cfg.SnapshotApplyLimit = envutil.EnvOrDefaultInt64("COCKROACH_SNAPSHOT_APPLY_LIMIT", cfg.SnapshotApplyLimit)
Let's make sure these environment variables match the existing ones we had (I doubt anything relies on this, but I don't see much benefit in changing it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the testing discussion, in replica_learner_test.go something like the operations within TestRaftSnapshotQueueSeesLearner
concurrently with some of the operations of TestAddReplicaWithReceiverThrottling
or even TestLearnerAdminChangeReplicasRace
could be useful - particularly if we can inject testing knobs for some snapshots to fail while others succeed. testRaftSnapshotsToNonVoters
seems particularly useful.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
e086547
to
9a70dff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working, looking at options for how to do this best.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @AlexTalks, and @nvanbenschoten)
pkg/kv/kvserver/replica_command.go
line 2671 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: Might be worth clarifying that the only thing we for sure want to remove in 23.1 is the cluster version.
Done
pkg/kv/kvserver/store.go
line 259 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
I think (as we discussed) this should not use hard-coded values. Apologies for making you keep moving these around, but maybe it would make sense to move the defaults into
kvserver
and that way you can reference them both here and in theserver
pkg. You'll just have to make the defaults public however.
Done
pkg/kv/kvserver/store_snapshot.go
line 67 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit:
kv.snapshot_prioritization.enabled
Done
pkg/kv/kvserver/store_snapshot.go
line 69 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: does this pass lint? I'd assume we'd need:
true, )
Done (it did pass lint, but probably shouldn't have)
pkg/kv/kvserver/store_snapshot.go
line 676 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: I still don't love the casting to
int
s, it feels like we should at least have had amultiqueue.QueueID
type or something. This isn't something I'd worry about changing now, but just wanted to note it.
I don't love this either, but it does help in the case of version compatibility (it should definitely be strings over the wire, so it needs to cast to "something" - ints are just as good as strings from that perspective?
pkg/kv/kvserver/store_snapshot.go
line 723 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
should we still call this
semaphore
? It seems inaccurate - can we rename to something likesnapshotQueue
?
Done, Renamed everywhere
pkg/kv/kvserver/kvserverpb/raft.proto
line 207 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
The default queue name, OTHER, is reserved for any uncategorized and unprioritized snapshots, and requests with sender queue name OTHER may not specify a non-zero sender_queue_priority. To prioritize snapshots categorized as OTHER, first...
Done
pkg/server/config.go
line 66 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
These are the ones I was thinking would be good to move. I think you may have added a comment elsewhere, it might make sense to add a comment here that says
// ... See server.KVConfig for more info.
(or wherever else the comment lives).
Done
pkg/server/config.go
line 791 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Let's make sure these environment variables match the existing ones we had (I doubt anything relies on this, but I don't see much benefit in changing it).
Done, I hadn't thought about the compatibility aspect here.
9a70dff
to
3f5a0ea
Compare
3f5a0ea
to
d848c78
Compare
Previously on both the send and receive snapshot side, the various places that sent snapshots were uncoordinated, so the choice of which snapshot was sent was somewhat arbitrary. This PR uses the new multiqueue to prioritize them correctly. Release note (performance improvement): Snapshots use a fair round-robin approach for choosing which one to send next. This allows decommissioning to complete much faster. Release justification: A number of customer support cases are caused by incorrect snapshot prioritization.
d848c78
to
2dc2da8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @herkolategan, and @nvanbenschoten)
pkg/kv/kvserver/store_test.go
line 2933 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
This test is certainly useful, though I think if at all possible it would be useful for us to have a test where we validate the entire snapshot flow (on sender/receiver) running concurrently w/ multiple senders.
Part of #87319
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @herkolategan, and @nvanbenschoten)
bors r+ |
👎 Rejected by code reviews |
bors r+ |
Build succeeded: |
Previously on both the send and receive snapshot side, the
various places that sent snapshots were uncoordinated, so
the choice of which snapshot was sent was somewhat arbitrary.
This PR uses the new multiqueue to prioritize them correctly.
Release note (performance improvement): Snapshots use a fair
round-robin approach for choosing which one to send next. This
allows decommissioning to complete much faster.
Release Justification: A number of customer support cases
are caused by incorrect snapshot prioritization.