-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: re-enqueue Raft groups on paginated application #31568
storage: re-enqueue Raft groups on paginated application #31568
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you stressed this new test? Mucking with proposal application like that seems fragile. I don't have a better suggestion, though.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/replica_test.go, line 9622 at r1 (raw file):
repl := tc.repl // Propose a command to Raft and block its application.
Technically, you block application and then propose the command.
88f4361
to
6666008
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR.
Have you stressed this new test?
I stressed it for 10 minutes without issue.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/replica_test.go, line 9622 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Technically, you block application and then propose the command.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
pkg/storage/replica_test.go, line 9586 at r2 (raw file):
// TestApplyPaginatedCommittedEntries tests that a Raft group's committed // entries are quickly applied, even if their application is paginated due to // the RaftMaxSizePerMsg configuration. This is a regession test for #31330.
nit: regession
pkg/storage/replica_test.go, line 9659 at r2 (raw file):
// small RaftMaxSizePerMsg. close(blockRaftApplication) const maxWait = 5 * time.Second
5s
might be too aggressive to apply 50 commands when stressed sufficiently. Perhaps bump this a bit. Or we'll just leave it and find out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with putting this in 2.1RC2.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale)
Fixes cockroachdb#31330. This change re-enqueues Raft groups for processing immediately if they still have more to do after a Raft ready iteration. This comes up in practice when a Range has sufficient load to force Raft application pagination. See cockroachdb#31330 for a discussion on the symptoms this can cause. Release note (bug fix): Fix bug where Raft followers could fall behind leaders will entry application, causing stalls during splits.
6666008
to
38b4827
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs.
@bdarnell This is currently based on #31408, so the backport won't be completely clean because of the test. I'm assuming you're fine with that and still want to keep #31408 for 2.1.1.
Reviewable status: complete! 2 of 0 LGTMs obtained (and 1 stale)
pkg/storage/replica_test.go, line 9586 at r2 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
nit: regession
Done.
pkg/storage/replica_test.go, line 9659 at r2 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
5s
might be too aggressive to apply 50 commands when stressed sufficiently. Perhaps bump this a bit. Or we'll just leave it and find out.
Bumped to 10s.
bors r+ |
31568: storage: re-enqueue Raft groups on paginated application r=nvanbenschoten a=nvanbenschoten Fixes #31330. This change re-enqueues Raft groups for processing immediately if they still have more to do after a Raft ready iteration. This comes up in practice when a Range has sufficient load to force Raft application pagination. See #31330 for a discussion on the symptoms this can cause. Release note (bug fix): Fix bug where Raft followers could fall behind leaders will entry application, causing stalls during splits. Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded |
@bdarnell how do you feel about this being backported to 2.0? |
I think it's a good idea. |
Fixes #31330.
This change re-enqueues Raft groups for processing immediately if they
still have more to do after a Raft ready iteration. This comes up in
practice when a Range has sufficient load to force Raft application
pagination. See #31330 for a discussion on the symptoms this can
cause.
Release note (bug fix): Fix bug where Raft followers could fall behind
leaders will entry application, causing stalls during splits.