storage: add to perf comment in handleRaftReady #38380
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/storage/replica_raft.go, line 653 at r1 (raw file):
// etcd/raft does not support commit quorums that do not include the leader,
// even though the Raft thesis states that this would technically be safe:
// > The leader may even commit an entry before it has been written to its
If we're going to change the indentation for quotes, do it here too.
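The quoted comment contrasts two commit rules: the thesis allows any majority to commit an entry, while etcd/raft requires the majority to include the leader. The following is a minimal sketch of that difference, not etcd/raft code. It assumes each voter's value is the highest index it has durably written, and it ignores terms (real Raft only commits by counting replicas for entries from the leader's current term). `quorumIndex` and `leaderInclusiveIndex` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// quorumIndex returns the highest index durably written by a majority of
// voters ("any majority", as the thesis permits). Terms are ignored.
func quorumIndex(match []uint64) uint64 {
	if len(match) == 0 {
		return 0
	}
	s := append([]uint64(nil), match...)
	// Descending order: s[q-1] is held by at least q voters.
	sort.Slice(s, func(i, j int) bool { return s[i] > s[j] })
	q := len(s)/2 + 1
	return s[q-1]
}

// leaderInclusiveIndex returns the highest index held by a majority that
// includes the leader, i.e. the quorum index capped at the leader's own.
func leaderInclusiveIndex(leader uint64, followers []uint64) uint64 {
	idx := quorumIndex(append([]uint64{leader}, followers...))
	if leader < idx {
		return leader
	}
	return idx
}

func main() {
	// The leader has written up to 5; both followers have written up to 7.
	fmt.Println(quorumIndex([]uint64{5, 7, 7}))          // 7
	fmt.Println(leaderInclusiveIndex(5, []uint64{7, 7})) // 5
}
```

Capping at the leader's index is equivalent to "a majority including the leader": for any index at or below the leader's, the leader is already one of the voters counted.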
pkg/storage/replica_raft.go, line 663 at r1 (raw file):
or for which a corresponding Entry is not previously persisted
I'm not sure I understand this case. When would a leader be sending out a commit index above the index of entries which it has not previously persisted? The paragraph before this one presents an argument for why this would never happen. Am I missing something? Perhaps around leadership changes?
Or are you talking more abstractly here, and then saying that this isn't actually a problem given the implementation? If so, I'd soften the language around "the committed index may require persisting Entries from the current Ready". That's only true in the single-node Raft group, right? In which case, there are no messages.
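The reviewer's point can be modeled in a few lines. This is a hedged sketch: `readyModel` is a hypothetical, heavily simplified stand-in for a Raft Ready (not etcd/raft's `raft.Ready`), and it assumes, as a modeling simplification, that the sole voter counts its own entries toward the quorum as soon as they are appended in memory. Under that assumption, a single-voter group can carry a commit index that points into the Ready's own entries, and since there are no peers, that Ready carries no messages.

```go
package main

import "fmt"

// readyModel is a hypothetical, simplified stand-in for a Raft Ready.
type readyModel struct {
	LastNew   uint64 // last index among this Ready's not-yet-durable entries (0 if none)
	Committed uint64 // commit index carried by this Ready
	Msgs      int    // number of outgoing messages
}

// commitNeedsThisReady reports whether Committed refers to entries that only
// become durable once this Ready's own entries are persisted.
func commitNeedsThisReady(durableLast uint64, rd readyModel) bool {
	return rd.Committed > durableLast
}

// singleVoterReady models proposing n entries in a one-voter group, assuming
// the sole voter counts its own appended (not yet durable) entries toward the
// quorum. There are no peers, so there are no messages to send.
func singleVoterReady(durableLast, n uint64) readyModel {
	last := durableLast + n
	return readyModel{LastNew: last, Committed: last, Msgs: 0}
}

func main() {
	rd := singleVoterReady(10, 3)
	fmt.Println(commitNeedsThisReady(10, rd), rd.Msgs) // true 0
}
```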
pkg/storage/replica_raft.go, line 716 at r1 (raw file):
// committed index larger than the last index, which will set off
// alarms. Right now everything is in the same batch and so that problem
// does not exist, but should that change we can lower Committed before
s/can lower/can manually lower/
to make it clear that this is an action we'd have to take in this code path. Also maybe talk about why it's ok to do so.
Also, wouldn't appending the entries before writing the HardState be another solution to this if we no longer had atomic batch semantics for the writes? Or is that not true given some of the changes you're making at the moment?
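The two options raised here (append the entries before writing the HardState, or manually lower Committed) can be compared with a small crash model. This is a minimal sketch under stated assumptions: `store` is a hypothetical durable store where each write is durable on its own but two consecutive writes are not atomic, log truncation is ignored, and the only property checked is `Commit <= lastIndex`. It assumes a lowered Commit is merely stale (the replica can learn the higher value again from the leader) and does not model applied state.

```go
package main

import "fmt"

// hardState mirrors the Term/Vote/Commit fields of a Raft HardState.
type hardState struct{ Term, Vote, Commit uint64 }

// store is a hypothetical durable store: each write is durable on its own,
// but two consecutive writes are not atomic with respect to a crash.
type store struct {
	lastIndex uint64
	hs        hardState
}

// ok reports whether HardState.Commit refers only to entries that exist.
func (s store) ok() bool { return s.hs.Commit <= s.lastIndex }

// entriesThenHardState persists the entries (up to newLast) first, then hs.
// If crashAfterFirst is true, the second write is lost.
func entriesThenHardState(s store, newLast uint64, hs hardState, crashAfterFirst bool) store {
	s.lastIndex = newLast
	if crashAfterFirst {
		return s
	}
	s.hs = hs
	return s
}

// hardStateThenEntries persists hs first, then the entries. If clamp is true,
// Commit is manually lowered to the already-durable last index beforehand.
func hardStateThenEntries(s store, newLast uint64, hs hardState, clamp, crashAfterFirst bool) store {
	if clamp && hs.Commit > s.lastIndex {
		hs.Commit = s.lastIndex
	}
	s.hs = hs
	if crashAfterFirst {
		return s
	}
	s.lastIndex = newLast
	return s
}

func main() {
	start := store{lastIndex: 10, hs: hardState{Term: 2, Commit: 10}}
	hs := hardState{Term: 2, Commit: 13}
	fmt.Println(entriesThenHardState(start, 13, hs, true).ok())        // true
	fmt.Println(hardStateThenEntries(start, 13, hs, false, true).ok()) // false: Commit 13 > lastIndex 10
	fmt.Println(hardStateThenEntries(start, 13, hs, true, true).ok())  // true
}
```

Both safe variants keep the invariant after a torn write; the clamped one pays for it with a commit index that lags until a later HardState write.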
I pulled on a thread in etcd/raft today where I was worried that something was off with our perf improvements around sending MsgApp before persisting HardState and Entries. Everything ended up being OK, but I updated a comment.
Release note: None
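The ordering described here (sending MsgApp before persisting HardState and Entries) lets followers write to disk in parallel with the leader. The following is a hedged sketch of that ordering only: `handleReady`, `splitMessages`, and the message types are hypothetical stand-ins (not CockroachDB's `handleRaftReady` or etcd/raft's `raftpb` types), and persistence and transport are injected as callbacks. As a conservative assumption, the sketch holds back every other message until after the local write, since responses such as append acknowledgements or vote grants promise durability.

```go
package main

import "fmt"

type msgType int

const (
	msgApp msgType = iota
	msgAppResp
	msgVoteResp
)

// message is a hypothetical, simplified outgoing Raft message.
type message struct {
	To   uint64
	Type msgType
}

// splitMessages separates outgoing MsgApps, which this sketch sends before
// local persistence, from everything else, which waits for it.
func splitMessages(msgs []message) (apps, others []message) {
	for _, m := range msgs {
		if m.Type == msgApp {
			apps = append(apps, m)
		} else {
			others = append(others, m)
		}
	}
	return apps, others
}

// handleReady sends MsgApps, persists entries and HardState, and only then
// sends the remaining messages.
func handleReady(msgs []message, persist func(), send func(message)) {
	apps, others := splitMessages(msgs)
	for _, m := range apps {
		send(m) // followers can start writing in parallel with the leader
	}
	persist() // local Entries + HardState
	for _, m := range others {
		send(m) // acknowledgements and votes must not precede durability
	}
}

func main() {
	msgs := []message{{To: 2, Type: msgApp}, {To: 3, Type: msgAppResp}, {To: 3, Type: msgApp}}
	handleReady(msgs,
		func() { fmt.Println("persist entries and HardState") },
		func(m message) { fmt.Printf("send type=%d to=%d\n", m.Type, m.To) })
}
```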
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_raft.go, line 663 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
or for which a corresponding Entry is not previously persisted
I'm not sure I understand this case. When would a leader be sending out a commit index above the index of entries which it has not previously persisted? The paragraph before this one presents an argument for why this would never happen. Am I missing something? Perhaps around leadership changes?
Or are you talking more abstractly here and then saying that this isn't actually a problem given the implementation. If so, I'd soften the language around "the committed index may require persisting Entries from the current Ready". That's only true in the single-node Raft group, right? In which case, there are no messages.
I think the answers to your questions were all in the comment, but I agree that it was hard to parse. I took a stab at reworking this whole block more holistically; PTAL.
pkg/storage/replica_raft.go, line 716 at r1 (raw file):
Also, wouldn't appending the entries before writing the HardState be another solution to this if we no longer had atomic batch semantics for the writes? Or is that not true given some of the changes you're making at the moment?
Yes, see later in the same sentence:
(or take care to persist the HardState after appending)
However, when we make the changes for etcd-io/etcd#7625 (comment) we want the opposite: we require that certain appends are not persisted without first updating HardState.Commit. Even worse, when you're a follower catching up on a whole chunk of historical log (which may contain config changes), you'll get commit indexes pointing at the end of each chunk of entries, and you must persist each commit index together with its chunk: not before, because the commit index would then refer to unknown entries, and not after, because that would violate the joint quorum invariant. Hmm, we might have to just lower the committed index to lastIndex if necessary when starting up a group. Anyway, this is certainly something to keep looking at as perf work happens on the Raft ready loop.
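The catch-up scenario and the startup fallback mentioned here can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: `durable`, `chunk`, `applyChunkAtomically`, and `clampOnStartup` are hypothetical names; each chunk and its accompanying commit index are written in one atomic step so neither becomes durable without the other; and the joint quorum invariant itself is not modeled. `clampOnStartup` shows the "lower the committed index to lastIndex when starting up a group" idea for a store left in a torn state.

```go
package main

import "fmt"

// hardState mirrors the Term/Vote/Commit fields of a Raft HardState.
type hardState struct{ Term, Vote, Commit uint64 }

// durable is a hypothetical snapshot of what is on disk for one replica.
type durable struct {
	lastIndex uint64
	hs        hardState
}

// chunk is a hypothetical batch of historical entries received from the
// leader, together with the commit index sent along with it.
type chunk struct {
	lastIndex, commit uint64
}

// applyChunkAtomically persists a chunk and its commit index in a single
// step, so neither becomes durable without the other.
func applyChunkAtomically(d durable, c chunk) durable {
	d.lastIndex = c.lastIndex
	if c.commit > d.hs.Commit {
		d.hs.Commit = c.commit // the commit index never moves backwards here
	}
	return d
}

// clampOnStartup lowers Commit to lastIndex if a non-atomic write left it
// pointing past the end of the durable log.
func clampOnStartup(d durable) durable {
	if d.hs.Commit > d.lastIndex {
		d.hs.Commit = d.lastIndex
	}
	return d
}

func main() {
	d := durable{lastIndex: 100, hs: hardState{Commit: 100}}
	for _, c := range []chunk{{lastIndex: 150, commit: 150}, {lastIndex: 200, commit: 200}} {
		d = applyChunkAtomically(d, c)
	}
	fmt.Println(d.lastIndex, d.hs.Commit) // 200 200
	torn := durable{lastIndex: 150, hs: hardState{Commit: 200}}
	fmt.Println(clampOnStartup(torn).hs.Commit) // 150
}
```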
I'm also planning to copy the commentary in this file upstream so that, by the time all of these questions become real, there's documentation on the allowed concurrency patterns in etcd/raft itself.
Reviewable status: complete! 1 of 0 LGTMs obtained
bors r=nvanbenschoten
38380: storage: add to perf comment in handleRaftReady r=nvanbenschoten a=tbg I pulled on a thread in etcd/raft today where I was worried that something was off with our perf improvements around sending MsgApp before persisting HardState and Entries. Everything ended up being OK but I updated a comment. Release note: None Co-authored-by: Tobias Schottdorf <[email protected]>
Build failed
In the above flake (unrelated to this PR), acceptance/version-upgrade failed at cockroach/pkg/cmd/roachtest/upgrade.go line 431 (in 7e2ceae), and the harness never quite manages to shut down, eventually hitting the 10-minute timeout and then getting stuck until TeamCity comes in for a mercy killing (artifacts lost). @andreimatei, are you touching this at all in #30977? This seems to be a bug in roachtest (on top of a bug in the test/crdb).
Filed #38428.
bors r=nvanbenschoten
Build succeeded
Hopefully... But I don't entirely follow. Do you understand what's going on?
No, let's see if it happens again.