-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replica_rac2: clean up admitted vector sending #130461
Conversation
25cac91
to
fa1256b
Compare
@sumeerbhola This is a follow-up to the discussion in https://reviewable.io/reviews/cockroachdb/cockroach/130171#-O62rc_u938rycg-FTGw. |
Sending to the stale leader sounds right. It will return tokens that can
help other ranges.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)
-- commits
line 17 at r3:
btw, I looked at the last 2 commits as a unit since they seem to touch the same code.
pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go
line 620 at r1 (raw file):
log.Fatalf(ctx, "term regressed from %d to %d", p.mu.term, term) } termChanged := term > p.mu.term
Can Raft transition from the leader being known to being unknown and regress the term to 0?
pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go
line 789 at r3 (raw file):
// skip if the term does not match - the message is a no-op. Note that av.Term // can only lag behind p.mu.term here, and never leads it. If we are the // leader, it always matches.
How about asserting for "If we are the leader, it always matches."
pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go
line 790 at r3 (raw file):
we could choose to send this vector to it - the stale leader will admit everything as a result, because there is a new leader somewhere.
+1 to doing this
The receiver could also choose to step down, though it might be safer not to mess with raft state for now - the leader will learn about the new term via standard raft messages anyway.
+1 to letting Raft handle this.
Epic: none Release note: none
fa1256b
to
1d00c8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
Previously, sumeerbhola wrote…
btw, I looked at the last 2 commits as a unit since they seem to touch the same code.
Np, good that the tool allows that. For clarity, I split it in two because the first commit changes behaviour / test outputs, so I tried to keep it minimal. And the other commit then refactors the code without changing behaviour.
pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go
line 620 at r1 (raw file):
Previously, sumeerbhola wrote…
Can Raft transition from the leader being known to being unknown and regress the term to 0?
The term never regresses, regardless of whether we're leader/follower, or the leader is known/unknown. It's crucial for raft correctness (the term is a promise to not accept lower-term log appends), so we can rely on it.
Raft can transition to unknown leader, see ForgetLeader
stack in RawNode
:
Lines 1895 to 1909 in eb30404
case pb.MsgForgetLeader: | |
// TODO(nvanbenschoten): remove MsgForgetLeader once the sole caller in | |
// Replica.maybeForgetLeaderOnVoteRequestLocked is removed. This will | |
// accompany the removal of epoch-based leases. | |
if r.lead == None { | |
r.logger.Infof("%x no leader at term %d; dropping forget leader msg", r.id, r.Term) | |
return nil | |
} | |
if r.supportingFortifiedLeader() && r.lead != m.From { | |
r.logger.Infof("%x [term %d] ignored MsgForgetLeader from %x due to leader fortification", r.id, r.Term, m.From) | |
return nil | |
} | |
r.logger.Infof("%x forgetting leader %x at term %d", r.id, r.lead, r.Term) | |
r.lead = None | |
r.leadEpoch = 0 |
lead
field is reset to 0, so our leaderID
will too. There is a TODO to remove this ForgetLeader
behaviour but in the meantime it exists.
Epic: none Release note: none
1d00c8f
to
fb69429
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go
line 789 at r3 (raw file):
Previously, sumeerbhola wrote…
How about asserting for "If we are the leader, it always matches."
I figured this is not actually true when we are in a Ready
cycle that changes raft term. Since maybeSendAdmittedRaftMuLocked
is called from HandleRaftReadyRaftMuLocked
which sits at the top of Ready handler, the logTracker
hasn't learned about the term bump (and potential log truncation that comes with it) yet. So we'll have av.Term < p.term
here, because p.term
is up-to-date with raft thanks to makeStateConsistentRaftMuLocked
.
I reworded the comments to reflect what's true. I think we need to improve on this staleness issue. Generally, in this function we are one Ready
behind. We can improve this by e.g. putting another maybeSendAdmittedRaftMuLocked
in AdmitRaftEntriesRaftMuLocked
, so that it can take advantage of knowing the latest term, and potentially knowing a bunch of entries that were just admitted. I'll send a PR for this, and let's discuss there.
pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go
line 790 at r3 (raw file):
Previously, sumeerbhola wrote…
we could choose to send this vector to it - the stale leader will admit everything as a result, because there is a new leader somewhere.
+1 to doing this
The receiver could also choose to step down, though it might be safer not to mess with raft state for now - the leader will learn about the new term via standard raft messages anyway.
+1 to letting Raft handle this.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @pav-kv)
pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go
line 789 at r3 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
I figured this is not actually true when we are in a
Ready
cycle that changes raft term. SincemaybeSendAdmittedRaftMuLocked
is called fromHandleRaftReadyRaftMuLocked
which sits at the top of Ready handler, thelogTracker
hasn't learned about the term bump (and potential log truncation that comes with it) yet. So we'll haveav.Term < p.term
here, becausep.term
is up-to-date with raft thanks tomakeStateConsistentRaftMuLocked
.I reworded the comments to reflect what's true. I think we need to improve on this staleness issue. Generally, in this function we are one
Ready
behind. We can improve this by e.g. putting anothermaybeSendAdmittedRaftMuLocked
inAdmitRaftEntriesRaftMuLocked
, so that it can take advantage of knowing the latest term, and potentially knowing a bunch of entries that were just admitted. I'll send a PR for this, and let's discuss there.
Do you want to make the change we discussed on slack (call registerLogAppend
in HandleRaftReadyRaftMuLocked
just before we return early due to if !p.isLeaderUsingV2ProcLocked()
) in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go
line 789 at r3 (raw file):
Previously, sumeerbhola wrote…
Do you want to make the change we discussed on slack (call
registerLogAppend
inHandleRaftReadyRaftMuLocked
just before we return early due toif !p.isLeaderUsingV2ProcLocked()
) in this PR?
No, it can be done separately: #130688.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r4, 1 of 1 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)
bors r=sumeerbhola |
This PR makes
processorImpl
track the raft nodeterm
unconditionally. It then factors out the admitted vector sending into a well-documented function, and uses the newterm
field to filter out no-op admitted vectors that don't match the leader's term.Part of #129508