Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replica_rac2: clean up admitted vector sending #130461

Merged
merged 2 commits into from
Sep 13, 2024

Conversation

pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented Sep 11, 2024

This PR makes processorImpl track the raft node term unconditionally. It then factors out the admitted vector sending into a well-documented function, and uses the new term field to filter out no-op admitted vectors that don't match the leader's term.

Part of #129508

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pav-kv pav-kv marked this pull request as ready for review September 11, 2024 11:17
@pav-kv pav-kv requested a review from a team as a code owner September 11, 2024 11:17
@pav-kv pav-kv requested a review from sumeerbhola September 11, 2024 11:17
@pav-kv
Copy link
Collaborator Author

pav-kv commented Sep 11, 2024

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Sep 11, 2024 via email

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)


-- commits line 17 at r3:
btw, I looked at the last 2 commits as a unit since they seem to touch the same code.


pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go line 620 at r1 (raw file):

		log.Fatalf(ctx, "term regressed from %d to %d", p.mu.term, term)
	}
	termChanged := term > p.mu.term

Can Raft transition from the leader being known to being unknown and regress the term to 0?


pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go line 789 at r3 (raw file):

	// skip if the term does not match - the message is a no-op. Note that av.Term
	// can only lag behind p.mu.term here, and never leads it. If we are the
	// leader, it always matches.

How about asserting for "If we are the leader, it always matches."


pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go line 790 at r3 (raw file):

we could choose to send this vector to it - the stale leader will admit everything as a result, because there is a new leader somewhere.

+1 to doing this

The receiver could also choose to step down, though it might be safer not to mess with raft state for now - the leader will learn about the new term via standard raft messages anyway.

+1 to letting Raft handle this.

Epic: none
Release note: none
Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


-- commits line 17 at r3:

Previously, sumeerbhola wrote…

btw, I looked at the last 2 commits as a unit since they seem to touch the same code.

Np, good that the tool allows that. For clarity, I split it in two because the first commit changes behaviour / test outputs, so I tried to keep it minimal. And the other commit then refactors the code without changing behaviour.


pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go line 620 at r1 (raw file):

Previously, sumeerbhola wrote…

Can Raft transition from the leader being known to being unknown and regress the term to 0?

The term never regresses, regardless of whether we're leader/follower, or the leader is known/unknown. It's crucial for raft correctness (the term is a promise to not accept lower-term log appends), so we can rely on it.

Raft can transition to unknown leader, see ForgetLeader stack in RawNode:

cockroach/pkg/raft/raft.go

Lines 1895 to 1909 in eb30404

case pb.MsgForgetLeader:
// TODO(nvanbenschoten): remove MsgForgetLeader once the sole caller in
// Replica.maybeForgetLeaderOnVoteRequestLocked is removed. This will
// accompany the removal of epoch-based leases.
if r.lead == None {
r.logger.Infof("%x no leader at term %d; dropping forget leader msg", r.id, r.Term)
return nil
}
if r.supportingFortifiedLeader() && r.lead != m.From {
r.logger.Infof("%x [term %d] ignored MsgForgetLeader from %x due to leader fortification", r.id, r.Term, m.From)
return nil
}
r.logger.Infof("%x forgetting leader %x at term %d", r.id, r.lead, r.Term)
r.lead = None
r.leadEpoch = 0
. It only means the lead field is reset to 0, so our leaderID will too. There is a TODO to remove this ForgetLeader behaviour but in the meantime it exists.

Epic: none
Release note: none
Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go line 789 at r3 (raw file):

Previously, sumeerbhola wrote…

How about asserting for "If we are the leader, it always matches."

I figured this is not actually true when we are in a Ready cycle that changes raft term. Since maybeSendAdmittedRaftMuLocked is called from HandleRaftReadyRaftMuLocked which sits at the top of Ready handler, the logTracker hasn't learned about the term bump (and potential log truncation that comes with it) yet. So we'll have av.Term < p.term here, because p.term is up-to-date with raft thanks to makeStateConsistentRaftMuLocked.

I reworded the comments to reflect what's true. I think we need to improve on this staleness issue. Generally, in this function we are one Ready behind. We can improve this by e.g. putting another maybeSendAdmittedRaftMuLocked in AdmitRaftEntriesRaftMuLocked, so that it can take advantage of knowing the latest term, and potentially knowing a bunch of entries that were just admitted. I'll send a PR for this, and let's discuss there.


pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go line 790 at r3 (raw file):

Previously, sumeerbhola wrote…

we could choose to send this vector to it - the stale leader will admit everything as a result, because there is a new leader somewhere.

+1 to doing this

The receiver could also choose to step down, though it might be safer not to mess with raft state for now - the leader will learn about the new term via standard raft messages anyway.

+1 to letting Raft handle this.

Done

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @pav-kv)


pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go line 789 at r3 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

I figured this is not actually true when we are in a Ready cycle that changes raft term. Since maybeSendAdmittedRaftMuLocked is called from HandleRaftReadyRaftMuLocked which sits at the top of Ready handler, the logTracker hasn't learned about the term bump (and potential log truncation that comes with it) yet. So we'll have av.Term < p.term here, because p.term is up-to-date with raft thanks to makeStateConsistentRaftMuLocked.

I reworded the comments to reflect what's true. I think we need to improve on this staleness issue. Generally, in this function we are one Ready behind. We can improve this by e.g. putting another maybeSendAdmittedRaftMuLocked in AdmitRaftEntriesRaftMuLocked, so that it can take advantage of knowing the latest term, and potentially knowing a bunch of entries that were just admitted. I'll send a PR for this, and let's discuss there.

Do you want to make the change we discussed on slack (call registerLogAppend in HandleRaftReadyRaftMuLocked just before we return early due to if !p.isLeaderUsingV2ProcLocked()) in this PR?

Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go line 789 at r3 (raw file):

Previously, sumeerbhola wrote…

Do you want to make the change we discussed on slack (call registerLogAppend in HandleRaftReadyRaftMuLocked just before we return early due to if !p.isLeaderUsingV2ProcLocked()) in this PR?

No, it can be done separately: #130688.

@pav-kv pav-kv requested a review from sumeerbhola September 13, 2024 16:52
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 2 files at r4, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)

@pav-kv
Copy link
Collaborator Author

pav-kv commented Sep 13, 2024

bors r=sumeerbhola

@craig craig bot merged commit 0d1dca5 into cockroachdb:master Sep 13, 2024
23 checks passed
@pav-kv pav-kv deleted the track-raft-term branch September 13, 2024 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants