Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: send de-fortification requests with the correct term #131828

Merged
merged 1 commit into from
Oct 8, 2024

Conversation

arulajmani
Copy link
Collaborator

First 2 commits from #131668


Addresses a TODO, where we were plopping an incorrect term on
MsgDeFortifyLeader. A MsgDeFortifyLeader should include the leadership
term that's being de-fortified, not the currently known term to old
leader that's trying to de-fortify. To enable this, we track the term on
the FortificationTracker and ensure it's only reset when a peer steps up
to become leader again.

Informs #129098

Release note: None

@arulajmani arulajmani requested a review from a team as a code owner October 3, 2024 15:18
Copy link

blathers-crl bot commented Oct 3, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this came out well. :lgtm: once the commentary does a little better job explaining why we have this somewhat surprising state tracking.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @iskettaneh)


-- commits line 49 at r3:
"to the old leader"


pkg/raft/raft.go line 1186 at r3 (raw file):
nit: since this is a little surprising, consider driving the point home even further in the second sentence:

We reset it if we ever step up to become leader again, but not when stepping down as leader and not even when learning of a new leader in a later term.


pkg/raft/tracker/fortificationtracker.go line 30 at r3 (raw file):

	fortification map[pb.PeerID]pb.Epoch

	// term is the leadership term associated with fortification tracking.

This would be a good place to add a few more words around how this differs from raft.term and why we need it to.


pkg/raft/tracker/fortificationtracker.go line 31 at r3 (raw file):

	// term is the leadership term associated with fortification tracking.
	term uint64

tiny nit here and in Reset: move term above fortification to match their respective scopes. The fortification state is scoped to a single leadership term.


pkg/raft/tracker/fortificationtracker.go line 61 at r3 (raw file):

// Reset clears out any previously tracked fortification and prepares the
// fortification tracker to be used at the supplied term.

"used by a newly elected leader"

Copy link
Contributor

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 21 of 21 files at r1, 14 of 14 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/raft/raft.go line 854 at r2 (raw file):

func (r *raft) sendDeFortify(to pb.PeerID) {
	if to == r.id {
		// We handle the case where the leader is trying to de-fortify itself

maybe keep the comment before the TODO just to help future readers know why we have this special case


pkg/raft/testdata/de_fortification_basic.txt line 985 at r3 (raw file):

INFO 2 [logterm: 4, index: 5] sent MsgVote request to 3 at term 5

stabilize

Curious if we really need the debug level logging in this test, or can we get away with info?

@arulajmani arulajmani force-pushed the ll-msg-defortify-correct-term branch from f774409 to c18dc54 Compare October 7, 2024 17:45
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFAL

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @iskettaneh)


pkg/raft/raft.go line 1186 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: since this is a little surprising, consider driving the point home even further in the second sentence:

We reset it if we ever step up to become leader again, but not when stepping down as leader and not even when learning of a new leader in a later term.

Done.


pkg/raft/tracker/fortificationtracker.go line 30 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This would be a good place to add a few more words around how this differs from raft.term and why we need it to.

Done.


pkg/raft/tracker/fortificationtracker.go line 31 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

tiny nit here and in Reset: move term above fortification to match their respective scopes. The fortification state is scoped to a single leadership term.

Done.


pkg/raft/testdata/de_fortification_basic.txt line 985 at r3 (raw file):

Previously, iskettaneh wrote…

Curious if we really need the debug level logging in this test, or can we get away with info?

Not for the one we added in this patch, but some of the de-fortification related log lines from the previous PR are behind a debug log.

@arulajmani arulajmani force-pushed the ll-msg-defortify-correct-term branch from c18dc54 to a264fb0 Compare October 7, 2024 17:51
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @iskettaneh and @nvanbenschoten)


pkg/raft/raft.go line 854 at r2 (raw file):

Previously, iskettaneh wrote…

maybe keep the comment before the TODO just to help future readers know why we have this special case

That was accidental. Fixed.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 30 of 30 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @iskettaneh)


pkg/raft/tracker/fortificationtracker.go line 33 at r4 (raw file):

	// The term differs from raft.term in that raft.term is the highest term known
	// to a peer, whereas FortificationTracker.term is the highest term a peer was
	// a leader at.

"was a leader at since the last time it restarted."

Addresses a TODO, where we were plopping an incorrect term on
MsgDeFortifyLeader. A MsgDeFortifyLeader should include the leadership
term that's being de-fortified, not the currently known term to the old
leader that's trying to de-fortify. To enable this, we track the term on
the FortificationTracker and ensure it's only reset when a peer steps up
to become leader again.

Informs cockroachdb#129098

Release note: None
@arulajmani arulajmani force-pushed the ll-msg-defortify-correct-term branch from a264fb0 to 7e28f1e Compare October 8, 2024 14:22
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

bors r=nvanbenschoten,iskettaneh

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @iskettaneh and @nvanbenschoten)


pkg/raft/tracker/fortificationtracker.go line 33 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"was a leader at since the last time it restarted."

Good catch. Done.

@craig craig bot merged commit 779f089 into cockroachdb:master Oct 8, 2024
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants