Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: forget the leader when stepping down if it's safe #134990

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

iskettaneh
Copy link
Contributor

@iskettaneh iskettaneh commented Nov 12, 2024

First commit is based on #136025

This PR makes the leader forget that it was the leader when stepping down if it's safe to do so. The places this PR changes:

  1. When stepping down due to CheckQuorum.
  2. When stepping down due to leadership transfer.
  3. When stepping down due to becoming a learner.
  4. After a restart.
  5. After a VoteLoss.

Epic: none

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@iskettaneh iskettaneh marked this pull request as ready for review November 12, 2024 19:12
@iskettaneh iskettaneh requested a review from a team as a code owner November 12, 2024 19:12
Copy link
Contributor Author

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/raft/raft.go line 1274 at r6 (raw file):

	r.tick = r.tickElection
	if lead == None {
		r.resetLead()

The only downside I see here is that defortify() now wouldn't advance electionElapsed because we are already defortified.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 3 of 3 files at r4, 3 of 3 files at r5, 2 of 2 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @iskettaneh)


pkg/raft/raft.go line 1257 at r2 (raw file):

		// Once this state is stabilized (within and above pkg/raft), we can remove
		// this special case.
		r.lead = None

We can remove this manual r.lead assignment now.


pkg/raft/raft.go line 1263 at r2 (raw file):

	r.reset(term)
	r.tick = r.tickElection
	if lead == None {

This second commit is essentially a reversion of b1053d4. Should we just undo that commit?


pkg/raft/raft.go line 2714 at r5 (raw file):

		// and propose that change. Therefore, it should be safe at this point to
		// forget that we were the leader at this term and step down.
		r.becomeFollower(r.Term, None)

Should we make the same change in the VoteLost case of stepCandidate?

Copy link
Contributor Author

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @iskettaneh, and @nvanbenschoten)


pkg/raft/raft.go line 1263 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This second commit is essentially a reversion of b1053d4. Should we just undo that commit?

The second commit kinda assumed that if you explicitly step down with lead==None, you want to DeFortify (setting LeadEpoch to 0). This helps avoid assertions where we know that it's safe to step down while forgetting the leader (for example due to becoming a learner, or due to CheckQuorum). In those cases, we might be in a situation where r.lead == 0 but r.leadEpoch != 0.

Shall we explicitly call DeFortify() before stepping down in those cases?

Copy link
Contributor Author

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/raft/raft.go line 1257 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We can remove this manual r.lead assignment now.

I left a TODO to call r.resetLead() inside r.reset() once we verified that it's safe to do so (all places that stepdown to the same term can defortify at that point). Once we do that, we canm remove this manual assignment


pkg/raft/raft.go line 1263 at r2 (raw file):

Previously, iskettaneh wrote…

The second commit kinda assumed that if you explicitly step down with lead==None, you want to DeFortify (setting LeadEpoch to 0). This helps avoid assertions where we know that it's safe to step down while forgetting the leader (for example due to becoming a learner, or due to CheckQuorum). In those cases, we might be in a situation where r.lead == 0 but r.leadEpoch != 0.

Shall we explicitly call DeFortify() before stepping down in those cases?

I left a TODO to call r.resetLead() inside r.reset() once we verified that it's safe to do so (all places that stepdown to the same term can defortify at that point). What do you think?

Copy link
Contributor Author

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/raft/raft.go line 1257 at r2 (raw file):

Previously, iskettaneh wrote…

I left a TODO to call r.resetLead() inside r.reset() once we verified that it's safe to do so (all places that stepdown to the same term can defortify at that point). Once we do that, we canm remove this manual assignment

Nvm, it's safe to remove it now since I am calling resetLead() if lead==none, and we already know that leadEpoch==0 inside that if-statement. I removed it.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intermediate comments. Flushing them so you can address them while you make the changes we discussed synchronously.

Reviewed 4 of 4 files at r1, 8 of 8 files at r7, 1 of 1 files at r8, 3 of 3 files at r9, 3 of 3 files at r10.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @nvanbenschoten)


pkg/raft/raft.go line 1267 at r1 (raw file):

	r.logger.Infof("%x became follower at term %d", r.id, r.Term)

	// If we should broadcast a DeFortify message, we will do it in the next

This comment doesn't seem great -- it's written in the context of the code that existed previously, not what the code does/will do after this patch merges. I would reword this as:

Start de-fortifying eagerly so we don't have to wait out a full heartbeat timeout before sending the first de-fortification message.

Copy link
Contributor Author

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/raft/raft.go line 1267 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

This comment doesn't seem great -- it's written in the context of the code that existed previously, not what the code does/will do after this patch merges. I would reword this as:

Start de-fortifying eagerly so we don't have to wait out a full heartbeat timeout before sending the first de-fortification message.

Done.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM

Reviewed 16 of 16 files at r21, 4 of 4 files at r22, 1 of 1 files at r23, 2 of 2 files at r24, 4 of 4 files at r25, 3 of 3 files at r26, 2 of 2 files at r27, 1 of 1 files at r28, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @nvanbenschoten)


pkg/raft/raft.go line 1276 at r22 (raw file):

	r.reset(term)
	r.setLead(lead)
	r.logger.Infof("%x became follower at term %d", r.id, r.Term)

nit: consider adding a r.logger.Debugf("%x reset election elapsed to %d", r.id, r.electionElapsed) just so we're able to see that the election elapsed is reset back to 0 after the call to de-fortify in the data driven tests?


pkg/raft/raft.go line 491 at r27 (raw file):

	if r.lead == r.id {
		// If we were the leader, we must have waited out the leaderMaxSupported

nit: leadMaxSupported.

Seperately, could you add words for how we know that we've waited out the leadMaxSupported at this point? Also, worth adding words about why we forget the leader when becoming a follower (even though it is safe, it's unclear to a reader why we're doing so).


pkg/raft/raft.go line 498 at r27 (raw file):

	} else {
		// If we weren't the leader, we should NOT forget who the leader is to avoid
		// regressing the leaderMaxSupported.

nit: leadMaxSupported.

Separately, can you add words on how the leadMaxSupported could regress if we were to forget the leader?

Right now, we start bcasting defortificaton messages at heartbeat
timeouts. However, there is nothing that prevents us from starting this
process even earlier. This commit starts this process when stepping down
immediately after becoming a follower.

Epic: None

Release notes: None
This commit allows us to forget the leader when stepping down. We can't
just call `r.resetLead()` inside `r.reset()` because it would defortify
in all places where we stepdown to the same term. Once we make it safe
in all places to stepdown to the same term and forget the leader, we
can call r.resetLead() inside r.reset().

Epic: None

Release note: None
In checkQuorum, if we decide to step down, it means that a majority
of replicas don't support us. Therefore, it's safe to step down and
forget that we were the leader at this term. We can safely campaign
and participate in elections.

Epic: None

Release note: None
Since leadership transfer means that the current leader instructs
another replica to campaign without performing safety checks. It
should be safe to forget the leader when stepping down.

Epic: None

Release note: None
When a current leader becomes a learner, it should be safe to step down
and forget that we were the leader at the current term because:
(1) learners can't campaign or participate in elections, and (2) in
order for a learner to become a voter again, a new leader needs to get
elected and propose that change, which means that the replica would be
able to safely participate in elections then.

Epic: None

Release note: None
This commit makes us forget that we were the leader at this term after
a restart.

Epic: None

Release notes: None
We already forget the leader when attempting to campaign. This commit
doesn't change any current behaviour.

Epic: None

Release notes: None
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo comment.

Reviewed 35 of 35 files at r29, 1 of 1 files at r30, 2 of 2 files at r31, 4 of 4 files at r32, 3 of 3 files at r33, 2 of 2 files at r34, 1 of 1 files at r35, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @iskettaneh and @nvanbenschoten)


pkg/raft/raft.go line 501 at r34 (raw file):

	} else {
		// If we weren't the leader, we should NOT forget who the leader is to avoid
		// regressing the leadMaxSupported. We can't just forget the leader because

Something about this looks off. I'd say something like:

// If we weren't the leader, we should NOT forget who the leader is to avoid
// regressing the leadMaxSupported. We can't just forget the leader because
// we might have been fortifying a leader before the restart and need to 
// uphold our fortification promise after the restart as well. To do so, we need
// to remember the leader and the fortified epoch, otherwise we may vote for
// a different candidate or prematurely call an election, either of which could
// regress the LeadSupportUntil.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants