Skip to content

Commit

Permalink
raft: ensure leaderMaxSupported does not regress when bumping terms
Browse files Browse the repository at this point in the history
Previously, it was possible for a leader to regress leaderMaxSupported
by calling an election at a higher term. This was because we weren't
careful to recognize this case on the leader, and followers had special
case handling that allowed a leader to override `inFortifyLease`.
Together, this could cause lease regressions for LeaderLeases in some
rare cases.

This patch fixes this issue by removing special case handling in
`inFortifyLease`. We also remove a special case in the handling of
`MsgHup` which allowed a leader to step down and campaign without
de-fortifying itself.

Fixes cockroachdb#133764

Release note: None
  • Loading branch information
arulajmani committed Oct 31, 2024
1 parent 6ddc9e8 commit e2c01fb
Show file tree
Hide file tree
Showing 3 changed files with 286 additions and 111 deletions.
35 changes: 20 additions & 15 deletions pkg/raft/raft.go
Original file line number Diff line number Diff line change
Expand Up @@ -1263,6 +1263,9 @@ func (r *raft) becomeCandidate() {
if r.state == pb.StateLeader {
panic("invalid transition [leader -> candidate]")
}
// We vote for ourselves when we become a candidate. We shouldn't do so if
// we're already fortified.
assertTrue(!r.supportingFortifiedLeader(), "shouldn't become a candidate if we're supporting a fortified leader")
r.step = stepCandidate
r.reset(r.Term + 1)
r.tick = r.tickElection
Expand All @@ -1276,9 +1279,7 @@ func (r *raft) becomePreCandidate() {
if r.state == pb.StateLeader {
panic("invalid transition [leader -> pre-candidate]")
}
assertTrue(!r.supportingFortifiedLeader() || r.lead == r.id,
"should not be supporting a fortified leader when becoming pre-candidate; leader exempted",
)
assertTrue(!r.supportingFortifiedLeader(), "should not be fortifying a leader when becoming pre-candidate")
// Becoming a pre-candidate changes our step functions and state,
// but doesn't change anything else. In particular it does not increase
// r.Term or change r.Vote.
Expand Down Expand Up @@ -1347,13 +1348,23 @@ func (r *raft) hup(t CampaignType) {
r.logger.Infof("%x is unpromotable and can not campaign", r.id)
return
}
// NB: The leader is allowed to bump its term by calling an election. Note that
// we must take care to ensure the leader's support expiration doesn't regress.
// NB: Even an old leader that has since stepped down needs to ensure it is
// no longer fortifying itself before campaigning at a higher term. This is
// because candidates always vote for themselves, and casting a vote isn't
// allowed if the candidate is fortifying a leader. So, before campaigning,
// the old leader needs to either de-fortify itself[1] or withdraw store
// liveness support for its own store for the fortified epoch[2].
//
// [1] The old leader that's since stepped down will only de-fortify itself
// once its leadMaxSupported is in the past. Waiting until we're
// de-fortified before campaigning ensures we don't regress the
// leadMaxSupported.
//
// TODO(arul): add special handling for the r.lead == r.id case with an
// assertion to ensure the LeaderSupportExpiration is in the past before
// campaigning.
if r.supportingFortifiedLeader() && r.lead != r.id {
// [2] While rare, this can happen if it experiences a disk stall. In this
// case, the leadMaxSupported may be in the future by the time we campaign.
// However, if this is the case, we won't be able to win the election, as a
// majority quorum will not vote for us because they'll be in fortify lease.
if r.supportingFortifiedLeader() {
r.logger.Debugf("%x ignoring MsgHup due to leader fortification", r.id)
return
}
Expand Down Expand Up @@ -1478,12 +1489,6 @@ func (r *raft) Step(m pb.Message) error {
force := bytes.Equal(m.Context, []byte(campaignTransfer))
inHeartbeatLease := r.checkQuorum && r.lead != None && r.electionElapsed < r.electionTimeout
inFortifyLease := r.supportingFortifiedLeader() &&
// NB: A fortified leader is allowed to bump its term. It'll need to
// re-fortify once if it gets elected at the higher term though, so the
// leader must take care to not regress its supported expiration.
// However, at the follower, we grant the fortified leader our vote at
// the higher term.
r.lead != m.From &&
// NB: If the peer that's campaigning has an entry in its log with a
// higher term than what we're aware of, then this conclusively proves
// that a new leader was elected at a higher term. We never heard from
Expand Down
151 changes: 119 additions & 32 deletions pkg/raft/testdata/fortification_followers_dont_prevote.txt
Original file line number Diff line number Diff line change
Expand Up @@ -362,13 +362,30 @@ raft-state
2: StateLeader (Voter) Term:2 Lead:2 LeadEpoch:1
3: StateFollower (Voter) Term:2 Lead:2 LeadEpoch:1

# Lastly, ensure that the leader is able to successfully campaign at a higher
# term. We'll need to step down to set this up properly, as otherwise attempts
# to campaign will no-op.
# Lastly, ensure that even the leader is not able to campaign at a higher term.
# We'll need to step down to set this up properly, as otherwise attempts to
# campaign will no-op.
step-down 2
----
INFO 2 became follower at term 2

campaign 2
----
DEBUG 2 ignoring MsgHup due to leader fortification

# Have 2 withdraw support from itself so it can move past the fortification
# for MsgHup. This allows it to campaign, but it will not be able to win the
# election as {1,3} still support the fortified leader. This simulates the case
# where 2's store experiences a disk stall, as a result of which its store
# liveness epoch expires, but any maxLeaderSupported value returned to the
# leasing layer isn't regressed by a new leader.
withdraw-support 2 2
----
1 2 3
1 1 1 1
2 x x 1
3 x 1 1

campaign 2
----
INFO 2 is starting a new election at term 2
Expand All @@ -389,18 +406,103 @@ stabilize
INFO 2 has received 1 MsgPreVoteResp votes and 0 vote rejections
> 1 receiving messages
2->1 MsgPreVote Term:3 Log:2/12
INFO 1 [logterm: 2, index: 12, vote: 0] cast MsgPreVote for 2 [logterm: 2, index: 12] at term 2
INFO 1 [logterm: 2, index: 12, vote: 0] ignored MsgPreVote from 2 [logterm: 2, index: 12] at term 2: supporting fortified leader 2 at epoch 1
> 3 receiving messages
2->3 MsgPreVote Term:3 Log:2/12
INFO 3 [logterm: 2, index: 12, vote: 2] cast MsgPreVote for 2 [logterm: 2, index: 12] at term 2
INFO 3 [logterm: 2, index: 12, vote: 2] ignored MsgPreVote from 2 [logterm: 2, index: 12] at term 2: supporting fortified leader 2 at epoch 1

# Note that 2 wasn't able to win the pre-vote election. At this point, the
# leaderMaxSupported is in the future. This demonstrates that the leader isn't
# able to regress leaderMaxSupported by calling an election at a higher term.
raft-state
----
1: StateFollower (Voter) Term:2 Lead:2 LeadEpoch:1
2: StatePreCandidate (Voter) Term:2 Lead:0 LeadEpoch:0
3: StateFollower (Voter) Term:2 Lead:2 LeadEpoch:1


# Let's de-fortify {1,3}. First, sanity check that simply ticking doesn't send
# out a de-fortification request, as leaderMaxSupported isn't in the past yet.
tick-heartbeat 2
----
ok

stabilize
----
ok

# Expired leaderMaxSupported and tick again, which should de-fortify {1,3}.
support-expired 2
----
ok

tick-heartbeat 2
----
ok

stabilize
----
> 2 handling Ready
Ready MustSync=false:
Messages:
2->1 MsgDeFortifyLeader Term:2 Log:0/0
2->3 MsgDeFortifyLeader Term:2 Log:0/0
> 1 receiving messages
2->1 MsgDeFortifyLeader Term:2 Log:0/0
> 3 receiving messages
2->3 MsgDeFortifyLeader Term:2 Log:0/0
> 1 handling Ready
Ready MustSync=true:
HardState Term:2 Commit:12 Lead:2 LeadEpoch:0
Messages:
1->2 MsgPreVoteResp Term:3 Log:0/0
> 3 handling Ready
Ready MustSync=true:
HardState Term:2 Vote:2 Commit:12 Lead:2 LeadEpoch:0

# At this point, 2 can campaign and win the election. Reset store liveness by
# bumping the epoch before campaigning though.
bump-epoch 2
----
1 2 3
1 1 2 1
2 x x 1
3 x 2 1

grant-support 2 2
----
1 2 3
1 1 2 1
2 x 3 1
3 x 2 1


campaign 2
----
INFO 2 is starting a new election at term 2
INFO 2 became pre-candidate at term 2
INFO 2 [logterm: 2, index: 12] sent MsgPreVote request to 1 at term 2
INFO 2 [logterm: 2, index: 12] sent MsgPreVote request to 3 at term 2

stabilize
----
> 2 handling Ready
Ready MustSync=false:
Messages:
2->1 MsgPreVote Term:3 Log:2/12
2->3 MsgPreVote Term:3 Log:2/12
INFO 2 received MsgPreVoteResp from 2 at term 2
INFO 2 has received 1 MsgPreVoteResp votes and 0 vote rejections
> 1 receiving messages
2->1 MsgPreVote Term:3 Log:2/12
INFO 1 [logterm: 2, index: 12, vote: 0] cast MsgPreVote for 2 [logterm: 2, index: 12] at term 2
> 3 receiving messages
2->3 MsgPreVote Term:3 Log:2/12
INFO 3 [logterm: 2, index: 12, vote: 2] cast MsgPreVote for 2 [logterm: 2, index: 12] at term 2
> 1 handling Ready
Ready MustSync=false:
Messages:
1->2 MsgPreVoteResp Term:3 Log:0/0
> 3 handling Ready
Ready MustSync=false:
Messages:
3->2 MsgPreVoteResp Term:3 Log:0/0
> 2 receiving messages
Expand Down Expand Up @@ -445,85 +547,70 @@ stabilize
INFO 2 received MsgVoteResp from 1 at term 3
INFO 2 has received 2 MsgVoteResp votes and 0 vote rejections
INFO 2 became leader at term 3
INFO 2 leader at term 3 does not support itself in the liveness fabric
3->2 MsgVoteResp Term:3 Log:0/0
> 2 handling Ready
Ready MustSync=true:
State:StateLeader
HardState Term:3 Vote:2 Commit:12 Lead:2 LeadEpoch:1
HardState Term:3 Vote:2 Commit:12 Lead:2 LeadEpoch:0
Entries:
3/13 EntryNormal ""
Messages:
2->1 MsgFortifyLeader Term:3 Log:0/0
2->3 MsgFortifyLeader Term:3 Log:0/0
2->1 MsgApp Term:3 Log:2/12 Commit:12 Entries:[3/13 EntryNormal ""]
2->3 MsgApp Term:3 Log:2/12 Commit:12 Entries:[3/13 EntryNormal ""]
> 1 receiving messages
2->1 MsgFortifyLeader Term:3 Log:0/0
2->1 MsgApp Term:3 Log:2/12 Commit:12 Entries:[3/13 EntryNormal ""]
> 3 receiving messages
2->3 MsgFortifyLeader Term:3 Log:0/0
2->3 MsgApp Term:3 Log:2/12 Commit:12 Entries:[3/13 EntryNormal ""]
> 1 handling Ready
Ready MustSync=true:
HardState Term:3 Vote:2 Commit:12 Lead:2 LeadEpoch:1
HardState Term:3 Vote:2 Commit:12 Lead:2 LeadEpoch:0
Entries:
3/13 EntryNormal ""
Messages:
1->2 MsgFortifyLeaderResp Term:3 Log:0/0 LeadEpoch:1
1->2 MsgAppResp Term:3 Log:0/13 Commit:12
> 3 handling Ready
Ready MustSync=true:
HardState Term:3 Vote:2 Commit:12 Lead:2 LeadEpoch:1
HardState Term:3 Vote:2 Commit:12 Lead:2 LeadEpoch:0
Entries:
3/13 EntryNormal ""
Messages:
3->2 MsgFortifyLeaderResp Term:3 Log:0/0 LeadEpoch:1
3->2 MsgAppResp Term:3 Log:0/13 Commit:12
> 2 receiving messages
1->2 MsgFortifyLeaderResp Term:3 Log:0/0 LeadEpoch:1
1->2 MsgAppResp Term:3 Log:0/13 Commit:12
3->2 MsgFortifyLeaderResp Term:3 Log:0/0 LeadEpoch:1
3->2 MsgAppResp Term:3 Log:0/13 Commit:12
> 2 handling Ready
Ready MustSync=true:
HardState Term:3 Vote:2 Commit:13 Lead:2 LeadEpoch:1
HardState Term:3 Vote:2 Commit:13 Lead:2 LeadEpoch:0
CommittedEntries:
3/13 EntryNormal ""
Messages:
2->1 MsgApp Term:3 Log:2/12 Commit:12 Entries:[3/13 EntryNormal ""]
2->1 MsgApp Term:3 Log:3/13 Commit:13
2->3 MsgApp Term:3 Log:2/12 Commit:13 Entries:[3/13 EntryNormal ""]
2->3 MsgApp Term:3 Log:3/13 Commit:13
> 1 receiving messages
2->1 MsgApp Term:3 Log:2/12 Commit:12 Entries:[3/13 EntryNormal ""]
2->1 MsgApp Term:3 Log:3/13 Commit:13
> 3 receiving messages
2->3 MsgApp Term:3 Log:2/12 Commit:13 Entries:[3/13 EntryNormal ""]
2->3 MsgApp Term:3 Log:3/13 Commit:13
> 1 handling Ready
Ready MustSync=true:
HardState Term:3 Vote:2 Commit:13 Lead:2 LeadEpoch:1
HardState Term:3 Vote:2 Commit:13 Lead:2 LeadEpoch:0
CommittedEntries:
3/13 EntryNormal ""
Messages:
1->2 MsgAppResp Term:3 Log:0/13 Commit:12
1->2 MsgAppResp Term:3 Log:0/13 Commit:13
> 3 handling Ready
Ready MustSync=true:
HardState Term:3 Vote:2 Commit:13 Lead:2 LeadEpoch:1
HardState Term:3 Vote:2 Commit:13 Lead:2 LeadEpoch:0
CommittedEntries:
3/13 EntryNormal ""
Messages:
3->2 MsgAppResp Term:3 Log:0/13 Commit:13
3->2 MsgAppResp Term:3 Log:0/13 Commit:13
> 2 receiving messages
1->2 MsgAppResp Term:3 Log:0/13 Commit:12
1->2 MsgAppResp Term:3 Log:0/13 Commit:13
3->2 MsgAppResp Term:3 Log:0/13 Commit:13
3->2 MsgAppResp Term:3 Log:0/13 Commit:13

raft-state
----
1: StateFollower (Voter) Term:3 Lead:2 LeadEpoch:1
2: StateLeader (Voter) Term:3 Lead:2 LeadEpoch:1
3: StateFollower (Voter) Term:3 Lead:2 LeadEpoch:1
1: StateFollower (Voter) Term:3 Lead:2 LeadEpoch:0
2: StateLeader (Voter) Term:3 Lead:2 LeadEpoch:0
3: StateFollower (Voter) Term:3 Lead:2 LeadEpoch:0
Loading

0 comments on commit e2c01fb

Please sign in to comment.