raft: assert on HardState field modifications #133169

nvanbenschoten · 2024-10-22T17:50:44Z

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:

term regressions
vote changes
lead changes in a single term
leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP:

The 2nd, 3rd, and 4th commits in the PR address these issues in raft by not assuming that a MsgSnap as always coming from the leader of Message.Term.

Release note: None

cockroach-teamcity · 2024-10-22T17:50:55Z

This change is

pkg/raft/raft.go

pav-kv · 2024-10-22T19:58:21Z

pkg/raft/raft.go

@@ -2076,8 +2068,6 @@ func stepCandidate(r *raft, m pb.Message) error {
 			// m.Term > r.Term; reuse r.Term
 			r.becomeFollower(r.Term, r.lead)
 		}
-	case pb.MsgTimeoutNow:


This removal seems unrelated to this change. Is it removed because the message is not useful? LGTM, but just checking the reason for it.

It is removed because MsgTimeoutNow is caught by IsMsgFromLeader. With this change, a candidate will now handle the MsgTimeoutNow as an instruction to force campaign at the next term, instead of ignoring it.

Sounds like a behaviour change (previously we would stay candidate because we've already started election and don't need to do it again; now we will step back to follower and re-campaign again?). I don't have a good sense whether it's a benign change. Would it be possible to retain the old behaviour?

Also, do we need to assume MsgTimeoutNow came from the leader? It's a forceful campaign, and in theory any node could randomly decide to do that without the leader's involvement. Perhaps the uncertainty comes from the fact that it's used both for forced campaigns and leader transfers (and in the latter case it makes sense to say that the message is made on behalf of this leader).

Yes, this is new behavior. But it's desirable behavior. I've extracted it into #133490.

Also, do we need to assume MsgTimeoutNow came from the leader? It's a forceful campaign, and in theory any node could randomly decide to do that without the leader's involvement. Perhaps the uncertainty comes from the fact that it's used both for forced campaigns and leader transfers (and in the latter case it makes sense to say that the message is made on behalf of this leader).

It used to be disruptive but safe to allow any node to decide to perform a force campaign. With leader fortification and leader leases, it is now very unsafe. A force campaign will instruct any voter to ignore the fortification promise that it has made to the leader, which could cause a lease expiration regression (overlapping leases). Force campaigns are only safe if the leader is the one initiating them, because it will step down to a follower at the same time.

used both for forced campaigns and leader transfers

The notion of a "forced campaign", separate from a leader transfer is relatively new. It came from #104969 and I'd like to eliminate the concept with something like #133308.

pav-kv · 2024-10-22T20:04:55Z

pkg/raft/raft.go

+}
+
+func (r *raft) resetLeadEpoch() {
+	r.leadEpoch = 0


Non-actionable, but just curious about the fortification protocol: is it safe to reset leadEpoch to 0 in the de-fortify case?

Can the leadEpoch be set to non-zero again during the course of the term? If yes, can it be set to a value below / different than the last non-zero leadEpoch? As a thought experiment: what happens when messages are delayed / reordered / duplicated: can we get in and out of leadEpoch == 0 / in what ways, and is it fine?

It is safe to reset to 0 when if either (1) the leader has told us to defortify, or (2) we have withdrawn support for the lead epoch in the store liveness network.

leadEpoch can then be set to a non-zero value again if a new MsgForgetLeader is received, which is by design. However, the value must be equal or greater than it was before it was reset to 0, because it comes from a call to storeLiveness.SupportFor, which is guaranteed to be monotonic.

pav-kv · 2024-10-23T13:34:41Z

pkg/kv/kvserver/replica_raft.go

+	msg := raftpb.Message{
+		To: raftpb.PeerID(r.replicaID),
+		// We pretend that the message was sent from the leader.
+		From: lead,
+		Type: raftpb.MsgTimeoutNow,
+	}


Is this related to the change in this commit, or a drive-by? I see that From is used in the handler for logging, is that why?

Also, is it safe/canonical to communicate a leader without communicating the term? I imagine if a follower receives a MsgTimeoutNow for a long gone term, it should ignore.

I know this message is synchronous/local here (so there can't be a term mismatch), but it isn't when this message is send from leader to a peer.

Is this related to the change in this commit, or a drive-by? I see that From is used in the handler for logging, is that why?

It's related to the change and to our thread above. The use of MsgTimeoutNow like this is a hack and is hardly safe, it's only maybe ok because of when this code is run (see shouldCampaignAfterConfChange). I'd like to get rid of this, but in the meantime, I'm ok making it even more clear that cockroach is lying to raft about this message coming from the leader, which in some ways it is because the leader is the one committing the entry which demotes itself to a learner.

Also, is it safe/canonical to communicate a leader without communicating the term? I imagine if a follower receives a MsgTimeoutNow for a long gone term, it should ignore.

Good point. I added the Term, along with a better comment.

pav-kv · 2024-10-23T13:47:25Z

pkg/raft/raft.go

@@ -2076,8 +2068,6 @@ func stepCandidate(r *raft, m pb.Message) error {
 			// m.Term > r.Term; reuse r.Term
 			r.becomeFollower(r.Term, r.lead)
 		}
-	case pb.MsgTimeoutNow:


Sounds like a behaviour change (previously we would stay candidate because we've already started election and don't need to do it again; now we will step back to follower and re-campaign again?). I don't have a good sense whether it's a benign change. Would it be possible to retain the old behaviour?

Also, do we need to assume MsgTimeoutNow came from the leader? It's a forceful campaign, and in theory any node could randomly decide to do that without the leader's involvement. Perhaps the uncertainty comes from the fact that it's used both for forced campaigns and leader transfers (and in the latter case it makes sense to say that the message is made on behalf of this leader).

nvanbenschoten · 2024-10-25T21:08:08Z

@pav-kv PTAL at #133490 and then give this PR one more pass when you get a change. Thanks.

Informs cockroachdb#125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * cockroachdb#127349 * cockroachdb#127348 Release note: None

nvanbenschoten · 2024-10-28T15:13:57Z

@pav-kv now that #133490 merged, this has now been rebased on the latest master.

Group the handling of learning about the leader in stepFollower. Epic: None Release note: None

Group the handling of the candidate -> follower state transition when hearing from the leader while a candidate. Epic: None Release note: None

We can't consider MsgSnap to be from the leader of Message.Term until we address cockroachdb#127348 and cockroachdb#127349. Epic: None Release note: None

nvanbenschoten · 2024-10-28T16:40:49Z

TFTR!

bors r+

craig · 2024-10-28T17:35:57Z

Build succeeded:

blathers-crl · 2024-10-28T17:36:06Z

Encountered an error creating backports. Some common things that can go wrong:

The backport branch might have already existed.
There was a merge conflict.
The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.

error creating merge commit from 18fb66b to blathers/backport-release-24.3-133169: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.3.x failed. See errors above.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

nvanbenschoten · 2024-10-28T18:16:08Z

blathers backport 24.3

132242: raft: test TestCannotCommitWithoutNewTermEntry w/ & wo/ store liveness r=iskettaneh a=iskettaneh This commit changes TestCannotCommitWithoutNewTermEntry to allow testing it when store liveness is both enabled and disabled. In order to achieve that, it adds a mockStoreLiveness for raft unit tests. It acts like AlwaysLive store liveness by default, but it could be to change the supportFor/SupportFrom between any two nodes, also, it allows to bump the support epochs. References: #132241 Release note: None 133291: raft,kvserver: handle MsgSnap with outdated term r=nvanbenschoten a=pav-kv A snapshot message may arrive under an outdated term. Since it carries committed state, we can safely process it regardless of the term. The message term means the snapshot is committed as of this term (or could be committed earlier). By raft invariants, all committed state under a particular term will be committed under later terms as well. Part of #127348 Related to #133169 Co-authored-by: Ibrahim Kettaneh <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>

nvanbenschoten requested review from pav-kv and arulajmani October 22, 2024 17:50

nvanbenschoten requested a review from a team as a code owner October 22, 2024 17:50

nvanbenschoten added the backport-24.3.x Flags PRs that need to be backported to 24.3 label Oct 22, 2024

pav-kv reviewed Oct 22, 2024

View reviewed changes

pkg/raft/raft.go Outdated Show resolved Hide resolved

nvanbenschoten force-pushed the nvanbenschoten/raftAssert branch from a3a2b5b to d85677c Compare October 22, 2024 19:47

pav-kv approved these changes Oct 22, 2024

View reviewed changes

nvanbenschoten force-pushed the nvanbenschoten/raftAssert branch from d85677c to a5cb07c Compare October 23, 2024 01:29

pav-kv approved these changes Oct 23, 2024

View reviewed changes

pav-kv mentioned this pull request Oct 23, 2024

raft,kvserver: handle MsgSnap with outdated term #133291

Merged

nvanbenschoten force-pushed the nvanbenschoten/raftAssert branch from a5cb07c to 395f778 Compare October 25, 2024 21:08

nvanbenschoten force-pushed the nvanbenschoten/raftAssert branch from 395f778 to eb5d56d Compare October 28, 2024 15:13

pav-kv approved these changes Oct 28, 2024

View reviewed changes

nvanbenschoten added 3 commits October 28, 2024 11:26

raft: use IsMsgFromLeader is stepFollower

4fba5d0

Group the handling of learning about the leader in stepFollower. Epic: None Release note: None

raft: use IsMsgFromLeader in stepCandidate

18fb66b

Group the handling of the candidate -> follower state transition when hearing from the leader while a candidate. Epic: None Release note: None

raft: don't assume MsgSnap comes from leader

58a9f53

We can't consider MsgSnap to be from the leader of Message.Term until we address cockroachdb#127348 and cockroachdb#127349. Epic: None Release note: None

nvanbenschoten force-pushed the nvanbenschoten/raftAssert branch from eb5d56d to 58a9f53 Compare October 28, 2024 15:26

craig bot merged commit 9d63774 into cockroachdb:master Oct 28, 2024
22 of 23 checks passed

blathers-crl bot mentioned this pull request Oct 28, 2024

release-24.3: raft: assert on HardState field modifications #133581

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

raft: assert on HardState field modifications #133169

raft: assert on HardState field modifications #133169

nvanbenschoten commented Oct 22, 2024 •

edited

Loading

cockroach-teamcity commented Oct 22, 2024

pav-kv Oct 22, 2024

nvanbenschoten Oct 23, 2024

pav-kv Oct 23, 2024

nvanbenschoten Oct 25, 2024

pav-kv Oct 22, 2024 •

edited

Loading

nvanbenschoten Oct 23, 2024

pav-kv Oct 23, 2024

nvanbenschoten Oct 25, 2024

pav-kv Oct 23, 2024

nvanbenschoten commented Oct 25, 2024

nvanbenschoten commented Oct 28, 2024

nvanbenschoten commented Oct 28, 2024

craig bot commented Oct 28, 2024

blathers-crl bot commented Oct 28, 2024

nvanbenschoten commented Oct 28, 2024

raft: assert on HardState field modifications #133169

raft: assert on HardState field modifications #133169

Conversation

nvanbenschoten commented Oct 22, 2024 • edited Loading

cockroach-teamcity commented Oct 22, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pav-kv Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nvanbenschoten commented Oct 25, 2024

nvanbenschoten commented Oct 28, 2024

nvanbenschoten commented Oct 28, 2024

craig bot commented Oct 28, 2024

blathers-crl bot commented Oct 28, 2024

nvanbenschoten commented Oct 28, 2024

nvanbenschoten commented Oct 22, 2024 •

edited

Loading

pav-kv Oct 22, 2024 •

edited

Loading