base: enable Raft CheckQuorum by default
This patch enables Raft CheckQuorum by default. In etcd/raft, this also
has the effect of fully enabling PreVote, such that followers won't
grant prevotes if they've heard from a leader in the past election
timeout interval.
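
As a rough illustration of the recency rule described above, the following Go sketch models when a follower would grant a prevote. The `follower` type and its fields are hypothetical simplifications, not the etcd/raft data structures, and the sketch ignores the term and log up-to-date checks that a real implementation also performs.

```go
package main

import "fmt"

// follower is a hypothetical, simplified model of a Raft follower's state.
type follower struct {
	leaderID        uint64 // 0 means the follower knows of no leader
	electionElapsed int    // ticks since the follower last heard from its leader
	electionTimeout int    // ticks that make up one election timeout interval
	checkQuorum     bool   // whether CheckQuorum is enabled on this node
}

// grantsPreVote reports whether the follower would grant a prevote under the
// leader recency rule alone. With CheckQuorum enabled, a follower that has
// heard from a leader within the election timeout rejects the prevote.
func (f follower) grantsPreVote() bool {
	heardFromLeaderRecently := f.leaderID != 0 && f.electionElapsed < f.electionTimeout
	if f.checkQuorum && heardFromLeaderRecently {
		return false
	}
	return true
}

func main() {
	recent := follower{leaderID: 1, electionElapsed: 2, electionTimeout: 10, checkQuorum: true}
	stale := follower{leaderID: 1, electionElapsed: 12, electionTimeout: 10, checkQuorum: true}
	disabled := follower{leaderID: 1, electionElapsed: 2, electionTimeout: 10, checkQuorum: false}

	fmt.Println(recent.grantsPreVote())   // false: leader heard from recently
	fmt.Println(stale.grantsPreVote())    // true: election timeout has passed
	fmt.Println(disabled.grantsPreVote()) // true: recency rule not enforced
}
```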

This is more robust against partial and asymmetric network partitions.
Otherwise, a partitioned node may be able to hold spurious elections and
steal leadership away from an established leader. This can cause the
leader to become unreachable by the leaseholder, resulting in permanent
range unavailability.

We are still able to hold immediate elections, e.g. when unquiescing a
range to find a dead leader. If a quorum of followers consider the
leader dead and forget it (becoming leaderless followers), they will
grant prevotes despite having seen the leader recently (i.e. before
quiescing), and can hold an election immediately.
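
The quorum condition in the paragraph above can be sketched as a simple count. This is a minimal model, assuming that every follower that has forgotten the leader grants the prevote and every other follower rejects it; `preVoteWins` is a hypothetical helper, not a function from etcd/raft or CockroachDB.

```go
package main

import "fmt"

// preVoteWins reports whether a candidate collects prevotes from a quorum of
// voters. forgotLeader holds one entry per voter other than the candidate,
// and is true if that follower has forgotten the leader. The candidate
// always counts its own vote.
func preVoteWins(forgotLeader []bool) bool {
	votes := 1 // the candidate's own vote
	for _, forgot := range forgotLeader {
		if forgot {
			votes++
		}
	}
	voters := len(forgotLeader) + 1
	quorum := voters/2 + 1
	return votes >= quorum
}

func main() {
	// Three voters: the candidate plus two followers, one of them leaderless.
	fmt.Println(preVoteWins([]bool{true, false})) // true: 2 of 3 votes

	// Five voters: only one of the four followers has forgotten the leader.
	fmt.Println(preVoteWins([]bool{true, false, false, false})) // false: 2 of 5 votes
}
```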

This is compatible with 23.1 in mixed-version clusters:

* Leaders with mixed `CheckQuorum` settings are fine: they only apply
  the step-down logic to themselves, and register follower activity
  regardless of the followers' settings.

* Voters with mixed `CheckQuorum` settings are fine: the leader recency
  criterion is only applied to their own vote, so either they'll
  enforce it or not.

* Campaigning on leader removal is fine-ish: before 23.2 finalization,
  the first range replica will campaign. If this replica is 23.2 it will
  bypass pre-vote and call an immediate election; if it is 23.1 it will
  use pre-vote. However, upon receiving the 23.1 pre-vote request, 23.2
  nodes will check whether the leader is still in the descriptor, and if
  it isn't they will forget it and grant the pre-vote. A quorum will
  likely apply the leader removal before receiving pre-vote requests.
  Otherwise, we will recover after an election timeout.

* Campaigning after unquiescing is fine: the logic remains unchanged,
  and 23.2 nodes will forget the leader and grant prevotes if they
  find the leader dead according to liveness.

* Campaigning during lease acquisitions is fine: this is needed to
  steal leadership away from an active leader that can't itself acquire
  an epoch lease because it's failing liveness heartbeats. If a 23.2 node
  also finds the leader dead in liveness, it will forget it and grant
  the prevote.
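
The two conditions under which the bullets above say a 23.2 node forgets its leader can be summarized in a small sketch. The `leaderStatus` type and `shouldForgetLeader` function are hypothetical names used only for illustration; the actual CockroachDB logic may involve further checks not described in this commit message.

```go
package main

import "fmt"

// leaderStatus is a hypothetical summary of what a follower knows about its
// current Raft leader.
type leaderStatus struct {
	inDescriptor bool // the leader is still a replica in the range descriptor
	live         bool // the leader is considered live according to liveness
}

// shouldForgetLeader reports whether the follower would forget the leader,
// and so be free to grant a prevote, based on the two conditions named in
// the commit message: the leader was removed from the descriptor, or the
// leader is dead according to liveness.
func shouldForgetLeader(s leaderStatus) bool {
	return !s.inDescriptor || !s.live
}

func main() {
	fmt.Println(shouldForgetLeader(leaderStatus{inDescriptor: true, live: true}))  // false
	fmt.Println(shouldForgetLeader(leaderStatus{inDescriptor: false, live: true})) // true
	fmt.Println(shouldForgetLeader(leaderStatus{inDescriptor: true, live: false})) // true
}
```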

Epic: none
Release note (bug fix): The Raft PreVote and CheckQuorum mechanisms are
now fully enabled. These prevent spurious elections when followers
already have an active leader, and cause leaders to step down if they
don't hear back from a quorum of followers. This improves reliability
under partial and asymmetric network partitions, by avoiding spurious
elections and preventing unavailability where a partially partitioned
node could steal leadership away from an established leaseholder who
would then no longer be able to reach the leader and submit writes.
erikgrinaker committed Jun 27, 2023
1 parent dca6ef2 commit de9b2b2
Showing 2 changed files with 2 additions and 8 deletions.
8 changes: 1 addition & 7 deletions pkg/base/config.go
@@ -266,14 +266,8 @@ var (
 	// etcd/raft does register MsgHeartbeatResp as follower activity, and these
 	// are sent across the low-latency system RPC class, so it shouldn't be
 	// affected by RPC head-of-line blocking for MsgApp traffic.
-	//
-	// Note that time will not appear to progress on a quiesced range, so when
-	// unquiescing it may take some time before a leader steps down or a candidate
-	// is able to obtain prevotes.
-	//
-	// For now, we disable this by default, due to liveness concerns.
 	defaultRaftEnableCheckQuorum = envutil.EnvOrDefaultBool(
-		"COCKROACH_RAFT_ENABLE_CHECKQUORUM", false)
+		"COCKROACH_RAFT_ENABLE_CHECKQUORUM", true)
 
 	// defaultRaftLogTruncationThreshold specifies the upper bound that a single
 	// Range's Raft log can grow to before log truncations are triggered while at
2 changes: 1 addition & 1 deletion pkg/base/testdata/raft_config
@@ -7,7 +7,7 @@ echo
  RaftHeartbeatIntervalTicks: (int) 2,
  RangeLeaseDuration: (time.Duration) 6s,
  RangeLeaseRenewalFraction: (float64) 0.5,
- RaftEnableCheckQuorum: (bool) false,
+ RaftEnableCheckQuorum: (bool) true,
  RaftLogTruncationThreshold: (int64) 16777216,
  RaftProposalQuota: (int64) 8388608,
  RaftMaxUncommittedEntriesSize: (uint64) 16777216,
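
Because the default in pkg/base/config.go is read through an environment variable, the previous behavior can presumably still be restored by setting `COCKROACH_RAFT_ENABLE_CHECKQUORUM=false`. The sketch below shows this pattern using only the Go standard library. It is not the `envutil.EnvOrDefaultBool` implementation; `boolOrDefault` and `envOrDefaultBool` are hypothetical stand-ins, and falling back to the default on an unparsable value is an assumption made for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// boolOrDefault parses raw as a boolean when set is true, and otherwise
// returns def. Falling back to def on a parse error is a choice made for
// this sketch, not necessarily what envutil does.
func boolOrDefault(raw string, set bool, def bool) bool {
	if !set {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return v
}

// envOrDefaultBool is a stand-in for an environment-backed boolean setting:
// it reads the named variable and falls back to def when it is unset.
func envOrDefaultBool(name string, def bool) bool {
	raw, set := os.LookupEnv(name)
	return boolOrDefault(raw, set, def)
}

func main() {
	// With this patch the default is true; an operator can override it.
	enabled := envOrDefaultBool("COCKROACH_RAFT_ENABLE_CHECKQUORUM", true)
	fmt.Println("CheckQuorum enabled:", enabled)
}
```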