Get rid of pre-vote #15
drmingdrmer
started this conversation in
Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Problem
Isolated sub set of a cluster(e.g.,
{A, B}
) keeps increasing term and retrying election.When communication is restored, one of the isolated node(e.g., A) will take the leadership
from the running sub cluster(e.g.,
{C, D, E}
), with a higher term.Thus in an unstable network environment, the leader changes very frequently.
Raft solution: pre-vote
Raft solves this problem by introducing a
pre-vote
RPC:A candidate tries to communicate with a quorum of replicas to see if there is a living
leader, without modifying the state of other replicas, before voting itself.
Our solution: without pre-vote.
Our solution would be simpler:
Just do not increase the term of a candidate unless it has to.
Without pre-vote,
handle_vote_request()
behaves as below:granting a vote
According to raft spec, a node reject a vote if any one of the following
conditions is met:
reject if
req.term < local.term
.reject if
req.term == local.term && req.leader != local.leader
i.e., a node has voted for other leader.
reject if
req.effective_leader != local.effective_leader
i.e., there is still a valid leader that has not yet timed out:
in other word, an heartbeat has received in the past interval.
reject if
rea.log_id < local.log_id
.These conditions implies an partially ordered data structure:
To simplify these conditions
We defines an
order
foreffective_leader
andleader
Partial order
Leader
The field
leader
andeffective_leader
can be considered as aset
, define:The
order
of twoLeader
is defined by set containment:a > b ↔ a ⊃ b
Partial order
Vote
If we capsulate fields into a struct
Vote
, it is obvious that it is also partial order and the order is avector-order
.∴ A node grants a vote iff
request.vote >= local.vote
and we compareVote
in a vector order,i.e., given X = [x₁, x₂ ... ] and Y = [y₁, y₂ ... ],
X >= Y ↔ ∀i xᵢ >= yᵢ
Pseudo code of voting
Example
With the above isolated network:
{A, B} | {C, D, E}
, whileC
is the leader,what will happen is described as below:
A
becomes aCandidate
, start election with term=10.At that time,
A
did not increase term yet.A
send vote request toB
, whileB.effective_leader
is not timed out,A.vote = [(10, {A}), {A}]
is not greater thanB.vote = [(10, {C}), {C}]
.A.vote > B.vote
does not hold, thusA.vote
is rejected.A
will revert to aFollower
.When
B.effective_leader
timed out,B.effective_leader
becomes an empty set{}
.In the next round of voting,
A
will find thatA.vote = [(11, {A}), {A}]
is greater thanB.vote = [(10, {C}), {}]
.But
A
did not receive enough responses(at least 2),A
does not know if there is another valid leader or not.Thus
A
does not increase its term.When the network isolation is restored,
A send
A.vote = [(10, {A}), {A}]
toC
,C
will rejectA.vote
if is still the leader, i.e.,C.effective_leader = {C}
.A
will revert to aFollower
.Beta Was this translation helpful? Give feedback.
All reactions