Get rid of pre-vote #15
drmingdrmer
started this conversation in
Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Problem
Isolated sub set of a cluster(e.g.,
{A, B}
) keeps increasing term and retrying election.When communication is restored, one of the isolated node(e.g., A) will take the leadership
from the running sub cluster(e.g.,
{C, D, E}
), with a higher term.Thus in an unstable network environment, the leader changes very frequently.
Raft solution: pre-vote
Raft solves this problem by introducing a
pre-vote
RPC:A candidate tries to communicate with a quorum of replicas to see if there is a living
leader, without modifying the state of other replicas, before voting itself.
Our solution: without pre-vote.
Our solution would be simpler:
Just do not increase the term of a candidate unless it has to.
Without pre-vote,
handle_vote_request()
behaves as below:granting a vote
According to raft spec, a node reject a vote if any one of the following
conditions is met:
reject if
req.term < local.term
.reject if
req.term == local.term && req.leader != local.leader
i.e., a node has voted for other leader.
reject if
req.effective_leader != local.effective_leader
i.e., there is still a valid leader that has not yet timed out:
in other word, an heartbeat has received in the past interval.
reject if
rea.log_id < local.log_id
.These conditions implies an partially ordered data structure:
To simplify these conditions
We defines an
order
foreffective_leader
andleader
Partial order
Leader
The field
leader
andeffective_leader
can be considered as aset
, define:The
order
of twoLeader
is defined by set containment:a > b ↔ a ⊃ b
Partial order
Vote
If we capsulate fields into a struct
Vote
, it is obvious that it is also partial order and the order is avector-order
.∴ A node grants a vote iff
request.vote >= local.vote
and we compareVote
in a vector order,i.e., given X = [x₁, x₂ ... ] and Y = [y₁, y₂ ... ],
X >= Y ↔ ∀i xᵢ >= yᵢ
Pseudo code of voting
Example
With the above isolated network:
{A, B} | {C, D, E}
, whileC
is the leader,what will happen is described as below:
A
becomes aCandidate
, start election with term=10.At that time,
A
did not increase term yet.A
send vote request toB
, whileB.effective_leader
is not timed out,A.vote = [(10, {A}), {A}]
is not greater thanB.vote = [(10, {C}), {C}]
.A.vote > B.vote
does not hold, thusA.vote
is rejected.A
will revert to aFollower
.When
B.effective_leader
timed out,B.effective_leader
becomes an empty set{}
.In the next round of voting,
A
will find thatA.vote = [(11, {A}), {A}]
is greater thanB.vote = [(10, {C}), {}]
.But
A
did not receive enough responses(at least 2),A
does not know if there is another valid leader or not.Thus
A
does not increase its term.When the network isolation is restored,
A send
A.vote = [(10, {A}), {A}]
toC
,C
will rejectA.vote
if is still the leader, i.e.,C.effective_leader = {C}
.A
will revert to aFollower
.Beta Was this translation helpful? Give feedback.
All reactions