#SWIM
This is an attempt to implement SWIM in GO.
I recommend you to read the paper in References sector first, i don't think i can explain better than the paper itself.
Here is the TL;DR version:
In distributed computing, a failure detector is an application or a subsystem that is responsible for detection of node failures or crashes in a distributed system.
A Traditional way to implement it is using heartbeat protocol: which either impose network loads that grow quadratically with group size, or compromise response times or false positive frequency w.r.t. detecting process crashes
The new system, called SWIM, provides a membership substrate that:
- imposes a constant message load per group member;
- detects a process failure in an (expected) constant time at some non-faulty process in the group;
- provides a deterministic bound (as a function of group size) on the local time that a non-faulty process takes to detect failure of another process;
- propagates membership updates, including information about failures, in infection-style (also gossip-style or epidemic-style); the dissemination latency in the group grows slowly (logarithmically) with the number of members;
- provides a mechanism to reduce the rate of false positives by “suspecting” a process before “declaring” it as failed within the group
SWIM has two components: (1) a Failure Detector Component, that detects failures of members, and (2) a Dissemination Component, that disseminates information about members that have recently either joined or left the group, or failed.
Given we have a cluster of n nodes. Each node has information about m nodes in the cluster (m <= n).
Every T' time unit (which called a protocol period), at node Mi
-
Increase the period
pr
-
Select a random node in m nodes, called it Mj and ping it
ping(Mi, Mj , pr)
. Wait for the worst-case message round-trip for anack(Mi, Mj , pr)
.- If the ack message come back, that means Mj is still alive.
- If not, move to step 3
-
Select
k
nodes in m nodes, ask them to pingMj
ping-req(Mi, Mj , pr)
- If one of them receive the ack message
ack(Mi, Mj , pr)
, the node is still alive - If no one receive the ack until the end of the period, declared Mj as failed.
- If one of them receive the ack message
-
To reduce the false positive in step 3, instead of mark
Mj
as failed immediately, we will mark it assuspected
. After a prespecified time-out, it will be declared asfailed
. But if it response within the timeout, it will be declared asalive
again.
As any given time, at node Mi
On receipt of ping-req(Mm, Mj , pr)
message (Mj
!= Mi
), send a ping(Mi, Mj , Mm, pr)
message to Mj
On receipt of ack(Mi, Mj , Mm, pr)
message from Mj
, send an ack(Mm, Mj , pr)
message to received to Mm
On receipt of ping(Mm, Mi, Ml, pr)
message from Mm
, reply with an ack(Mm, Mi, Ml, pr)
message to Mm
On receipt of ping(Mm, Mi, pr)
message from Mm
, reply with an ack(Mm, Mi, pr)
message to Mm