Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: consensus protocol design doc #100

Open
wants to merge 32 commits into
base: master
Choose a base branch
from

Conversation

SUMUKHA-PK
Copy link
Collaborator

@SUMUKHA-PK SUMUKHA-PK commented Mar 29, 2020

The first step towards having a consensus protocol running.
This'll serve as a spec to be followed for implementation.
Updates to this doc will be done when needed to accommodate issues.

Closes #21

@tsatke
Copy link
Contributor

tsatke commented Mar 30, 2020

I like it! Looks good so far

@SUMUKHA-PK
Copy link
Collaborator Author

Moved.

Copy link
Collaborator

@Abby3017 Abby3017 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about membership change i.e if node added or deleted ?

@SUMUKHA-PK
Copy link
Collaborator Author

@TimSatke @Abby3017
Your review happened on the older version, I was still writing the doc.
Please do have a look now.

@SUMUKHA-PK SUMUKHA-PK marked this pull request as ready for review May 6, 2020 18:31
@SUMUKHA-PK SUMUKHA-PK requested a review from tomarrell as a code owner May 6, 2020 18:31
@SUMUKHA-PK SUMUKHA-PK requested review from tsatke and removed request for tomarrell May 7, 2020 05:35
Copy link
Contributor

@tsatke tsatke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also check for whitespace errors in the lower part of the document.
Other than that, this is a design document, but instead of designing the system, it is basically just a checklist what needs to be done.
Please add an architecture. This means:

  • include, which components will be implemented
  • how will they interact with each other
  • how will they fit into the existing system
  • what will the APIs look like
    Maybe even draft some Go interface types that show the intended usage of the raft package.

doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
@tsatke
Copy link
Contributor

tsatke commented May 7, 2020

Also important, but optional for this PR, specify the protobuf messages that need to be used.
@SUMUKHA-PK you can do this in here if you like, otherwise, we can get in touch, so that I know what requirements are there, and what needs to be communicated.
A protocol draft would be nice however (what messages are sent (election, appendentries, initial handshake, goodbye, requests))

@SUMUKHA-PK
Copy link
Collaborator Author

Looks like a lot of work here.

So I'd better get into this before I even think of implementing.

@SUMUKHA-PK
Copy link
Collaborator Author

SUMUKHA-PK commented May 8, 2020

What about membership change i.e if node added or deleted ?

@Abby3017

Currently, doesn't fall into the domain of what we want to implement. It's kind of an extended issue, we'll tackle that if and when needed.

@SUMUKHA-PK
Copy link
Collaborator Author

Please also check for whitespace errors in the lower part of the document.
Other than that, this is a design document, but instead of designing the system, it is basically just a checklist what needs to be done.
Please add an architecture. This means:

* include, which components will be implemented

* how will they interact with each other

* how will they fit into the existing system

* what will the APIs look like
  Maybe even draft some Go interface types that show the intended usage of the `raft` package.

Following is how each issue is tackled:

  • The main modules represent the components that will be implemented.
  • A separate module interaction section is added.
  • They'll fit in as a separate module on internal dir; mentioned in the beginning.

As far as the API goes, I think it'll need more time.
I wish to begin an implementation and keep adding small patches to this because I think that'd be better.

Let me know if I misinterpreted something or anything else needs to be added.

* Security: Access control mechanisms need to be in place to decide on access to functions in the servers based on their state (leader, follower, candidate)
* Routing to leader: One of the issues with a varying leader is for the clients to know which IP address to contact for the service. We can solve this problem by advertising any/all IPs of the cluster and simply forward this request to the current leader; OR have a proxy that can forward the request to the current leader wheneve the requests come in. (Section client interaction of post has another approach which works too)
* Routing to leader: One of the issues with a varying leader is for the clients to know which IP address to contact for the service. We can solve this problem by advertising any/all IPs of the cluster and the client returns the IP of the leader if its not the leader.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Any node knows all IPs and can answer that question. Why should follower nodes only respond with the leader IP?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At any point of time when raft is in the "working phase" (which is the leader is up, no election is happenning), there are only 2 kinds of nodes; the leader and the follower. So if the client doesn't hit the leader, it means it hit the follower.

doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
doc/internal/consensus/Consensus-Protocol.md Outdated Show resolved Hide resolved
Comment on lines +23 to +39
* A raft server is implemented as:
```
type simpleServer struct {
node *Node
cluster Cluster
onReplication ReplicationHandler
log zerolog.Logger
}

type Node struct {
State string

PersistentState *PersistentState
VolatileState *VolatileState
VolatileStateLeader *VolatileStateLeader
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there implementation in a design doc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well if I add the API, they struct's would be a good reference to have, was my thought.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that you are writing a design doc, not an implementation guide. Such a concrete struct may have heavy implications on the rest of the implementation. The point of a design doc is, to just outline the components and how they interact, and leave the concrete implementation to the developer.

@@ -64,7 +138,7 @@ A detailed description of all the modules and their implementation follow:
* A committed entry: When a leader decides that the log entry is safe to apply to other state machines, that entry is called committed. All committed entries are durable and _will eventually be executed_ by all state machines.
* An entry -> Committed entry: A log entry is called committed once its replicated on the majority of the servers in the cluster. Once an entry is committed, it commits all the previous entries in the leaders log, including the entries created by the previous leaders.
* The leader keeps track of the highest known index that it knows is committed and it is included in all the future `AppendEntriesRPC` (including heartbeats) to inform other servers.
* Theres some issue about log committing - "A log entry is committed once the leader that createdthe entry has replicated it on a majority of the servers" and " Once a follower learns that a log entry is committed, it applies theentry to its local state machine (in log order)." are not clear whether replicating and applying to state machine are the same. If they are its kind of a contradiction, else "application" can mean executing the STMT in the DB in our case.
* Theres some issue about log committing - "A log entry is committed once the leader that created the entry has replicated it on a majority of the servers" and " Once a follower learns that a log entry is committed, it applies theentry to its local state machine (in log order)." are not clear whether replicating and applying to state machine are the same. If they are its kind of a contradiction, else "application" can mean executing the STMT in the DB in our case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve the issue and add one or more solutions to this doc

@tsatke tsatke changed the title Consensus protocol design doc raft: consensus protocol design doc May 25, 2020
@tsatke tsatke added this to the v0.0.1 milestone May 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

define consensus protocol
3 participants