raft: consensus protocol design doc #100
base: master
Conversation
Adds basic skeleton of the design doc
I like it! Looks good so far.
Co-Authored-By: Tim Satke <[email protected]>
Moved.
What about membership change, i.e. if a node is added or deleted?
@TimSatke @Abby3017
Please also check for whitespace errors in the lower part of the document.
Other than that, this is a design document, but instead of designing the system, it is basically just a checklist of what needs to be done.
Please add an architecture. This means:
- which components will be implemented
- how they will interact with each other
- how they will fit into the existing system
- what the APIs will look like

Maybe even draft some Go interface types that show the intended usage of the `raft` package.
Also important, but optional for this PR: specify the protobuf messages that need to be used.
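For illustration, a minimal sketch of what such interface types could look like. Everything here is hypothetical and not part of the doc; `Cluster` and `ReplicationHandler` mirror names that appear later in this thread:
```
// Hypothetical sketch of the public raft package API; names and
// signatures are illustrative, not decided.
package raft

import "context"

// ReplicationHandler is invoked for every log entry once the
// entry has been committed.
type ReplicationHandler func(entry []byte)

// Server is a single raft node participating in the cluster.
type Server interface {
	// Start makes the node join the cluster and participate in
	// elections and log replication until ctx is cancelled.
	Start(ctx context.Context) error
	// OnReplication registers the callback for committed entries.
	OnReplication(handler ReplicationHandler)
	// Input hands a client command to the cluster; it is only
	// valid on the current leader.
	Input(cmd []byte) error
}

// Cluster abstracts the communication with the other nodes.
type Cluster interface {
	// Nodes returns the addresses of all known cluster members.
	Nodes() []string
	// Send delivers a raft message to the given node.
	Send(ctx context.Context, target string, msg []byte) error
}
```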
Looks like a lot of work here. So I'd better get into this before I even think of implementing.
Co-authored-by: Tim Satke <[email protected]>
Currently, this doesn't fall into the domain of what we want to implement. It's kind of an extended issue; we'll tackle it if and when needed.
Following is how each issue is tackled:
As far as the API goes, I think it'll need more time. Let me know if I misinterpreted something or anything else needs to be added.
* Security: Access control mechanisms need to be in place to decide on access to functions in the servers based on their state (leader, follower, candidate).
* Routing to leader: One of the issues with a varying leader is for the clients to know which IP address to contact for the service. We can solve this problem by advertising any/all IPs of the cluster; a node that is not the leader responds with the IP of the current leader (see the sketch after this list).
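As an illustration of that redirect behaviour, a minimal sketch with made-up names:
```
// Minimal sketch of the "respond with the leader's IP" approach;
// all names here are hypothetical.
type node struct {
	state      string // "leader", "follower" or "candidate"
	leaderAddr string // last known leader, learned from heartbeats
}

// handleClientRequest serves cmd if this node is the leader and
// otherwise tells the client where the current leader is.
func (n *node) handleClientRequest(cmd []byte) (leaderAddr string, served bool) {
	if n.state != "leader" {
		// A follower never serves client requests; it only
		// points the client at the leader it knows about.
		return n.leaderAddr, false
	}
	// Leader path: append cmd to the log and replicate it to
	// the followers (elided in this sketch).
	return "", true
}
```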
I don't understand. Any node knows all IPs and can answer that question. Why should follower nodes only respond with the leader IP?
At any point in time when raft is in the "working phase" (i.e. the leader is up and no election is happening), there are only two kinds of nodes: the leader and the followers. So if the client doesn't hit the leader, it means it hit a follower.
Co-authored-by: Tim Satke <[email protected]>
* A raft server is implemented as:
```
// simpleServer is the local raft server instance.
type simpleServer struct {
	node          *Node
	cluster       Cluster
	onReplication ReplicationHandler
	log           zerolog.Logger
}

// Node describes the raft state of a single node.
type Node struct {
	State string

	PersistentState     *PersistentState
	VolatileState       *VolatileState
	VolatileStateLeader *VolatileStateLeader
}
```
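For reference, the fields these state structs would hold according to Figure 2 of the Raft paper; the field names follow the paper, while the concrete Go types are a sketch:
```
// Sketch of the state structs following Figure 2 of the Raft
// paper; field names come from the paper, the Go types are
// illustrative.
type PersistentState struct {
	CurrentTerm uint64     // latest term this server has seen
	VotedFor    string     // candidate voted for in the current term, or empty
	Log         []LogEntry // log entries, each holding a command and its term
}

type VolatileState struct {
	CommitIndex uint64 // index of the highest log entry known to be committed
	LastApplied uint64 // index of the highest log entry applied to the state machine
}

type VolatileStateLeader struct {
	NextIndex  []uint64 // for each server, index of the next log entry to send to it
	MatchIndex []uint64 // for each server, index of the highest entry known to be replicated on it
}

type LogEntry struct {
	Term    uint64 // term in which the entry was created
	Command []byte // command for the state machine
}
```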
why is there implementation in a design doc?
Well, my thought was that if I add the API, these structs would be a good reference to have.
Keep in mind that you are writing a design doc, not an implementation guide. Such a concrete struct may have heavy implications for the rest of the implementation. The point of a design doc is to just outline the components and how they interact, and to leave the concrete implementation to the developer.
@@ -64,7 +138,7 @@ A detailed description of all the modules and their implementation follows:
* A committed entry: When a leader decides that a log entry is safe to apply to the state machines, that entry is called committed. All committed entries are durable and _will eventually be executed_ by all state machines.
* An entry -> committed entry: A log entry is called committed once it is replicated on a majority of the servers in the cluster. Once an entry is committed, it commits all the previous entries in the leader's log, including entries created by previous leaders.
* The leader keeps track of the highest index that it knows is committed, and this index is included in all future `AppendEntriesRPC`s (including heartbeats) to inform the other servers.
* There's some ambiguity about log committing: "A log entry is committed once the leader that created the entry has replicated it on a majority of the servers" and "Once a follower learns that a log entry is committed, it applies the entry to its local state machine (in log order)" don't make clear whether replicating and applying to the state machine are the same. If they are, it's kind of a contradiction; otherwise, "applying" can mean executing the statement in the DB in our case.
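One way to resolve this: in the paper, replicating, committing and applying are three distinct steps. A sketch of the two leader/server-side routines, with hypothetical helper and parameter names:
```
// Sketch separating the three steps; names are hypothetical.
// Replicating: the leader sends the entry to the followers via
// AppendEntriesRPC. Committing: once a majority of servers store
// the entry, the leader advances its commit index. Applying:
// every server executes committed entries against its state
// machine (in our case, running the statement in the DB).

// advanceCommitIndex is run by the leader after a follower
// acknowledged entries and matchIndex was updated. The paper
// additionally requires log[n].Term == currentTerm before
// committing; that check is elided here.
func advanceCommitIndex(matchIndex []uint64, commitIndex, lastLogIndex uint64) uint64 {
	for n := commitIndex + 1; n <= lastLogIndex; n++ {
		replicas := 1 // the leader itself stores every entry
		for _, m := range matchIndex {
			if m >= n {
				replicas++
			}
		}
		if replicas*2 <= len(matchIndex)+1 {
			break // no majority at index n, so nothing beyond n-1 is committed
		}
		commitIndex = n
	}
	return commitIndex
}

// applyCommitted is run by every server, leader and follower
// alike, whenever commitIndex > lastApplied.
func applyCommitted(log [][]byte, commitIndex, lastApplied uint64, apply func([]byte)) uint64 {
	for lastApplied < commitIndex {
		lastApplied++
		apply(log[lastApplied-1]) // log index 1 maps to slice index 0
	}
	return lastApplied
}
```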
resolve the issue and add one or more solutions to this doc
The first step towards having a consensus protocol running.
This'll serve as a spec to be followed for implementation.
Updates to this doc will be done when needed to accommodate issues.
Closes #21