Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: ProposeConfChange/ApplyConfChange doesn't match thesis #2397

Closed
bdarnell opened this issue Feb 27, 2015 · 4 comments · Fixed by #2431
Closed

raft: ProposeConfChange/ApplyConfChange doesn't match thesis #2397

bdarnell opened this issue Feb 27, 2015 · 4 comments · Fixed by #2431

Comments

@bdarnell
Copy link
Contributor

Configuration changes do not currently work as described in the thesis (pages 35-36): we call raft.addNode() as a result of ApplyConfChange, when it should be done as soon as the change is proposed (or when we scan the tail of the log in becomeLeader). If this is safe for some other reason (I know @xiang90 understands this issue well), it should be documented.

@xiang90
Copy link
Contributor

xiang90 commented Feb 28, 2015

@bdarnell There are main two differences:

  1. pre-check if the new node can catch up or not.
  2. apply the configuration change before committing it.

For 1, I think it can be done outside raft. We plan to restore the snapshot and some of the committed log entries before the raft node actually starts.

For 2, the only difference I can see is when removing the node it might require a smaller quorum and when adding it might require a larger quorum.

If we apply the configuration change after commit, then that entry can still follow the normal path and does not require extra care. But it introduces a problem when you try to remove a member from a two members cluster. If one of the members dies, then the member cannot be removed since the cluster cannot make progress.

I think the safety guarantees remain the same.

@bdarnell
Copy link
Contributor Author

For 1, I agree this can and should be done outside of raft.

My main concern is that the thesis specifically says that the change should take affect as soon as it is appended, so I just want to understand whether we're risking anything by doing it differently. I think the rule that no membership change can be proposed while another is pending covers every case I can think of; I haven't been able to poke a hole in it yet.

One difference in multiraft is that we send committed entries into a channel to be applied in another thread, while in etcd (I think) you apply everything before consuming the ready channel again. This means that multiraft may acknowledge some entries before older committed entries can be applied. Is it expected that all committed entries are applied before the next entries and messages are processed, or is it OK to do this asynchronously as we are?

@xiang90
Copy link
Contributor

xiang90 commented Mar 2, 2015

One difference in multiraft is that we send committed entries into a channel to be applied in another thread, while in etcd (I think) you apply everything before consuming the ready channel again.

That is not true in the future. etcd needs to apply in another thread too: #2117

Is it expected that all committed entries are applied before the next entries and messages are processed

No.

or is it OK to do this asynchronously as we are?

You must ensure the apply routine applies the entries as the same order they were committed.

The only difference between "apply configuration before commit it" and "apply configuration after commit it" is that the proposals between the real time the leader receives the configuration and the real time the leader commits the configuration are committed by the new configuration or the old configuration. (sorry about my english...) They are both safe IMO since the new configuration and the old configuration must be overlapped.

@xiang90
Copy link
Contributor

xiang90 commented Mar 2, 2015

@bdarnell It would be great if you can spend some time to write a doc for this and add it into /raft/doc.go.

bdarnell added a commit to bdarnell/etcd that referenced this issue Mar 4, 2015
Includes more details on the required caller behavior and the safety of
membership changes.

Closes etcd-io#2397
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants