[Second Solution] Fix the potential data loss for clusters with only one member (simpler solution) #14400
Merged
For a cluster with only one member, raft always sends identical unstable entries and committed entries to etcdserver, and etcd responds to the client once it finishes (actually, only partially finishes) the apply workflow.

When the client receives the response, it doesn't mean etcd has already successfully saved the data to both BoltDB and the WAL, because:
1. etcd commits the BoltDB transaction periodically instead of on each request;
2. etcd saves WAL entries in parallel with applying the committed entries.

Accordingly, data may be lost if etcd crashes immediately after responding to the client and before BoltDB and the WAL have successfully saved the data to disk.

Note that this issue can only happen for clusters with only one member. For clusters with multiple members, it isn't an issue, because etcd will not commit and apply the data before it has been replicated to a majority of members. When the client receives the response, it means the data must have been applied, which in turn means the data must have been committed.

Note: for clusters with multiple members, raft will never send identical unstable entries and committed entries to etcdserver.

Signed-off-by: Benjamin Wang <[email protected]>
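The ordering problem described in the commit message can be illustrated with a minimal Go sketch. It is not the code from this PR: `Entry`, `Ready`, `mustSyncWALBeforeApply`, and `process` are hypothetical, simplified stand-ins, and the overlap check compares indexes only, assuming indexes increase monotonically within a batch. The idea shown is that when a committed entry is still unstable (the one-member case), the WAL is saved before the entry is applied, so the client is never answered for data that is not yet on disk.

```go
package main

// Entry and Ready are hypothetical, simplified stand-ins for the data that
// raft hands to etcdserver; they are not the real raft types.
type Entry struct {
	Term  uint64
	Index uint64
}

type Ready struct {
	Entries          []Entry // unstable entries, not yet saved to the WAL
	CommittedEntries []Entry // entries that are ready to be applied
}

// mustSyncWALBeforeApply reports whether the committed entries overlap the
// unstable entries, i.e. whether something about to be applied has not been
// persisted yet. Assumption: indexes only increase within each slice.
func mustSyncWALBeforeApply(rd Ready) bool {
	if len(rd.Entries) == 0 || len(rd.CommittedEntries) == 0 {
		return false
	}
	lastCommitted := rd.CommittedEntries[len(rd.CommittedEntries)-1]
	firstUnstable := rd.Entries[0]
	return lastCommitted.Index >= firstUnstable.Index
}

// process handles one batch using caller-supplied (hypothetical) saveWAL and
// apply callbacks. On overlap it persists first and applies afterwards.
// Otherwise the order does not matter for durability; this sketch runs the
// two steps sequentially for simplicity, whereas the commit message notes
// that etcd performs them in parallel.
func process(rd Ready, saveWAL func([]Entry) error, apply func([]Entry)) error {
	if mustSyncWALBeforeApply(rd) {
		if err := saveWAL(rd.Entries); err != nil {
			return err
		}
		apply(rd.CommittedEntries)
		return nil
	}
	apply(rd.CommittedEntries)
	return saveWAL(rd.Entries)
}
```

In a multi-member cluster the check in this sketch would return false, since, per the commit message, raft never sends identical unstable and committed entries there, so the extra wait would only be paid by one-member clusters.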
ahrtr force-pushed the one_member_data_loss_raft branch from b5c5455 to 2a10049 on August 30, 2022 07:29
ahrtr changed the title from "Fix the potential data loss for clusters with only one member (Second solution)" to "Fix the potential data loss for clusters with only one member (simpler solution)" on Aug 30, 2022
I suggest cherry-picking this PR or #14394 to 3.5 and 3.4. We can continue to enhance the raft package implementation only in |
ahrtr changed the title from "Fix the potential data loss for clusters with only one member (simpler solution)" to "[Second Solution] Fix the potential data loss for clusters with only one member (simpler solution)" on Aug 31, 2022
Closed
ptabor approved these changes on Sep 2, 2022
serathius approved these changes on Sep 5, 2022
Looks like the best intermediate solution for etcdserver as proposed in #14370 (comment)
This was referenced Sep 5, 2022
ahrtr added a commit to ahrtr/etcd that referenced this pull request on Sep 22, 2022
… node cluster

Since the raft-side change has been merged, we need to revert the etcdserver-side change. Refer to etcd-io#14413 and etcd-io#14400.

Signed-off-by: Benjamin Wang <[email protected]>
6 tasks
Second solution to fix #14370
This solution is based on the following feedback,
I compared the performance of this PR and #14394 for a one-member cluster. Overall, #14394 is slightly better (about 2.7% higher than this PR). But this PR is much simpler: excluding tests and comments, it changes only about 20 lines of code.
cc @serathius @spzala @ptabor @liggitt @dims