Raft problem #2483
I have a question about the flow in the picture above: the leader should commit a log only after a majority has accepted it, but stages 1 and 2 do not follow that rule.
👍 This is a known issue (long story). I'll review later.
The leader can commit once it has successfully sent the logs to a majority. It then passes its updated commit ID to the followers in the next request. Until that request arrives, the commit ID on the leader and on the followers can differ.
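The commit rule described in this exchange can be illustrated with a minimal sketch. It is not taken from RaftPart.cpp: `majorityCommitId`, `followerCommitId`, and the `matchIds` bookkeeping are hypothetical names introduced here, and the sketch leaves out the Raft paper's extra condition that a leader only commits entries from its current term by counting replicas.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using LogID = std::int64_t;

// Highest log id that is stored on a majority of replicas.
// `matchIds` (hypothetical) holds one value per replica, the leader included:
// the id of the last log entry known to be persisted on that replica.
LogID majorityCommitId(std::vector<LogID> matchIds) {
    if (matchIds.empty()) {
        return 0;
    }
    // After sorting in descending order, the value at index quorum - 1
    // is present on at least `quorum` replicas.
    std::sort(matchIds.begin(), matchIds.end(), std::greater<LogID>());
    const std::size_t quorum = matchIds.size() / 2 + 1;
    return matchIds[quorum - 1];
}

// A follower learns the leader's commit id only from a later request,
// and it must not commit past the last entry it actually holds.
LogID followerCommitId(LogID leaderCommitId, LogID followerLastLogId) {
    return std::min(leaderCommitId, followerLastLogId);
}
```

With three replicas holding logs up to ids 5, 3, and 4, `majorityCommitId` returns 4; a follower whose log ends at id 3 would still report 3 until it catches up, which is the temporary difference in commit IDs mentioned above.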
Thanks for the contribution. Some places need discussion; see the comments.
src/kvstore/raftex/RaftPart.cpp
Outdated
req.get_log_str_list().end());

// may be need to rollback wal_
if (!( req.get_last_log_id_sent() == wal_->lastLogId() && req.get_last_log_term_sent() == wal_->lastLogTerm())) {
Why don't we check the condition as in line 1551, req.get_last_log_id_sent() == lastLogId_ && req.get_last_log_term_sent() == lastLogTerm_?
My point is that when it is true, we don't need to check the term of each log, according to the Log Matching property in the paper.
Yes, using
req.get_last_log_id_sent() == lastLogId_ && req.get_last_log_term_sent() == lastLogTerm_
for the check is better. I will modify it later.
I agree that when it is true, there is no need to check the term of each log. That is why I added this condition check:
if (!( req.get_last_log_id_sent() == wal_->lastLogId() && req.get_last_log_term_sent() == wal_->lastLogTerm()))
In most cases the check above is false, so we do not need to check the term of each log one by one and can just append the new logs.
"In most cases the check above is false, so we do not need to check the term of each log one by one and can just append the new logs."
Agree.
The reason I think lastLogId_ and lastLogTerm_ are better is this: if a host goes from follower -> leader -> follower again, it could write some logs to the WAL during the leader phase (although the case is rare).
I get it.
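The append path discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the code in RaftPart.cpp: `ToyLog`, `Entry`, and `appendFromLeader` are hypothetical names, and the log lives in memory instead of a WAL. The fast path relies on the Log Matching property; the slow path compares terms entry by entry and refuses to delete anything at or below the committed id.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using LogID = std::int64_t;
using TermID = std::int64_t;

struct Entry {
    TermID term;
};

// Toy in-memory log (hypothetical): the entry with id i is entries[i - 1].
struct ToyLog {
    std::vector<Entry> entries;
    LogID committedId = 0;

    LogID lastId() const { return static_cast<LogID>(entries.size()); }
    TermID termAt(LogID id) const {
        if (id < 1 || id > lastId()) return 0;
        return entries[static_cast<std::size_t>(id - 1)].term;
    }
};

// Returns false when the request must be rejected.
bool appendFromLeader(ToyLog& log, LogID prevId, TermID prevTerm,
                      const std::vector<Entry>& incoming) {
    // Consistency check: the entry just before the batch must exist and match.
    if (prevId > log.lastId() || (prevId > 0 && log.termAt(prevId) != prevTerm)) {
        return false;
    }
    std::size_t diff = 0;  // index of the first incoming entry to append
    if (prevId != log.lastId()) {
        // Slow path: skip incoming entries the follower already holds.
        while (diff < incoming.size()) {
            const LogID id = prevId + 1 + static_cast<LogID>(diff);
            if (id > log.lastId() || log.termAt(id) != incoming[diff].term) break;
            ++diff;
        }
        const LogID firstNew = prevId + 1 + static_cast<LogID>(diff);
        if (diff < incoming.size() && firstNew <= log.lastId()) {
            // Conflict: drop the divergent suffix, but never a committed entry.
            if (firstNew <= log.committedId) return false;
            log.entries.resize(static_cast<std::size_t>(firstNew - 1));
        }
    }
    // Fast path (prevId == lastId) lands here directly with diff == 0.
    log.entries.insert(log.entries.end(),
                       incoming.begin() + static_cast<std::ptrdiff_t>(diff),
                       incoming.end());
    return true;
}
```

In the common case the leader's previous (id, term) pair equals the follower's last entry, so the loop is skipped entirely and the batch is appended as-is.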
TermID committedLogTerm = wal_->getLogTerm(committedLogId_);
if (committedLogTerm > 0 ) {
resp.set_last_log_id(committedLogId_);
resp.set_last_log_term(committedLogTerm);
Consider not calling set_last_log_term. IIRC it is not used, so we don't need wal_->getLogTerm(committedLogId_);
I considered that before, but if we don't set it, the response is (committedLogId_, last_log_term).
When the leader receives the response, it sends a new request with (committedLogId_, last_log_term) rather than (committedLogId_, term of committedLogId_); see self->lastLogTermSent_ = resp.get_last_log_term();
An append-log request with (committedLogId_, last_log_term) cannot pass the check in the follower, which causes a dead loop.
Yeah, you're right.
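A small sketch may make the dead loop easier to see. It assumes a simplified follower check and hypothetical names (`RejectHint`, `termAt`, `passesCheck`, `hintFromCommitted`); the real request and response types in RaftPart.cpp differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using LogID = std::int64_t;
using TermID = std::int64_t;

// Hypothetical hint a follower returns when it rejects a request.
struct RejectHint {
    LogID lastLogId;
    TermID lastLogTerm;
};

// terms[i] is the term of the log entry with id i + 1; 0 means "no such entry".
TermID termAt(const std::vector<TermID>& terms, LogID id) {
    if (id < 1 || id > static_cast<LogID>(terms.size())) return 0;
    return terms[static_cast<std::size_t>(id - 1)];
}

// Simplified follower check: the (id, term) pair sent by the leader
// must describe an entry the follower really holds.
bool passesCheck(const std::vector<TermID>& terms, LogID prevId, TermID prevTerm) {
    return prevId >= 0 && prevId <= static_cast<LogID>(terms.size()) &&
           termAt(terms, prevId) == prevTerm;
}

// Pair the committed id with the term of that same entry, so the
// leader's next request can pass the check above.
RejectHint hintFromCommitted(const std::vector<TermID>& terms, LogID committedId) {
    return RejectHint{committedId, termAt(terms, committedId)};
}
```

If the follower's entries have terms 1, 1, 2, 3 and the committed id is 2, the consistent hint is (2, 1). Pairing id 2 with the term of the last entry, 3, would fail `passesCheck` on every retry, which is the loop described above.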
Codecov Report
@@ Coverage Diff @@
## master #2483 +/- ##
==========================================
- Coverage 86.46% 86.23% -0.24%
==========================================
Files 649 647 -2
Lines 64374 67326 +2952
==========================================
+ Hits 55662 58059 +2397
- Misses 8712 9267 +555
I only ran CTest in the CentOS environment, where it passes, but it fails on Ubuntu with clang-9. I don't know why the error happened.
src/kvstore/raftex/RaftPart.cpp
Outdated
size_t numLogs = req.get_log_str_list().size();
LogID firstId = req.get_last_log_id_sent() + 1;

std::vector<cpp2::LogEntry> logEntries (req.get_log_str_list().begin(),
Try using std::make_move_iterator here.
yeah, I have refined it
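For reference, a minimal sketch of the suggestion, using plain strings instead of cpp2::LogEntry. `takeEntries` is a hypothetical helper, and the sketch assumes the source container is mutable; moving out of a container reached through a const reference would silently fall back to copying.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Build a new vector from src[from..end) by moving the elements
// instead of copying them, then drop the moved-from tail of `src`.
std::vector<std::string> takeEntries(std::vector<std::string>& src, std::size_t from) {
    if (from > src.size()) {
        from = src.size();
    }
    const auto first = src.begin() + static_cast<std::ptrdiff_t>(from);
    std::vector<std::string> out(std::make_move_iterator(first),
                                 std::make_move_iterator(src.end()));
    // The moved-from strings are valid but unspecified, so remove them.
    src.erase(first, src.end());
    return out;
}
```

The same range constructor also covers the case of starting at an offset into the batch, since only the entries from `from` onward are moved.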
src/kvstore/raftex/RaftPart.cpp
Outdated
}

// update msg
logEntries = std::vector<cpp2::LogEntry> (req.get_log_str_list().begin() + diffIndex,
This could be quite expensive as well. Hmm, I can't figure out a better way for now.
I have refined it to avoid the copy.
OK, leave it alone.
Is there anything else that needs to be updated?
Sorry for the late response. LGTM now. Thanks!
LGTM
@sherlockkenan Thanks for your contribution to the Nebula Graph community! This is Jamie with Nebula and I'd like to email you the Nebula Contributor certificate and ship you a mug to mark this special moment. Could you please kindly reach me via jamie.liu(at)vesoft.com? Again, thanks for being a part of the Nebula community!
Hi Jamie, I am pleased to accept.
xuehan ke
On July 25, 2021, at 11:26 AM, jamie.liu ***@***.***> wrote:
@sherlockkenan Thanks for your contribution to the Nebula Graph community! This is Jamie with Nebula and I'd like to email you the Nebula Contributor certificate and ship you a mug to mark this special moment. Could you please kindly reach me via jamie.liu(at)vesoft.com? Again, thanks for being a part of the Nebula community!
* Fix GraphSessionManager::addSessionCount() in multi-thread case
* Fix compilation
---------
Co-authored-by: Yichen Wang <[email protected]>
Co-authored-by: Sophie <[email protected]>
What changes were proposed in this pull request?
Fix an inconsistency problem in the RaftPart append-log method.
Hi, when I read the implementation of Raft in Nebula, I found that an inconsistency problem may exist. In RaftPart, if a follower receives logs from the leader that conflict with its own, it rolls back its log to lastCommitID. Such behavior may cause inconsistent committed logs.
Here is a picture showing how it may happen:
I think the follower must not delete logs arbitrarily, because some of them may already have been committed.
Why are the changes needed?
I refined the implementation to follow the guidelines in the Raft paper.
Will break the compatibility? How if so?
no
Does this PR introduce any user-facing change?
no
How was this patch tested?
Checklist