-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notify user code about follower loss #517
Notify user code about follower loss #517
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting PR. I put a few comments, please take a look.
include/libnuraft/callback.hxx
Outdated
* the peer is considered lost. | ||
* ctx: null. | ||
*/ | ||
FollowerLost = 100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no other special reason, please use 28
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
include/libnuraft/peer.hxx
Outdated
bool IsLost() const { return lost_by_leader_; } | ||
void SetLost() { lost_by_leader_ = true; } | ||
void SetRecovered() { lost_by_leader_ = false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the naming convention: is_lost
, set_lost
, and set_recovered
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/raft_server.cxx
Outdated
const auto expiry = | ||
params->heart_beat_interval_ * raft_server::raft_limits_.response_limit_; | ||
std::vector<ptr<peer>> rs; | ||
for (const auto& [id, peer_ptr]: peers_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a c++17 feature, and this project is based on c++11. Please use const auto& p
and p.second
. Looks like id
is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my bad, sorry. Fixed.
src/raft_server.cxx
Outdated
@@ -837,6 +863,9 @@ void raft_server::handle_peer_resp(ptr<resp_msg>& resp, ptr<rpc_exception>& err) | |||
peer* pp = entry->second.get(); | |||
int rpc_errs = pp->get_rpc_errs(); | |||
if (rpc_errs >= raft_server::raft_limits_.warning_limit_) { | |||
if (pp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the flow enters here only when entry != peers.end()
, this if (pp)
is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't guarantee that a shared pointer has a value, does it? But I see that you don't check this at the line above (pp->get_rpc_errs()
), so ok, I removed this if (pp)
here as well
src/raft_server.cxx
Outdated
for (auto& peer : nr_peers_list) { | ||
if (peer->IsLost()) { | ||
continue; | ||
} | ||
peer->SetLost(); | ||
cb_func::Param param(id_, leader_, peer->get_id()); | ||
const auto rc = ctx_->cb_func_.call(cb_func::FollowerLost, ¶m); | ||
assert(rc == cb_func::ReturnCode::Ok); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use 4-space indent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -1085,7 +1114,7 @@ bool raft_server::check_leadership_validity() { | |||
int32 num_voting_members = get_num_voting_members(); | |||
|
|||
int leadership_expiry = get_leadership_expiry(); | |||
int32 nr_peers = (int32)get_not_responding_peers(); | |||
int32 nr_peers = (int32)get_not_responding_peers_count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since get_not_responding_peers
is also used in the same function, can we just call it once?
const auto nr_peers_list = get_not_responding_peers();
int32 nr_peers = nr_peers_list.size();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this point:
- on one hand, yes nothing prevents us from using
get_not_responding_peers()
once; - but on another hand, this way we can get an overhead: creating an
std::list
, filling it up withptr<peer>
s, returning it (though, RVO should work here) - and we need the list only in the case if the condition in theif
below istrue
((num_voting_members - nr_peers < min_quorum_size) == true
), in all other cases we don't need the list; I assume that in the majority of cases, in "normal" circumstances, that condition should befalse
But if you are ok with this overhead, plese let me know and I will change the code
Also, please pay attention, I have changed the code of the get_not_responding_peers_count
/get_not_responding_peers
functions. They contained a duplicated part of code, which could potentially lead to some bugs in the future (if someone change the code of one function, but not another). Now I extracted that common part into a separate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, your change looks good.
src/raft_server.cxx
Outdated
@@ -608,7 +608,25 @@ int32 raft_server::get_leadership_expiry() { | |||
return expiry; | |||
} | |||
|
|||
size_t raft_server::get_not_responding_peers() { | |||
std::vector<ptr<peer>> raft_server::get_not_responding_peers() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me std::list<...>
is enough for the purpose of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, changed to std::list
@greensky00, thank you for the review. I have made the fixes, and regarding this comment #517 (comment) - need one more round of discussion. Please take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, another couple of comments.
@@ -1085,7 +1114,7 @@ bool raft_server::check_leadership_validity() { | |||
int32 num_voting_members = get_num_voting_members(); | |||
|
|||
int leadership_expiry = get_leadership_expiry(); | |||
int32 nr_peers = (int32)get_not_responding_peers(); | |||
int32 nr_peers = (int32)get_not_responding_peers_count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, your change looks good.
include/libnuraft/peer.hxx
Outdated
@@ -498,6 +502,8 @@ private: | |||
*/ | |||
std::atomic<bool> abandoned_; | |||
|
|||
std::atomic<bool> lost_by_leader_ {false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment about this member variable.
Also, instead of brace initializer here, please put the default value into constructor, to be aligned with others.
src/raft_server.cxx
Outdated
@@ -837,6 +861,7 @@ void raft_server::handle_peer_resp(ptr<resp_msg>& resp, ptr<rpc_exception>& err) | |||
peer* pp = entry->second.get(); | |||
int rpc_errs = pp->get_rpc_errs(); | |||
if (rpc_errs >= raft_server::raft_limits_.warning_limit_) { | |||
pp->set_recovered(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set_lost()
is invoked in two places:
handle_peer_resp
: based onraft_limits_.warning_limit_
, andcheck_leadership_validity
: fromget_not_responding_peers
which is based onraft_limits_.response_limit_
.
Although both limits have the same default value, they can be configured with different numbers. If warning_limit_ > response_limit_
, there can be a possibility that set_recovered()
may not be called, when a leader loses its role and then becomes leader later again.
Please confirm this. I think calling set_recovered
outside of this if-clause is ok. It is just setting an atomic variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, calling set_recovered
for all peers when a server becomes leader (in raft_server::become_leader
) is also an option. Leader may need a fresh start.
@apostolpav113 |
greensky00, sorry for delaying and thank you for making the changes! :-) I was going to start fixing today, and was surprised by the fact that PR had been already merged |
Hello, within this PR we suggest to introduce a new
cb_func::Type
enum item:FollowerLost
, and, on detection of follower loss, call the callback function with thetype
parameter equal toFollowerLost
.Pavel Yurin,
C++ developer
NXLog Ltd.
[email protected]
www.nxlog.co