-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add active hosts manager for balance in the future. #304
Conversation
Unit testing passed. |
folly::RWSpinLock::ReadHolder rh(&lock_); | ||
hosts.resize(hostsMap_.size()); | ||
std::transform(hostsMap_.begin(), hostsMap_.end(), hosts.begin(), | ||
[](const auto& entry) -> decltype(auto) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know decltype(auto)
here will be deduced to a const reference, don't you? It that on purpose? I noticed the value would be copied anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is OK for const reference due to i want to copy the content.
src/meta/ActiveHostsMan.h
Outdated
return !(*this == that); | ||
} | ||
|
||
int64_t lastHeartBeatTimeSeconds_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a time point. So either use lastHeartBeatTimeInSeconds_
or lastHeartBeatTimeSecond_
. The current name make me think it's a duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about naming it lastHBTimeInSec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
src/meta/test/ActiveHostsManTest.cpp
Outdated
ASSERT_EQ(3, ahMan.getActiveHosts().size()); | ||
ahMan.updateHostInfo(HostAddr(0, 0), HostInfo(now + 2)); | ||
ASSERT_EQ(3, ahMan.getActiveHosts().size()); | ||
ASSERT_EQ(HostInfo(now + 2), ahMan.hostsMap_[HostAddr(0, 0)]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are chances to crash due to race condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are chances to crash due to race condition.
I'm confused, please give me some tips. It is executed in sequence, so why is there a race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
src/meta/test/ActiveHostsManTest.cpp
Outdated
|
||
sleep(3); | ||
ASSERT_EQ(1, ahMan.getActiveHosts().size()); | ||
ASSERT_EQ(HostInfo(now + 2), ahMan.hostsMap_[HostAddr(0, 0)]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
namespace nebula { | ||
namespace meta { | ||
|
||
struct HostInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we come up with a more specific name? I think this one is a little generic for a exposed name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the data structure only holds the heart beat from the host, how about renaming it to HostHB
, HostPulse
, or HostHeartbeat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure will be used to collect more information about each host, such as disk usage, cpu idle, memory load etc.
ASSERT_EQ(3, ahMan.getActiveHosts().size()); | ||
ASSERT_EQ(HostInfo(now + 2), ahMan.hostsMap_[HostAddr(0, 0)]); | ||
|
||
sleep(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I were correct, your interesting host would be expired out in 3 seconds. If so, you better to wait longer than 3s to make the case stable enough, like 3s + 100ms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hosts would be expired in 2 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the timer might be triggered at the 3rd second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timer will be triggered after the first second and run at a 1-second interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking care of this. Great job! 👍
I have some inline comments
@@ -0,0 +1,100 @@ | |||
/* Copyright (c) 2018 - present, VE Software Inc. All rights reserved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The company name is vesoft inc
, not VE Software Inc
. I think we need to go through the copyright declarations in all files and make a batch correction before releasing the code to the public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So let's leave it as it is now :-)
src/meta/ActiveHostsMan.h
Outdated
return !(*this == that); | ||
} | ||
|
||
int64_t lastHeartBeatTimeSeconds_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about naming it lastHBTimeInSec
?
namespace nebula { | ||
namespace meta { | ||
|
||
struct HostInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the data structure only holds the heart beat from the host, how about renaming it to HostHB
, HostPulse
, or HostHeartbeat
?
explicit HostInfo(int64_t lastHeartBeatTimeSeconds) | ||
: lastHeartBeatTimeSeconds_(lastHeartBeatTimeSeconds) {} | ||
|
||
bool operator==(const HostInfo& that) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this
and that
refers to different hosts, and their last HB times happen to be the same. Do we consider them same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. And we should NOT use the structure to check different host Information.
src/meta/ActiveHostsMan.h
Outdated
CHECK(checkThread_.start()); | ||
checkThread_.addTimerTask(intervalSeconds * 1000, | ||
intervalSeconds * 1000, | ||
&ActiveHostsMan::cleanExpiredHosts, this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Put this
on the next line to keep one parameter per line
checkThread_.wait(); | ||
} | ||
|
||
void updateHostInfo(HostAddr hostAddr, HostInfo info) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updateHostHeartbeat(...)
probably makes more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HostInfo not only contains heartBeat as talked above.
std::vector<HostAddr> hosts; | ||
folly::RWSpinLock::ReadHolder rh(&lock_); | ||
hosts.resize(hostsMap_.size()); | ||
std::transform(hostsMap_.begin(), hostsMap_.end(), hosts.begin(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::transform()
seems an overkill here. Why not just a simple range for
for (const auto& h : hostsMap_) {
hosts.emplace_back(h.first);
}
Clean and easily understandable!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, we could benefit from the compiler optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually they are the same thing.
void cleanExpiredHosts() { | ||
int64_t now = time::TimeUtils::nowInSeconds(); | ||
folly::RWSpinLock::WriteHolder rh(&lock_); | ||
auto it = hostsMap_.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Range for!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With no need for forcing our developers to use one loop style in our project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case, we have to use iterator explicityly due to the erase operation.
|
||
protected: | ||
void cleanExpiredHosts() { | ||
int64_t now = time::TimeUtils::nowInSeconds(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you really care about is the time duration, so please use Duration
class
In general, unless we have to get the time point (I think it's a very rare case), we should always use Duration
, which is monolithic and way faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think use Duration is more convenient here.
src/meta/ActiveHostsMan.h
Outdated
std::unordered_map<HostAddr, HostInfo> hostsMap_; | ||
thread::GenericWorker checkThread_; | ||
int32_t intervalSeconds_ = 0; | ||
int32_t expiredSeconds_ = 5 * 60; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think expirationInSeconds_
is more understandable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense
Unit testing failed. |
Jenkins go |
Unit testing passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There you go!
Unit testing passed. |
Unit testing passed. |
* Add active hosts manager for balance in the future. * Address dutor's and sherman's comments
* Fix bashism * Report Gherkin diff * Add PHONY target for tests check-and-diff * Improve clang-format-diff handling Use GITHUB_BASE_REF because a PR can easily have more than one commit: https://docs.github.com/en/actions/learn-github-actions/environment-variables#default-environment-variables Switch CLANG_HOME handling to support debian/ubuntu installed clang-format or a user's installation at some other location. Install clang-format-10 if it isn't installed in CI (this is mostly for nektos/act where images might be thinner than the GitHub standard). * Get the base commit In order to see what changes this PR is making, we need to get the base commit. * Use actions/checkout to get base and head Co-authored-by: Josh Soref <[email protected]> Co-authored-by: Yee <[email protected]> Co-authored-by: kyle.cao <[email protected]> Co-authored-by: Josh Soref <[email protected]> Co-authored-by: Josh Soref <[email protected]> Co-authored-by: Yee <[email protected]> Co-authored-by: kyle.cao <[email protected]>
subtask of #283