Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add active hosts manager for balance in the future. #304

Merged
merged 4 commits into from
May 7, 2019
Merged

Add active hosts manager for balance in the future. #304

merged 4 commits into from
May 7, 2019

Conversation

dangleptr
Copy link
Contributor

@dangleptr dangleptr commented Apr 25, 2019

subtask of #283

@nebula-community-bot
Copy link
Member

Unit testing passed.

folly::RWSpinLock::ReadHolder rh(&lock_);
hosts.resize(hostsMap_.size());
std::transform(hostsMap_.begin(), hostsMap_.end(), hosts.begin(),
[](const auto& entry) -> decltype(auto) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know decltype(auto) here will be deduced to a const reference, don't you? It that on purpose? I noticed the value would be copied anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is OK for const reference due to i want to copy the content.

return !(*this == that);
}

int64_t lastHeartBeatTimeSeconds_ = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a time point. So either use lastHeartBeatTimeInSeconds_ or lastHeartBeatTimeSecond_. The current name make me think it's a duration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming it lastHBTimeInSec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

ASSERT_EQ(3, ahMan.getActiveHosts().size());
ahMan.updateHostInfo(HostAddr(0, 0), HostInfo(now + 2));
ASSERT_EQ(3, ahMan.getActiveHosts().size());
ASSERT_EQ(HostInfo(now + 2), ahMan.hostsMap_[HostAddr(0, 0)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are chances to crash due to race condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are chances to crash due to race condition.
I'm confused, please give me some tips. It is executed in sequence, so why is there a race condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch


sleep(3);
ASSERT_EQ(1, ahMan.getActiveHosts().size());
ASSERT_EQ(HostInfo(now + 2), ahMan.hostsMap_[HostAddr(0, 0)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

namespace nebula {
namespace meta {

struct HostInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we come up with a more specific name? I think this one is a little generic for a exposed name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the data structure only holds the heart beat from the host, how about renaming it to HostHB, HostPulse, or HostHeartbeat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure will be used to collect more information about each host, such as disk usage, cpu idle, memory load etc.

ASSERT_EQ(3, ahMan.getActiveHosts().size());
ASSERT_EQ(HostInfo(now + 2), ahMan.hostsMap_[HostAddr(0, 0)]);

sleep(3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I were correct, your interesting host would be expired out in 3 seconds. If so, you better to wait longer than 3s to make the case stable enough, like 3s + 100ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hosts would be expired in 2 seconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the timer might be triggered at the 3rd second.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timer will be triggered after the first second and run at a 1-second interval.

Copy link
Member

@sherman-the-tank sherman-the-tank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of this. Great job! 👍

I have some inline comments

@@ -0,0 +1,100 @@
/* Copyright (c) 2018 - present, VE Software Inc. All rights reserved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The company name is vesoft inc, not VE Software Inc. I think we need to go through the copyright declarations in all files and make a batch correction before releasing the code to the public

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let's leave it as it is now :-)

return !(*this == that);
}

int64_t lastHeartBeatTimeSeconds_ = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming it lastHBTimeInSec?

namespace nebula {
namespace meta {

struct HostInfo {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the data structure only holds the heart beat from the host, how about renaming it to HostHB, HostPulse, or HostHeartbeat?

explicit HostInfo(int64_t lastHeartBeatTimeSeconds)
: lastHeartBeatTimeSeconds_(lastHeartBeatTimeSeconds) {}

bool operator==(const HostInfo& that) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this and that refers to different hosts, and their last HB times happen to be the same. Do we consider them same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. And we should NOT use the structure to check different host Information.

CHECK(checkThread_.start());
checkThread_.addTimerTask(intervalSeconds * 1000,
intervalSeconds * 1000,
&ActiveHostsMan::cleanExpiredHosts, this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Put this on the next line to keep one parameter per line

checkThread_.wait();
}

void updateHostInfo(HostAddr hostAddr, HostInfo info) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateHostHeartbeat(...) probably makes more sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HostInfo not only contains heartBeat as talked above.

std::vector<HostAddr> hosts;
folly::RWSpinLock::ReadHolder rh(&lock_);
hosts.resize(hostsMap_.size());
std::transform(hostsMap_.begin(), hostsMap_.end(), hosts.begin(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::transform() seems an overkill here. Why not just a simple range for

for (const auto& h : hostsMap_) {
    hosts.emplace_back(h.first);
}

Clean and easily understandable!

Copy link
Member

@sherman-the-tank sherman-the-tank Apr 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, we could benefit from the compiler optimization

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually they are the same thing.

void cleanExpiredHosts() {
int64_t now = time::TimeUtils::nowInSeconds();
folly::RWSpinLock::WriteHolder rh(&lock_);
auto it = hostsMap_.begin();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range for!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With no need for forcing our developers to use one loop style in our project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, we have to use iterator explicityly due to the erase operation.


protected:
void cleanExpiredHosts() {
int64_t now = time::TimeUtils::nowInSeconds();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you really care about is the time duration, so please use Duration class

In general, unless we have to get the time point (I think it's a very rare case), we should always use Duration, which is monolithic and way faster

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think use Duration is more convenient here.

std::unordered_map<HostAddr, HostInfo> hostsMap_;
thread::GenericWorker checkThread_;
int32_t intervalSeconds_ = 0;
int32_t expiredSeconds_ = 5 * 60;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think expirationInSeconds_ is more understandable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense

@nebula-community-bot
Copy link
Member

Unit testing failed.

@dangleptr
Copy link
Contributor Author

Jenkins go

@nebula-community-bot
Copy link
Member

Unit testing passed.

Copy link
Contributor

@dutor dutor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There you go!

@nebula-community-bot
Copy link
Member

Unit testing passed.

@nebula-community-bot
Copy link
Member

Unit testing passed.

@dangleptr dangleptr merged commit 3200fd8 into vesoft-inc:master May 7, 2019
@dangleptr dangleptr deleted the hb branch May 7, 2019 07:37
tong-hao pushed a commit to tong-hao/nebula that referenced this pull request Jun 1, 2021
* Add active hosts manager for balance in the future.

* Address dutor's and sherman's comments
yixinglu added a commit to yixinglu/nebula that referenced this pull request Mar 21, 2022
* Fix bashism

* Report Gherkin diff

* Add PHONY target for tests check-and-diff

* Improve clang-format-diff handling

Use GITHUB_BASE_REF because a PR can easily have more than one commit:
https://docs.github.com/en/actions/learn-github-actions/environment-variables#default-environment-variables

Switch CLANG_HOME handling to support debian/ubuntu installed clang-format
or a user's installation at some other location.

Install clang-format-10 if it isn't installed in CI (this is mostly
for nektos/act where images might be thinner than the GitHub standard).

* Get the base commit

In order to see what changes this PR is making,
we need to get the base commit.

* Use actions/checkout to get base and head

Co-authored-by: Josh Soref <[email protected]>
Co-authored-by: Yee <[email protected]>
Co-authored-by: kyle.cao <[email protected]>

Co-authored-by: Josh Soref <[email protected]>
Co-authored-by: Josh Soref <[email protected]>
Co-authored-by: Yee <[email protected]>
Co-authored-by: kyle.cao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants