-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[opt](fqdn) Add DNS Cache for FE and BE #32869
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
return Status::OK(); | ||
} | ||
|
||
void DNSCache::_refresh_cache() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: method '_refresh_cache' can be made const [readability-make-member-function-const]
void DNSCache::_refresh_cache() { | |
void DNSCache::_refresh_cache() const { |
be/src/util/dns_cache.h:41:
- void _refresh_cache();
+ void _refresh_cache() const;
~DNSCache(); | ||
|
||
// get ip by hostname | ||
Status get(const std::string& hostname, std::string* ip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: unknown type name 'Status' [clang-diagnostic-error]
Status get(const std::string& hostname, std::string* ip);
^
|
||
private: | ||
// update the ip of hostname in cache | ||
Status _update(const std::string& hostname); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: unknown type name 'Status' [clang-diagnostic-error]
Status _update(const std::string& hostname);
^
// update cache at fix internal | ||
void _refresh_cache(); | ||
|
||
private: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]
private: |
Additional context
be/src/util/dns_cache.h:35: previously declared here
private:
^
run buildall |
1 similar comment
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
// update cache at fix internal | ||
void _refresh_cache(); | ||
|
||
private: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]
private: |
Additional context
be/src/util/dns_cache.h:40: previously declared here
private:
^
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client. So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname. This PR mainly changes: 1. Add DNSCache for both FE and BE. The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP. Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it and update the cache. In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may be changed at anytime. There are other implements of this dns cache: 1. kaka11chen@36fed13 This is for BE side, but it does not handle the IP change case. 3. apache#28479 This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change. And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
|
||
DNSCache::DNSCache() { | ||
refresh_thread = std::thread(&DNSCache::_refresh_cache, this); | ||
refresh_thread.detach(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the detached thread can not be join by main thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it
|
||
DNSCache::~DNSCache() { | ||
stop_refresh = true; | ||
if (refresh_thread.joinable()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not joinable, so DNECache will be destoried immediately
std::unordered_set<std::string> keys; | ||
{ | ||
std::shared_lock<std::shared_mutex> lock(mutex); | ||
std::transform(cache.begin(), cache.end(), std::inserter(keys, keys.end()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is very likely that object cache has already been destroyed, if main thread has already exited with --grace enabled.
void DNSCache::_refresh_cache() { | ||
while (!stop_refresh) { | ||
// refresh every 1 min | ||
std::this_thread::sleep_for(std::chrono::minutes(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, after _refresh_cache
thread is joinable, when we stop BE with grace enabled, we will wait at least 1 min ...
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client. So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname. This PR mainly changes: 1. Add DNSCache for both FE and BE. The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP. Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it and update the cache. In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may be changed at anytime. There are other implements of this dns cache: 1. kaka11chen@36fed13 This is for BE side, but it does not handle the IP change case. 3. apache#28479 This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change. And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
Proposed changes
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname
each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client.
So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname.
This PR mainly changes:
The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP.
Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it
and update the cache.
In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may
be changed at anytime.
There are other implements of this dns cache:
kaka11chen@36fed13
This is for BE side, but it does not handle the IP change case.
[fix](proxy) Fix getProxy frequently parse hostname to ip #28479
This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change.
And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...