Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opt](fqdn) Add DNS Cache for FE and BE #32869

Merged
merged 1 commit into from
Mar 28, 2024
Merged

Conversation

morningman
Copy link
Contributor

@morningman morningman commented Mar 26, 2024

Proposed changes

In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname
each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client.
So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname.

This PR mainly changes:

  1. Add DNSCache for both FE and BE.
    The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP.
    Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it
    and update the cache.
    In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may
    be changed at anytime.

There are other implements of this dns cache:

  1. kaka11chen@36fed13
    This is for BE side, but it does not handle the IP change case.

  2. [fix](proxy) Fix getProxy frequently parse hostname to ip #28479
    This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change.
    And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

return Status::OK();
}

void DNSCache::_refresh_cache() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: method '_refresh_cache' can be made const [readability-make-member-function-const]

Suggested change
void DNSCache::_refresh_cache() {
void DNSCache::_refresh_cache() const {

be/src/util/dns_cache.h:41:

-     void _refresh_cache();
+     void _refresh_cache() const;

~DNSCache();

// get ip by hostname
Status get(const std::string& hostname, std::string* ip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: unknown type name 'Status' [clang-diagnostic-error]

    Status get(const std::string& hostname, std::string* ip);
    ^


private:
// update the ip of hostname in cache
Status _update(const std::string& hostname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: unknown type name 'Status' [clang-diagnostic-error]

    Status _update(const std::string& hostname);
    ^

// update cache at fix internal
void _refresh_cache();

private:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]

Suggested change
private:
Additional context

be/src/util/dns_cache.h:35: previously declared here

private:
^

@morningman
Copy link
Contributor Author

run buildall

1 similar comment
@morningman
Copy link
Contributor Author

run buildall

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

be/src/util/dns_cache.cpp Show resolved Hide resolved
be/src/util/dns_cache.cpp Show resolved Hide resolved
// update cache at fix internal
void _refresh_cache();

private:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]

Suggested change
private:
Additional context

be/src/util/dns_cache.h:40: previously declared here

private:
^

1
2

3

4

4

5

6
@morningman
Copy link
Contributor Author

run buildall

Copy link
Contributor

@xinyiZzz xinyiZzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 27, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@kaka11chen kaka11chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit 39d695c into apache:master Mar 28, 2024
26 of 30 checks passed
morningman added a commit to morningman/doris that referenced this pull request Mar 28, 2024
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname
each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client.
So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname.

This PR mainly changes:

1. Add DNSCache for both FE and BE.
    The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP.
    Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it
    and update the cache.
    In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may
    be changed at anytime.

There are other implements of this dns cache:

1.  kaka11chen@36fed13
    This is for BE side, but it does not handle the IP change case.

3. apache#28479
    This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change.
    And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.

DNSCache::DNSCache() {
refresh_thread = std::thread(&DNSCache::_refresh_cache, this);
refresh_thread.detach();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it


DNSCache::~DNSCache() {
stop_refresh = true;
if (refresh_thread.joinable()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not joinable, so DNECache will be destoried immediately

std::unordered_set<std::string> keys;
{
std::shared_lock<std::shared_mutex> lock(mutex);
std::transform(cache.begin(), cache.end(), std::inserter(keys, keys.end()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is very likely that object cache has already been destroyed, if main thread has already exited with --grace enabled.

void DNSCache::_refresh_cache() {
while (!stop_refresh) {
// refresh every 1 min
std::this_thread::sleep_for(std::chrono::minutes(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, after _refresh_cache thread is joinable, when we stop BE with grace enabled, we will wait at least 1 min ...

morningman added a commit to morningman/doris that referenced this pull request Apr 7, 2024
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname
each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client.
So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname.

This PR mainly changes:

1. Add DNSCache for both FE and BE.
    The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP.
    Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it
    and update the cache.
    In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may
    be changed at anytime.

There are other implements of this dns cache:

1.  kaka11chen@36fed13
    This is for BE side, but it does not handle the IP change case.

3. apache#28479
    This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change.
    And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
@yiguolei yiguolei mentioned this pull request Apr 8, 2024
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.8-merged dev/2.1.2-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants