Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZOOKEEPER-3095] Connect string fix for non-existent hosts #579

Conversation

mjeelanimsft
Copy link

ZKPatch: eda58d9970c76831046ddc45251c9b110856836e (extract)

ZKPatch: eda58d9970c76831046ddc45251c9b110856836e (extract)
@mjeelanimsft mjeelanimsft changed the title Connect string fix for non-existent hosts [ZOOKEEPER-3095] Connect string fix for non-existent hosts Jul 20, 2018
Copy link
Contributor

@breed breed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good @mjeelanimsft just minor formatting

host = strtok_r(0, ",", &strtok_last);
}
#endif
}
if (avec->count == 0) {
rc = ZSYSTEMERROR; // not a single host resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 space indent ;)

/* Checks that a non-existent host will not block the connection*/
void testNonexistentHost() {
char hosts[] = "jimmy:5555,127.0.0.1:22181";
watchctx_t ctx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 space indent

@@ -77,7 +77,7 @@ fi

if [ "x${base_dir}" == "x" ]
then
zk_base="../../"
zk_base="../../../"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this correct? how did this work before?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and is it related to this change? ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, seems the old code is not working if the base_dir is not specified.

We were using this to test the change manually, so it's related from testing purpose.

@asfgit asfgit closed this in 932fee8 Jul 28, 2018
@breed
Copy link
Contributor

breed commented Jul 28, 2018

hey @mjeelanimsft when i did the commit, your email address came up wrong. i think i fixed it, but you might want to check your git config.

thanx for the submission!

@mjeelanimsft
Copy link
Author

Thanks for the commit @breed ! My email was indeed not set properly; thanks for letting me know - I've fixed it now.

@tudor
Copy link
Contributor

tudor commented Feb 6, 2019

Any chance this can be backported to 3.5.x? We're hitting this in production (ZK runs in Kubernetes, and DNS doesn't resolve for pods that are down.) cc: @igorcanadi

@anmolnar
Copy link
Contributor

anmolnar commented Feb 7, 2019

@tudor No problem. Would you mind cherry picking to branch-3.5 and submitting a pull request?

suhasdantkale added a commit to suhasdantkale/zookeeper that referenced this pull request Feb 21, 2020
The change is backported from
Jeelani Mohamed Abdul Khader <[email protected]>'s
apache#579 change.

Code changes to not fail the session initiation if the DNS is not able to
resolve the hostname of one of the servers in the Zookeeper ensemble.

The Zookeeper client resolves all the hostnames in the ensemble while establishing the session.

In Kubernetes environment with coreDNS, the hostname entry gets removed from coreDNS during the POD restarts.
Though we can manipulate the coreDNS settings to delay the removal of the hostname entry from DNS,
we don't want to leave any race where Zookeeper client is trying to establish a session and it fails
because the DNS temporarily is not able to resolve the hostname. So as long as one of the servers
in the ensemble is able to be DNS resolvable, should we not fail the session establishment with hard error
and instead try to establish the connection with one of the other servers
@suhasdantkale
Copy link

@anmolnar , @tudor ,
I've created this PR -#1259 for this.

eolivelli pushed a commit that referenced this pull request May 4, 2020
… DNS does not resolve one of the servers in the zk ensemble

The change is backported from
Jeelani Mohamed Abdul Khader <mjeelanidevvm3360.prn2.facebook.com>'s
#579 change done in
"ZOOKEEPER-3095: Connect string fix for non-existent hosts"

Code changes to not fail the session initiation if the DNS is not able to
resolve the hostname of one of the servers in the Zookeeper ensemble.

Some Background:
The Zookeeper client resolves all the hostnames in the ensemble while establishing the session.
In Kubernetes environment with coreDNS, the hostname entry gets removed from coreDNS during the POD restarts.
Though we can manipulate the coreDNS settings to delay the removal of the hostname entry from DNS,
we don't want to leave any race where Zookeeper client is trying to establish a session and it fails
because the DNS temporarily is not able to resolve the hostname. So as long as one of the servers
in the ensemble is able to be DNS resolvable, should we not fail the session establishment with hard error
and instead try to establish the connection with one of the other servers

Author: Suhas Dantkale <[email protected]>

Reviewers: Enrico Olivelli <[email protected]>, Mate Szalay-Beko <[email protected]>, Damien Diederen <[email protected]>

Closes #1259 from suhasdantkale/suhas/ZOOKEEPER-3723
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
ZKPatch: eda58d9970c76831046ddc45251c9b110856836e (extract)

Author: Jeelani Mohamed Abdul Khader <[email protected]>

Reviewers: Benjamin Reed <[email protected]>

Closes apache#579 from mjeelanimsft/connect-string-fix-for-non-existent-hosts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants