
[KCP] Provide all endpoints when getting ETCD client #2844

Closed
sedefsavas opened this issue Apr 1, 2020 · 12 comments
Labels
area/control-plane: Issues or PRs related to control-plane lifecycle management
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.

Comments

@sedefsavas

Instead of trying to get an etcd client for a specific node, provide all control plane nodes as endpoints and let clientv3.New() handle connecting to an available endpoint. This will simplify some of the logic in the code.
Kubeadm uses similar logic:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/etcd/etcd.go

/kind cleanup
/area control-plane
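
(For illustration, a minimal sketch of the suggested approach, assuming the go.etcd.io/etcd/clientv3 package; the helper name and config values are hypothetical and not the existing KCP code.)

```go
package etcd

import (
	"context"
	"time"

	"go.etcd.io/etcd/clientv3"
)

// newClientForAllEndpoints is a hypothetical helper: rather than dialing one
// specific member, it passes every control plane endpoint to clientv3.New and
// lets the client's balancer pick a reachable member.
func newClientForAllEndpoints(ctx context.Context, endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:   endpoints, // e.g. one URL per control plane machine
		Context:     ctx,
		DialTimeout: 10 * time.Second,
	})
}
```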

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/control-plane Issues or PRs related to control-plane lifecycle management labels Apr 1, 2020
@vincepri
Member

vincepri commented Apr 1, 2020

cc @randomvariable

/milestone v0.3.x

@gab-satchi
Member

/assign
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Apr 2, 2020
@vincepri
Member

vincepri commented Apr 2, 2020

@gab-satchi let's hold off on this one until #2821 is merged

@randomvariable
Member

I think the dialler would need to be refactored to do something like this (a rough sketch follows the list):

  • Change the etcd dialler to receive every etcd pod as a URI in the format:
    • portforward://<name>.<namespace>.<type>:<port>
  • Refactor the dialler so that, instead of taking a pod as input, it can parse portforward URLs passed in via .Dial() and create a new tunnel on demand, or wrap the existing dialler to do the same.
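
(A rough sketch of that refactor, assuming a hypothetical tunnelOpener interface that wraps whatever actually starts the port-forward; the names below are illustrative, not the existing cluster-api dialler API.)

```go
package etcdproxy

import (
	"context"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// tunnelOpener stands in for whatever opens a port-forward tunnel to a pod;
// in cluster-api this would wrap the existing dialler machinery.
type tunnelOpener interface {
	OpenTunnel(ctx context.Context, namespace, name, port string) (net.Conn, error)
}

// portForwardDialer resolves portforward:// addresses at Dial time rather
// than being bound to a single pod when it is constructed.
type portForwardDialer struct {
	opener tunnelOpener
}

// DialContext accepts addresses like portforward://etcd-foo.kube-system.pod:2379,
// parses out the pod name and namespace, and opens a tunnel on demand.
func (d *portForwardDialer) DialContext(ctx context.Context, addr string) (net.Conn, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return nil, fmt.Errorf("parsing portforward address %q: %w", addr, err)
	}
	if u.Scheme != "portforward" {
		return nil, fmt.Errorf("unexpected scheme %q in %q", u.Scheme, addr)
	}
	// Host is expected to look like <name>.<namespace>.<type>.
	parts := strings.SplitN(u.Hostname(), ".", 3)
	if len(parts) != 3 {
		return nil, fmt.Errorf("unexpected host %q, want <name>.<namespace>.<type>", u.Hostname())
	}
	name, namespace := parts[0], parts[1]
	return d.opener.OpenTunnel(ctx, namespace, name, u.Port())
}
```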

@vincepri
Member

vincepri commented Apr 3, 2020

If we go down this path, how can we get a client to the leader directly? And how can we effectively write tests for these changes?

@gab-satchi
Member

Jason suggested using WithRequireLeader in another issue, but it just seems to require that a leader exists.
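
(For reference, a minimal sketch of how WithRequireLeader is used with go.etcd.io/etcd/clientv3: it makes requests served by a member without a leader fail, but it does not route requests to the leader. The helper below is hypothetical.)

```go
package etcd

import (
	"context"

	"go.etcd.io/etcd/clientv3"
)

// watchWithRequireLeader is a hypothetical helper. The WithRequireLeader
// context makes requests fail when the contacted member has lost its leader,
// which is mainly useful for watches; it does not target the leader directly.
func watchWithRequireLeader(cli *clientv3.Client, keyPrefix string) clientv3.WatchChan {
	ctx := clientv3.WithRequireLeader(context.Background())
	return cli.Watch(ctx, keyPrefix, clientv3.WithPrefix())
}
```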

@vincepri
Member

vincepri commented Apr 3, 2020

What do you think about making a POC / small PR to see if the idea works out? Then we can discuss the design there; I want to make sure we have a test plan in place for any changes related to etcd.

@gab-satchi
Member

I agree with that approach. On top of having a test plan, I wanted to know if there's something I can reproduce locally to get etcd connections into a flaky state, and then see whether the changes improve that in any way.

@gab-satchi
Member

Constantly scaling up and down comes to mind, but most of our connections go through the leader now, and I think the newest machine gets assigned the leader while the oldest one is always the one deleted in a scale-down. So I'm unsure whether that plan would work.

@randomvariable
Member

Yeah, a POC probably makes sense.

I was going to mention WithRequireLeader, but we can add that afterwards; it is more applicable to watches than anything else.

@gab-satchi: using tc or iptables to drop or delay packets should work. A delay of over 1s causes heartbeats to start failing.

@gab-satchi
Member

/close

@k8s-ci-robot
Contributor

@gab-satchi: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
