
[KCP] Provide all endpoints when getting ETCD client #2844

Closed
sedefsavas opened this issue Apr 1, 2020 · 12 comments
Labels
area/control-plane: Issues or PRs related to control-plane lifecycle management
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.

Comments

@sedefsavas

Instead of trying to get an etcd client for a specific node, provide all control plane nodes as endpoints and let clientv3.New() handle connecting to an available endpoint. This will simplify some of the logic in the code.
Kubeadm uses similar logic:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/etcd/etcd.go

/kind cleanup
/area control-plane
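
(For illustration, a minimal sketch of the suggested approach, assuming the go.etcd.io/etcd/clientv3 package; the helper name and config values are hypothetical and not the existing KCP code.)

```go
package etcd

import (
	"context"
	"time"

	"go.etcd.io/etcd/clientv3"
)

// newClientForAllEndpoints is a hypothetical helper: rather than dialing one
// specific member, it passes every control plane endpoint to clientv3.New and
// lets the client's balancer pick a reachable member.
func newClientForAllEndpoints(ctx context.Context, endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:   endpoints, // e.g. one URL per control plane machine
		Context:     ctx,
		DialTimeout: 10 * time.Second,
	})
}
```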

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/control-plane Issues or PRs related to control-plane lifecycle management labels Apr 1, 2020
@vincepri
Member

vincepri commented Apr 1, 2020

cc @randomvariable

/milestone v0.3.x

@gab-satchi
Member

/assign
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Apr 2, 2020
@vincepri
Member

vincepri commented Apr 2, 2020

@gab-satchi let's hold off on this one until #2821 is merged

@randomvariable
Member

I think the dialler would need to be refactored to do something like this (a rough sketch follows the list):

  • Change the etcd dialler to receive every etcd pod as a URI in the format:
    • portforward://<name>.<namespace>.<type>:<port>
  • Refactor the dialler so that, instead of taking a pod as input, it can parse portforward URLs passed in via .Dial() and create a new tunnel on demand, or wrap the existing dialler to do the same.
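
(A rough sketch of that refactor, assuming a hypothetical tunnelOpener interface that wraps whatever actually starts the port-forward; the names below are illustrative, not the existing cluster-api dialler API.)

```go
package etcdproxy

import (
	"context"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// tunnelOpener stands in for whatever opens a port-forward tunnel to a pod;
// in cluster-api this would wrap the existing dialler machinery.
type tunnelOpener interface {
	OpenTunnel(ctx context.Context, namespace, name, port string) (net.Conn, error)
}

// portForwardDialer resolves portforward:// addresses at Dial time rather
// than being bound to a single pod when it is constructed.
type portForwardDialer struct {
	opener tunnelOpener
}

// DialContext accepts addresses like portforward://etcd-foo.kube-system.pod:2379,
// parses out the pod name and namespace, and opens a tunnel on demand.
func (d *portForwardDialer) DialContext(ctx context.Context, addr string) (net.Conn, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return nil, fmt.Errorf("parsing portforward address %q: %w", addr, err)
	}
	if u.Scheme != "portforward" {
		return nil, fmt.Errorf("unexpected scheme %q in %q", u.Scheme, addr)
	}
	// Host is expected to look like <name>.<namespace>.<type>.
	parts := strings.SplitN(u.Hostname(), ".", 3)
	if len(parts) != 3 {
		return nil, fmt.Errorf("unexpected host %q, want <name>.<namespace>.<type>", u.Hostname())
	}
	name, namespace := parts[0], parts[1]
	return d.opener.OpenTunnel(ctx, namespace, name, u.Port())
}
```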

@vincepri
Member

vincepri commented Apr 3, 2020

If we go down this path, how can we get a client to the leader directly? And how can we effectively write tests for these changes?

@gab-satchi
Member

Jason suggested using WithRequireLeader in another issue, but it just seems to require that a leader exists.
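
(For reference, a minimal sketch of how WithRequireLeader is used with go.etcd.io/etcd/clientv3: it makes requests served by a member without a leader fail, but it does not route requests to the leader. The helper below is hypothetical.)

```go
package etcd

import (
	"context"

	"go.etcd.io/etcd/clientv3"
)

// watchWithRequireLeader is a hypothetical helper. The WithRequireLeader
// context makes requests fail when the contacted member has lost its leader,
// which is mainly useful for watches; it does not target the leader directly.
func watchWithRequireLeader(cli *clientv3.Client, keyPrefix string) clientv3.WatchChan {
	ctx := clientv3.WithRequireLeader(context.Background())
	return cli.Watch(ctx, keyPrefix, clientv3.WithPrefix())
}
```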

@vincepri
Member

vincepri commented Apr 3, 2020

What do you think about making a POC / small PR to see if the idea works out? Then we can discuss the design there; I want to make sure we have a test plan in place for any changes related to etcd.

@gab-satchi
Member

I agree with that approach. On top of having a test plan, I wanted to know if there's something I can reproduce locally to get etcd connections into a flaky state, and then see whether the changes improve that in any way.

@gab-satchi
Member

Constantly scaling up and down comes to mind, but most of our connections go through the leader now, and I think the newest machine gets assigned the leader while the oldest one is always the one deleted in a scale-down. So I'm unsure whether that plan would work.

@randomvariable
Member

Yeah, a POC probably makes sense.

I was going to mention WithRequireLeader, but we can add that afterwards; it is more applicable to watches than anything else.

@gab-satchi: using tc or iptables to drop or delay packets should work. A delay of over 1s causes heartbeats to start failing.

@gab-satchi
Member

/close

@k8s-ci-robot
Contributor

@gab-satchi: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
