
Embedded etcd server ignores --advertise-address flag #2965

Closed
bpavacic opened this issue Feb 20, 2021 · 13 comments
Assignees
Labels
kind/feature A large new piece of functionality status/2023 confirmed
Milestone

Comments

@bpavacic

Environmental Info:
K3s Version:

k3s version v1.20.2+k3s1 (1d4adb0)
go version go1.15.5

Node(s) CPU architecture, OS, and Version:

Linux ams-2 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
One k3s server with an embedded etcd server, a local 10.x.x.x network interface and an external IP address (51.x.x.x)
Another k3s server trying to join the cluster

Describe the bug:
The embedded etcd server advertises the server's private IP address and ignores the --advertise-address flag.

Steps To Reproduce:

Server 1 has an internal IP address (10.x.x.x) and a public IP address (51.x.x.x) through which it is accessible.
K3S server has been installed with --cluster-init --node-external-ip 51.x.x.x --advertise-address 51.x.x.x

Server 2 (outside the 10.x.x.x network) is trying to join with --server https://51.x.x.x:6443/ but the installation hangs.
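For reference, the reproduction steps above could be sketched with the official install script (a sketch only: the 10.x.x.x / 51.x.x.x addresses are the placeholders from above, and the token placeholder must be replaced with the actual token from Server 1):

```shell
# Server 1 (sketch): initialize the cluster, advertising the public address
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --node-external-ip 51.x.x.x \
  --advertise-address 51.x.x.x

# Server 2 (sketch): join via the public address -- this is the step that hangs,
# because etcd on Server 1 still advertises its 10.x.x.x peer URL
curl -sfL https://get.k3s.io | K3S_TOKEN=<token-from-server-1> sh -s - server \
  --server https://51.x.x.x:6443
```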

Expected behavior:
Server 2 is expected to be able to join the cluster. The etcd server on Server 1 is expected to honor the --advertise-address flag (or a separate --etcd-advertise-address flag) and advertise the server's external IP address.

Actual behavior:
Installation hangs as Server 2 is unable to connect to Server 1's etcd server.

Additional context / logs:

Server 1 logs:

Feb 15 14:44:43 XXXXX k3s[596166]: {"level":"info","ts":"2021-02-15T14:44:43.535+0100","caller":"embed/etcd.go:302","msg":**"starting an etcd server"**,"etcd-version":"3.4.13","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.15.5","go-os":"linux","go-arch":"amd64","max-cpu-set":2,"max-cpu-available":2,"member-initialized":true,"name":"ams-2-4b4052db","data-dir":"/var/lib/rancher/k3s/server/db/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/k3s/server/db/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":[**"http://localhost:2380"**],"listen-peer-urls":["**https://10.x.x.x:2380**"],"advertise-client-urls":["https://10.x.x.x:2379"],"listen-client-urls":["https://10.x.x.x:2379","https://127.0.0.1:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"ne

Server 2 is trying to connect to Server 1's internal IP address (10.x.x.x):

Feb 20 10:43:55 gce-2 k3s[7750]: {"level":"warn","ts":"2021-02-20T10:43:55.095Z","caller":"etcdserver/cluster_util.go:76","msg":"failed to get cluster response","address":"https://10.x.x.x:2380/members","error":"Get \"https://10.x.x.x:2380/members\": dial tcp 10.x.x.x:2380: i/o timeout"}

@brandond
Member

This is by design. At the moment, the embedded etcd only communicates via the private network addresses. At some point in the future we may advertise multiple addresses for each node to support control-plane nodes without LAN connectivity, but that will require more QA than we wanted to take on initially.

@bpavacic
Author

Thank you for your reply.
I think it would still be a nice feature to have, as there are use cases where nodes are not physically on the same LAN but have LAN-like latency.

@brandond
Member

I'm going to leave this open so that we can track it as a feature request.

@brandond brandond reopened this Feb 22, 2021
@brandond brandond added the kind/feature A large new piece of functionality label Feb 22, 2021
@brandond brandond added this to the Backlog milestone Feb 22, 2021
@Wh1t3Fox

Wh1t3Fox commented May 6, 2021

This would be great, as I have some devices with multiple NICs and k3s is trying to use the wrong one.

@oivindoh

oivindoh commented Jul 6, 2021

This is also an issue when trying to set up a purely IPv6 cluster. I'm hitting two problems here: the wrong IP address on the interface/LAN is chosen (the interface has two addresses on this network) with no way to override it, and the IPv6 address itself is not handled properly:

time="2021-07-06T19:47:17.854533166Z" level=info msg="Managed etcd cluster initializing"
unexpected error setting up listen-peer-urls: URL address does not have the form "host:port": https://2001:100:1:1:64b2:1fff:ae63:3205:2380
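The parse failure above comes from the unbracketed IPv6 literal: in a URL, an IPv6 host must be wrapped in square brackets so the trailing :2380 port can be distinguished from the address's own colons. A minimal sketch of the required form (bracket_host is a hypothetical helper for illustration, not part of k3s, and 2001:db8::1 is a documentation address):

```shell
# Hypothetical helper: wrap IPv6 literals in brackets before building a URL.
bracket_host() {
  case "$1" in
    *:*) printf '[%s]' "$1" ;;  # contains a colon -> IPv6 literal, bracket it
    *)   printf '%s' "$1" ;;    # IPv4 address or hostname, leave as-is
  esac
}

# etcd accepts the bracketed form:
echo "https://$(bracket_host 2001:db8::1):2380"   # https://[2001:db8::1]:2380
```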

@mysticaltech

Same here: it's using the public IPs. Even though I specify node-ip, node-external-ip, and advertise-address as local IPs, it does not take those into consideration!

Aug 31 16:35:38 k3s-control-plane-1 k3s[1005]: {"level":"info","ts":"2021-08-31T16:35:38.330+0200","caller":"embed/etcd.go:302","msg":"starting an etcd server","etcd-version":"3.4.13","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.16.6","go-os":"linux","go-arch":"amd64","max-cpu-set":1,"max-cpu-available":1,"member-initialized":false,"name":"k3s-control-plane-1-e20a1b42","data-dir":"/var/lib/rancher/k3s/server/db/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/k3s/server/db/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["https://188.34.177.43:2380"],"advertise-client-urls":["https://188.34.177.43:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://188.34.177.43:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"k3s-control-plane-0-aa7f86d6=https://78.46.151.110:2380,k3s-control-plane-1-e20a1b42=https://188.34.177.43:2380","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-size-bytes":2147483648,"pre-vote":false,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":""}
Aug 31 16:35:38 k3s-control-plane-1 k3s[1005]: {"level":"info","ts":"2021-08-31T16:35:38.353+0200","caller":"etcdserver/backend.go:80","msg":"opened backend db","path":"/var/lib/rancher/k3s/server/db/etcd/member/snap/db","took":"22.49246ms"}
Aug 31 16:35:43 k3s-control-plane-1 k3s[1005]: {"level":"warn","ts":"2021-08-31T16:35:43.294+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
Aug 31 16:35:43 k3s-control-plane-1 k3s[1005]: time="2021-08-31T16:35:43.295363238+02:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"

@brandond
Member

@mysticaltech please see the statement at #2965 (comment)

@mysticaltech

@brandond Thanks, yes, I saw that, but the problem is that I WANT it to communicate on private IPs, and it is choosing public ones instead!!

@brandond
Member

brandond commented Aug 31, 2021

Ah, I see: you have multiple interfaces and it's not picking the one you'd like? Setting the --node-ip flag to the address of the private interface you'd like it to use should override the default selection logic, which picks the interface associated with the default route.
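The override described above can be applied on the command line or persisted in k3s's config file. A sketch, with 10.0.0.5 standing in for the private address:

```shell
# Option 1 (sketch): pass the flag directly when starting the server
k3s server --cluster-init --node-ip 10.0.0.5

# Option 2 (sketch): persist it in the config file k3s reads on startup
mkdir -p /etc/rancher/k3s
cat <<'EOF' > /etc/rancher/k3s/config.yaml
node-ip: 10.0.0.5
EOF
```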

@mysticaltech

I did that for --node-ip, --node-external-ip, --advertise-address, and --tls-san, all set to the internal IP, and it still picked up the external one 🤯

However, I'm using Cilium as CNI, could that be the cause?

@mysticaltech

I'll try sticking to only --node-ip and increasing log verbosity to see if something shows up. Thanks for the tip 🙏 Maybe what I did was overkill and it didn't work because of that?! Will report back.

@mysticaltech

mysticaltech commented Aug 31, 2021

@brandond You were right. I tried again, this time carefully reading which options apply to servers and which to agents, and set:

  • Servers (control plane nodes): node-ip, tls-san, and advertise-address set to the internal IP, plus flannel-iface set to the correct network interface
  • Agents (workers): only node-ip and flannel-iface

And it worked like a charm! Thanks again for the support! ✨

For those interested in the details of my working config, it's all open-source here: https://github.com/mysticaltech/kube-hetzner.
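The server/agent split described above could be sketched as two config files (a sketch only: the 10.0.0.x addresses and the eth1 interface name are placeholders, not taken from the linked repo):

```shell
# Server (control plane) nodes -- /etc/rancher/k3s/config.yaml (sketch)
cat <<'EOF' > /etc/rancher/k3s/config.yaml
node-ip: 10.0.0.2
advertise-address: 10.0.0.2
tls-san: 10.0.0.2
flannel-iface: eth1
EOF

# Agent (worker) nodes -- same file, fewer keys (sketch)
cat <<'EOF' > /etc/rancher/k3s/config.yaml
node-ip: 10.0.0.3
flannel-iface: eth1
EOF
```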

@grafjo

grafjo commented Jan 4, 2022

@mysticaltech I can confirm that the embedded etcd has some strange behavior: the --flannel-iface option is used to bind the etcd process to a network interface, even when you're running k3s without the flannel CNI!
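One quick way to verify which address the embedded etcd actually bound to on a node (2379 is etcd's client port, 2380 its peer port; ss is from iproute2):

```shell
# List listening sockets for etcd's client (2379) and peer (2380) ports
ss -tlnp | grep -E ':(2379|2380)'
```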

@caroline-suse-rancher caroline-suse-rancher moved this to 🆕 New in K3s Backlog Nov 15, 2022
@caroline-suse-rancher caroline-suse-rancher moved this from To Be Sorted to Feature Requests in K3s Backlog Nov 29, 2022
@vitorsavian vitorsavian self-assigned this Jun 13, 2023
@vitorsavian vitorsavian moved this from Backlog to Working in K3s Development Jun 13, 2023
@vitorsavian vitorsavian closed this as not planned Aug 17, 2023
@github-project-automation github-project-automation bot moved this from Working to Done Issue in K3s Development Aug 17, 2023
@github-project-automation github-project-automation bot moved this from Feature Requests to Closed in K3s Backlog Aug 17, 2023