This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Multicloud K3s not working #8652

Closed
AlbertoSoutullo opened this issue Oct 16, 2023 · 0 comments
AlbertoSoutullo commented Oct 16, 2023

Environmental Info:
K3s Version:
k3s version v1.27.6+k3s1 (bd04941)
go version go1.20.8

Node(s) CPU architecture, OS, and Version:
Server: Linux 5.15.0-75-generic #82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Agent: Linux 5.15.0-84-generic #93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
2 nodes in different networks.

Describe the bug:
Hello, first of all, my experience with Kubernetes is not very extensive. I am having trouble setting up a [multicloud](https://docs.k3s.io/installation/network-options#embedded-k3s-multicloud-solution) cluster with K3s. We have several bare-metal machines in different networks, and I want those machines to work in the same cluster. We have WireGuard set up on all of our machines, and it is working properly.

The only way I could make our multicloud cluster work was to disable our WireGuard; then everything works fine with:

Server:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --node-external-ip <SERVER_PUBLIC_IP> --flannel-backend=wireguard-native --flannel-external-ip" sh -s -

Agent:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="agent --server https://<SERVER_PUBLIC_IP>:6443 --token <TOKEN> --node-external-ip <AGENT_PUBLIC_IP>" sh -s -
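For reference, the same flags can also be kept in K3s's config file instead of INSTALL_K3S_EXEC — a sketch, assuming the default /etc/rancher/k3s/config.yaml location and the same placeholder values as in the commands above:

```yaml
# /etc/rancher/k3s/config.yaml on the server
# (equivalent of the INSTALL_K3S_EXEC server flags above)
node-external-ip: "<SERVER_PUBLIC_IP>"
flannel-backend: "wireguard-native"
flannel-external-ip: true
```

K3s reads this file at startup, so the install script can then be run without any INSTALL_K3S_EXEC flags.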

With this, everything works properly, but I want to use our own WireGuard. When I configure K3s to use it, both nodes look Ready and pods are deployed on both of them, but only the pods on the server can interconnect. Pods on the agent cannot connect to pods on the server, and pods on the server can resolve pods on the agent but cannot establish a connection using hostnames.

If I do the deployment, enter the pods, and try manual connections with netcat, it connects successfully with the pod IP and the service IP, but not with the hostname, so I assume there is a DNS problem, but I don't know where:

root@pod-0:/node# nc -zv peer-34 5000
^C
root@pod-0:/node# nc -zv 10.42.1.27 5000
Connection to 10.42.1.27 5000 port [tcp/*] succeeded!
root@pod-0:/node# nc -zv 10.43.182.61 5000
Connection to 10.42.1.27 5000 port [tcp/*] succeeded!
root@pod-0:/node# nc -zv pod-34 5000
^C
root@pod-0:/node# nc -zv peer-34 5000
^C

The scenario above is a pod on the server trying to connect to a pod on the agent. The other way around doesn't work at all. For example:

root@pod-1:/node# nc -zv peer-38 5000
^C
root@pod-1:/node# nc -zv 10.42.1.25 5000
^C
root@pod-1:/node# nc -zv 10.43.112.39 5000
^C

I am doing the following set up:

Server:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--node-ip=<WIREGUARD_SERVER_IP> --advertise-address=<WIREGUARD_SERVER_IP> --node-external-ip=<SERVER_PUBLIC_IP> --flannel-iface=wg0" sh -

Agent:

curl -sfL https://get.k3s.io | K3S_URL=https://<WIREGUARD_SERVER_IP>:6443 K3S_TOKEN=<TOKEN> INSTALL_K3S_EXEC="--node-ip=<WIREGUARD_AGENT_IP> --node-external-ip=<AGENT_PUBLIC_IP> --flannel-iface=wg0" sh -
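The same custom-WireGuard variant as a config-file sketch (placeholders as above; assuming the default /etc/rancher/k3s/config.yaml path on each node):

```yaml
# Server /etc/rancher/k3s/config.yaml: route flannel over the existing wg0
node-ip: "<WIREGUARD_SERVER_IP>"
advertise-address: "<WIREGUARD_SERVER_IP>"
node-external-ip: "<SERVER_PUBLIC_IP>"
flannel-iface: "wg0"

# Agent /etc/rancher/k3s/config.yaml (separate file on the agent node):
#   server: "https://<WIREGUARD_SERVER_IP>:6443"
#   token: "<TOKEN>"
#   node-ip: "<WIREGUARD_AGENT_IP>"
#   node-external-ip: "<AGENT_PUBLIC_IP>"
#   flannel-iface: "wg0"
```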

I tried more configurations, but this looked like the most correct one to me, and I didn't want to flood this issue with every single configuration I tried. What I want to tell K3s is that it has to use our WireGuard interface, but something is clearly wrong here.

Information that I have read:

This was very similar to my issue, but didn't solve anything: https://github.com/alekc-go/flannel-fixer

This issue, that refers to the previous Issue: #1824

Regarding using WireGuard (wireguard-native) as the flannel backend, which makes K3s crash because the WireGuard interface is already set up: #1608

I assume that when you run --flannel-backend=wireguard-native it creates its own interface, which is why I run into the following error:

level=fatal msg="flannel exited: failed to set up the route: failed to set interface flannel-wg to UP state: address already in use"
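That error is consistent with a port clash: flannel's wireguard-native backend binds its own UDP listen port (51820 by default, which is also the usual default for a pre-existing wg0). A small sketch to check from the node, using only /proc/net/udp — the helper name `port_in_use` is mine, not a K3s or flannel tool:

```shell
#!/bin/sh
# Check whether a UDP port is already bound by scanning /proc/net/udp{,6}.
# /proc/net/udp stores the local port as uppercase hex in column 2.
port_in_use() {
  hex=$(printf '%04X' "$1")
  for f in /proc/net/udp /proc/net/udp6; do
    [ -r "$f" ] || continue
    if awk -v h="$hex" 'NR > 1 { split($2, a, ":"); if (a[2] == h) found = 1 }
                        END { exit found ? 0 : 1 }' "$f"; then
      return 0
    fi
  done
  return 1
}

# 51820 is WireGuard's usual default; if your own wg0 already holds it,
# flannel-wg cannot bind the same port on the same address.
if port_in_use 51820; then
  echo "UDP 51820 already bound (likely your existing WireGuard)"
fi
```

If the port is taken, that would explain why wireguard-native only works once the pre-existing WireGuard is torn down.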

Similar situation:

https://www.reddit.com/r/selfhosted/comments/mu6et4/has_anyone_setup_k3s_over_wireguard_is_it_possible/ih30pqj/

More issues/PR:

#2689

#6180

#7384

https://hackmd.io/@WesleyCh3n/ntu_iot_k3s

Checking kubectl describe nodes:

Server:

…
Annotations:
alpha.kubernetes.io/provided-node-ip: <WIREGUARD_SERVER_IP>
flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"fe:34:5f:b9:9e:42"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: <WIREGUARD_SERVER_IP>
k3s.io/external-ip: <SERVER_PUBLIC_IP>
k3s.io/hostname: <HOSTNAME>
k3s.io/internal-ip: <WIREGUARD_SERVER_IP>
k3s.io/node-args:
["server","--node-ip",<WIREGUARD_SERVER_IP>,"--advertise-address",<WIREGUARD_SERVER_IP>,"--node-external-ip",<SERVER_PUBLIC_IP>,"--flannel-iface","wg0"]
…
Addresses:
InternalIP:  <WIREGUARD_SERVER_IP>
ExternalIP:  <SERVER_PUBLIC_IP>
…
PodCIDR:                      10.42.0.0/24
PodCIDRs:                     10.42.0.0/24

The agent's information looks OK as well.

Kubernetes DNS resolution:

https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/

$ sudo kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server:         10.43.0.10
Address:        10.43.0.10#53
Name:   kubernetes.default.svc.cluster.local
Address: 10.43.0.1
$ sudo kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local statusim.net
nameserver 10.43.0.10
options ndots:5

DNS pod is running:

$ sudo kubectl get pods --namespace=kube-system -l k8s-app=kube-dns

$ sudo kubectl logs --namespace=kube-system -l k8s-app=kube-dns
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
$ kubectl get endpoints kube-dns --namespace=kube-system

NAME       ENDPOINTS                                  AGE
kube-dns   10.42.0.4:53,10.42.0.4:53,10.42.0.4:9153   10m

Adding log to CoreDNS:

[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/reload: Running configuration SHA512 = 0b4864bedbec2be9d5f56a213b8a6b63704dfe9ae1a318dda30b5aae0390e6943ed5526c895c41d120a3e93ad0c6302ce165e0d703b7df918cb7797453397d1f
[INFO] Reloading complete
[INFO] 127.0.0.1:51812 - 21722 "HINFO IN 7560458663601928980.4899293149017499055. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.000860068s
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server

Known Issues (From documentation):

Some Linux distributions (e.g. Ubuntu) use a local DNS resolver by default (systemd-resolved). Systemd-resolved moves and replaces /etc/resolv.conf with a stub file that can cause a fatal forwarding loop when resolving names in upstream servers. This can be fixed manually by using kubelet's --resolv-conf flag to point to the correct resolv.conf (With systemd-resolved, this is /run/systemd/resolve/resolv.conf). kubeadm automatically detects systemd-resolved, and adjusts the kubelet flags accordingly.
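Since K3s runs its own embedded kubelet, I understand the kubeadm advice above translates to K3s's own --resolv-conf flag (or the K3S_RESOLV_CONF environment variable). A config-file sketch, assuming the default /etc/rancher/k3s/config.yaml location:

```yaml
# /etc/rancher/k3s/config.yaml — point the embedded kubelet at the real
# resolv.conf instead of the systemd-resolved stub
resolv-conf: "/run/systemd/resolve/resolv.conf"
```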

Checking kubectl -n kube-system edit configmap coredns:

forward . /etc/resolv.conf

The contents of that /etc/resolv.conf:

nameserver 127.0.0.53
options edns0 trust-ad
search statusim.net

Checking /run/systemd/resolve/resolv.conf :

nameserver 185.12.64.2
nameserver 185.12.64.1
nameserver 2a01:4ff:ff00::add:1
# Too many DNS servers configured, the following entries may be ignored.
nameserver 2a01:4ff:ff00::add:2
search statusim.net

I tried modifying /run/systemd/resolve/resolv.conf to delete the two IPv6 addresses and pass it as an argument with --resolv-conf, but it is being overwritten for some reason I don't understand.
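Since systemd-resolved regenerates that file, one workaround sketch is to write a filtered, static copy somewhere systemd doesn't manage and point --resolv-conf at it. The helper name `filter_ipv4_ns` and the /etc/rancher/k3s-resolv.conf path are my own choices, not K3s conventions:

```shell
#!/bin/sh
# Keep only IPv4 nameserver lines (plus search/options) from a resolv.conf,
# dropping the IPv6 entries that were flagged as ignored anyway.
filter_ipv4_ns() {
  grep -E '^(nameserver [0-9.]+$|search |options )' "$1"
}

# Usage sketch (run once, or from a hook that re-runs on network changes):
#   filter_ipv4_ns /run/systemd/resolve/resolv.conf > /etc/rancher/k3s-resolv.conf
#   ... then start K3s with --resolv-conf=/etc/rancher/k3s-resolv.conf
```

The static copy will go stale if the upstream nameservers change, so this is a diagnostic aid more than a permanent fix.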

My questions are:

  • Am I doing something wrong in the setup?
  • What is the point of using --node-external-ip in a situation like this? Is it useful for tools like OpenLens, or is it unrelated?
  • It looks like a DNS problem, but I am not 100% sure.
  • If I have my own WireGuard, is it correct to use the vxlan backend? I assume this is correctly abstracted?

Expected behavior:
The same behaviour as the setup with --flannel-backend=wireguard-native and our WireGuard deactivated.

Actual behavior:
Explained above.

Additional context / logs:
Above.

@k3s-io k3s-io locked and limited conversation to collaborators Oct 16, 2023
@brandond brandond converted this issue into discussion #8657 Oct 16, 2023
@github-project-automation github-project-automation bot moved this from New to Done Issue in K3s Development Oct 16, 2023

