
Pods on different nodes cannot communicate (flannel/vxlan) #1719

Closed
rytis opened this issue Apr 30, 2020 · 8 comments

rytis commented Apr 30, 2020

Version:

k3s version v1.17.4+k3s1 (3eee8ac)

K3s arguments:

/usr/local/bin/k3s server --no-deploy=traefik

Describe the bug

Pods on different nodes cannot communicate. Pods on the same node can.

To Reproduce

  • Two VMs running Fedora 32 Server
      ◦ Default install
      ◦ SELinux disabled
      ◦ Grub options: cgroup_memory=1 cgroup_enable=memory cgroup_enable=cpuset systemd.unified_cgroup_hierarchy=0
      ◦ Firewall rules added:
        firewall-cmd --permanent --add-port=6443/tcp # kubernetes api
        firewall-cmd --permanent --add-port=10250/tcp # kubelet
        firewall-cmd --permanent --add-port=8472/udp # flannel
        firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16 # pods
        firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16 # services
        firewall-cmd --reload
  • k3s installed on master and worker nodes (192.168.1.72 and .73)
    [root@k3s-master ~]# k3s kubectl get nodes
    NAME                     STATUS   ROLES    AGE   VERSION
    k3s-worker.localdomain   Ready    <none>   61m   v1.17.4+k3s1
    k3s-master.localdomain   Ready    master   66m   v1.17.4+k3s1
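
A minimal way to reproduce the cross-node failure, sketched here for reference (pod names and image are illustrative, not part of the original report), is to start one test pod per node and ping between them:

# start two test pods; names/image are hypothetical
k3s kubectl run test-a --image=busybox --restart=Never --command -- sleep 3600
k3s kubectl run test-b --image=busybox --restart=Never --command -- sleep 3600
# confirm they landed on different nodes and note their 10.42.x.y IPs
k3s kubectl get pods -o wide
# ping from one pod to the other; this works on the same node but fails across nodes
k3s kubectl exec test-a -- ping -c 3 <IP of test-b>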

Expected behavior

Pods on different nodes should be able to communicate, and pings to the flannel.1 interface IPs on other nodes should work.

Actual behavior

Deployed pods cannot communicate (master -> worker).

Pings to the flannel.1 interface IPs don't work either.

Whenever I ping from the master to the worker node, I can see the ICMP requests arriving at the worker, but there's no echo reply sent back.

Additional context / logs

Master:

[root@k3s-master ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    100    0        0 enp0s3
10.42.0.0       0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.42.1.0       10.42.1.0       255.255.255.0   UG    0      0        0 flannel.1
192.168.1.0     0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
[root@k3s-master ~]# ping 10.42.1.0
PING 10.42.1.0 (10.42.1.0) 56(84) bytes of data.
^C
--- 10.42.1.0 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4090ms

Worker:

[root@k3s-worker ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    100    0        0 enp0s3
10.42.0.0       10.42.0.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.1.0       0.0.0.0         255.255.255.0   U     0      0        0 cni0
192.168.1.0     0.0.0.0         255.255.255.0   U     100    0        0 enp0s3


[root@k3s-worker ~]# tcpdump -i any -nn port 8472
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
09:18:49.249265 IP 192.168.1.72.59205 > 192.168.1.73.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.0.0 > 10.42.1.0: ICMP echo request, id 4, seq 1, length 64
09:18:50.267200 IP 192.168.1.72.59205 > 192.168.1.73.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.0.0 > 10.42.1.0: ICMP echo request, id 4, seq 2, length 64
09:18:51.293035 IP 192.168.1.72.59205 > 192.168.1.73.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.0.0 > 10.42.1.0: ICMP echo request, id 4, seq 3, length 64
09:18:52.315312 IP 192.168.1.72.59205 > 192.168.1.73.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.0.0 > 10.42.1.0: ICMP echo request, id 4, seq 4, length 64
09:18:53.338992 IP 192.168.1.72.59205 > 192.168.1.73.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.0.0 > 10.42.1.0: ICMP echo request, id 4, seq 5, length 64

rytis commented Apr 30, 2020

BTW, as per flannel-io/flannel#1243 (comment) I did try ip route add 10.42.0.0/16 dev cni0, but that made no difference.

Same behaviour on Fedora 31.

On the same hosts, I tried a plain VXLAN tunnel (without k3s installed, to avoid a VXLAN ID clash):

ip link add vxlan1 type vxlan id 1 remote 192.168.1.73 dstport 8472 dev enp0s3
ip link set vxlan1 up
ip addr add 10.0.0.1/24 dev vxlan1

And it worked fine; I could ping the vxlan device IPs between the hosts.
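
(For completeness, the mirror of those commands on the worker would presumably have been something like the following; the exact address is an assumption, not quoted from the comment.)

# on 192.168.1.73, assumed mirror of the commands above
ip link add vxlan1 type vxlan id 1 remote 192.168.1.72 dstport 8472 dev enp0s3
ip link set vxlan1 up
ip addr add 10.0.0.2/24 dev vxlan1
ping -c 3 10.0.0.1   # reported to work, unlike the flannel.1 ping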

niusmallnan (Contributor) commented:

@rytis Did you try ethtool -K flannel.1 tx-checksum-ip-generic off?


rytis commented Apr 30, 2020

I've tried it now (on both sides, master and worker); same effect, no ICMP replies (requests are appearing on the other node just as before):

[root@k3s-master ~]# ethtool -K flannel.1 tx-checksum-ip-generic off
Actual changes:
tx-checksumming: off
        tx-checksum-ip-generic: off
tcp-segmentation-offload: off
        tx-tcp-segmentation: off [requested on]
        tx-tcp-ecn-segmentation: off [requested on]
        tx-tcp-mangleid-segmentation: off [requested on]
        tx-tcp6-segmentation: off [requested on]
[root@k3s-master ~]# 
[root@k3s-master ~]# ping 10.42.1.0
PING 10.42.1.0 (10.42.1.0) 56(84) bytes of data.
^C
--- 10.42.1.0 ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5111ms

[root@k3s-master ~]# ping 10.42.1.1
PING 10.42.1.1 (10.42.1.1) 56(84) bytes of data.
^C
--- 10.42.1.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1058ms

niusmallnan (Contributor) commented:

Please check the FDB info; the dst IP should be the node IP:

bridge fdb show dev flannel.1


rytis commented Apr 30, 2020

It's the peer's IP:

Master:

[root@k3s-master ~]# ip addr show enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 08:00:27:15:3a:2a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.72/24 brd 192.168.1.255 scope global dynamic noprefixroute enp0s3
       valid_lft 63120sec preferred_lft 63120sec
    inet6 fe80::62ef:58e2:63b0:4b7c/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@k3s-master ~]# bridge fdb show dev flannel.1
d6:ad:6b:77:37:eb dst 192.168.1.73 self permanent
[root@k3s-master ~]# 

Worker:

[root@k3s-worker ~]# ip addr show enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 08:00:27:8a:89:53 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.73/24 brd 192.168.1.255 scope global dynamic noprefixroute enp0s3
       valid_lft 63053sec preferred_lft 63053sec
    inet6 fe80::fc16:8c40:f56d:827a/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@k3s-worker ~]# bridge fdb show dev flannel.1
ca:a3:3a:8f:57:36 dst 192.168.1.72 self permanent


rytis commented May 4, 2020

@niusmallnan a bit more info: I just realised that the MAC addresses in the FDB don't correspond to anything on those two machines:

Master:

[root@k3s-master ~]# bridge fdb show dev flannel.1
d6:ad:6b:77:37:eb dst 192.168.1.73 self permanent
[root@k3s-master ~]# ip addr|grep ether
    link/ether 08:00:27:15:3a:2a brd ff:ff:ff:ff:ff:ff
    link/ether 6a:c0:83:56:f6:88 brd ff:ff:ff:ff:ff:ff
    link/ether da:ce:08:e7:19:e4 brd ff:ff:ff:ff:ff:ff
    link/ether 2e:44:8c:0e:e1:c6 brd ff:ff:ff:ff:ff:ff link-netns cni-9ab7f860-f12f-cb12-0252-1428ff7fa8e9
    link/ether 3e:4e:8e:21:6a:c9 brd ff:ff:ff:ff:ff:ff link-netns cni-082af5cd-cbdf-4307-0986-85464bfeebdb
    link/ether 8a:42:06:6c:38:3a brd ff:ff:ff:ff:ff:ff link-netns cni-e35503a9-917a-3cfc-ac70-0044221c6ec9
    link/ether 4e:c1:b8:a4:7f:8c brd ff:ff:ff:ff:ff:ff link-netns cni-8554fc99-11f7-b82a-0185-aef6fa240da0

Worker:

[root@k3s-worker ~]# bridge fdb show dev flannel.1
ca:a3:3a:8f:57:36 dst 192.168.1.72 self permanent
[root@k3s-worker ~]# ip addr|grep ether
    link/ether 08:00:27:8a:89:53 brd ff:ff:ff:ff:ff:ff
    link/ether 26:2f:80:14:19:61 brd ff:ff:ff:ff:ff:ff
    link/ether f6:06:ee:66:1a:cc brd ff:ff:ff:ff:ff:ff
    link/ether 5e:7a:08:82:83:d6 brd ff:ff:ff:ff:ff:ff link-netns cni-7b70e9eb-c375-aae2-dff4-10900fdcdbbb
    link/ether 42:76:3c:e8:07:ed brd ff:ff:ff:ff:ff:ff link-netns cni-e915e880-0f0d-c8b3-be9b-9d0fb83a53c6
    link/ether 1e:10:c4:a6:72:50 brd ff:ff:ff:ff:ff:ff link-netns cni-bedcfc25-37db-b656-f5ea-44d527581c6d

What are they?
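
(Editorial note, not from the thread: with the vxlan backend, each FDB entry is expected to hold the MAC of the peer node's flannel.1 interface, i.e. its VTEP MAC, so one way to tell whether these entries are stale is to compare them against the peer directly.)

# on the worker: this flannel.1 MAC should match the FDB entry seen on the master
ip -d link show flannel.1
# flannel should also record the same VTEP MAC in the node annotation
k3s kubectl get node k3s-worker.localdomain -o yaml | grep backend-data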


vbohinc commented May 7, 2020

Check if you enabled masquerading and iptables-legacy:
sudo firewall-cmd --add-masquerade --permanent
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy

For a more complete list of firewall rules and required open ports, see https://rancher.com/docs/rancher/v2.x/en/installation/options/firewall/.

Update 1:
I've done a bit of testing on Fedora 32 VMs and can confirm that inter-node communication over vxlan breaks; the worker nodes also lose internet connectivity. I still have to look for the cause.

Update 2:
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
fixes inter-node communication temporarily, though I'm still not sure why it doesn't stick.
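
(Editorial aside, not from the thread: a quick way to check whether that direct rule actually survives a firewalld reload is to compare the permanent and runtime rule sets.)

firewall-cmd --permanent --direct --get-all-rules   # is the rule saved permanently?
firewall-cmd --reload
firewall-cmd --direct --get-all-rules               # is it back in the runtime set after the reload?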

Update 3:
On my main cluster (bare-metal setup without HA) the master node runs Fedora 31, and I added a worker node running Fedora 32 (a KVM VM on a host in a different subnet). Networking works fine with Calico; inter-node communication and internet access both work.

Update 4:
The Fedora 32 node does behave strangely: connectivity inside pods is sometimes lost completely, and then Calico synchronization restores it. Most of the time the network issues inside pods appear to be caused by the loss of proper DNS resolution (pinging the IP works), but again only on this node. The cause is the firewall dropping packets; this happens with firewalld using either the iptables or nftables backend.
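
(Editorial aside, not from the thread: if firewalld is suspected of silently dropping pod traffic, logging denied packets is a quick way to confirm it before loosening any rules.)

firewall-cmd --set-log-denied=all   # log every rejected/dropped packet
journalctl -k -f                    # watch the kernel log for drops while reproducing the issue
firewall-cmd --set-log-denied=off   # revert once done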


stale bot commented Jul 31, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

stale bot added the status/stale label Jul 31, 2021
stale bot closed this as completed Aug 14, 2021