How to determine why flannelized IPs can't see one another? #216

Closed
jayunit100 opened this issue Jun 21, 2015 · 2 comments
@jayunit100 (Contributor) commented:

I've set up flannel 0.2 between some machines using a Vagrant + VirtualBox recipe with etcd and Docker 1.6. All the indicators I can think of seem to be green, but the containers aren't able to ping/reach one another. What other items should I check?

So far I've verified:

  1. The following checklist items seem to be working:
  • I can start flannel as the default networking backend on two independent VMs in VirtualBox,
  • it uses one etcd server, which is working perfectly,
  • each of the VMs is getting a non-172 default address (i.e. 10.x.y.z),
  • and the flannel network interface is indeed showing up in ip a (so is docker0); both appear to be on the same subnet.
  2. Additionally, the flanneld logs are not indicating any failures.

machine1:

Jun 21 12:19:09 kube0.ha systemd[1]: Starting Flanneld overlay address etcd agent...
Jun 21 12:19:09 kube0.ha flanneld[19445]: I0621 12:19:09.810365   19445 main.go:247] Installing signal handlers
Jun 21 12:19:09 kube0.ha flanneld[19445]: I0621 12:19:09.810399   19445 main.go:118] Determining IP address of default interface
Jun 21 12:19:09 kube0.ha flanneld[19445]: I0621 12:19:09.810994   19445 main.go:205] Using 10.0.2.15 as external interface
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.199864   19445 subnet.go:83] Subnet lease acquired: 10.0.24.0/24
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.207320   19445 main.go:215] VXLAN mode initialized
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.207750   19445 vxlan.go:115] Watching for L2/L3 misses
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.208097   19445 vxlan.go:121] Watching for new subnet leases
Jun 21 12:19:12 kube0.ha systemd[1]: Started Flanneld overlay address etcd agent.

machine 2:

Jun 21 12:18:33 kube1.ha systemd[1]: Starting Flanneld overlay address etcd agent...
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.116524   18148 main.go:247] Installing signal handlers
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.116555   18148 main.go:118] Determining IP address of default interface
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.120795   18148 main.go:205] Using 10.0.2.15 as external interface
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.177517   18148 subnet.go:83] Subnet lease acquired: 10.0.24.0/24
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.183422   18148 main.go:215] VXLAN mode initialized
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.183441   18148 vxlan.go:115] Watching for L2/L3 misses
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.183453   18148 vxlan.go:121] Watching for new subnet leases
Jun 21 12:18:33 kube1.ha systemd[1]: Started Flanneld overlay address etcd agent.
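
One extra cross-check (assuming the common setup where flanneld writes its lease to /run/flannel/subnet.env; the path may differ on other installs) is to compare flannel's lease file with docker0's actual address:

# FLANNEL_SUBNET here should match the "Subnet lease acquired" line in the logs above
cat /run/flannel/subnet.env
# docker0's address should fall inside FLANNEL_SUBNET
ip -4 addr show docker0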

Everything seems okay so far... So, as the next step, I launched two containers.

[root@kube0 vagrant]# docker run -t -i docker.io/jayunit100/k8-petstore-redis-slave:r.2.8.19 /bin/sh
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
27: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:00:18:0b brd ff:ff:ff:ff:ff:ff
    inet 10.0.24.11/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe00:180b/64 scope link 
       valid_lft forever preferred_lft forever

The second container showed the same setup (its IP was 10.0.24.4, however), but the two containers couldn't reach one another:

# ping 10.0.24.11
PING 10.0.24.11 (10.0.24.11) 56(84) bytes of data.
From 10.0.24.4 icmp_seq=1 Destination Host Unreachable

and vice versa as well (10.0.24.11 couldn't ping .4).

How can I determine why these two flannel subnets aren't unified / able to reach one another? Is there a way I can test for membership in the flannel VXLAN, as opposed to just confirming that the IP is on the 10 subnet?
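
One thing I can at least inspect is the host-side VXLAN state with plain iproute2 commands (flannel's VXLAN device seems to be named flannel.1 here; adjust if yours differs):

# VNI, local address and UDP port of the VXLAN device flannel created
ip -d link show flannel.1
# forwarding-table entries programmed for remote hosts
bridge fdb show dev flannel.1
# L3 neighbour entries for remote flannel subnets
ip neigh show dev flannel.1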

My current thought is that maybe I can do a watch on etcd to see how the coreos.com/network entries are being written... but maybe there is a more natural way to debug flannel connectivity?
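
Concretely, I was imagining checks along these lines, assuming flannel's default etcd prefix of /coreos.com/network (the lease key name is taken from the logs above):

# the network config flannel was pointed at
etcdctl get /coreos.com/network/config
# one lease entry per host; two hosts should show two distinct subnets
etcdctl ls --recursive /coreos.com/network/subnets
# the data flannel stored for a particular lease (PublicIP, backend info)
etcdctl get /coreos.com/network/subnets/10.0.24.0-24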

@jayunit100 (Contributor, Author) commented:

I ultimately found that running tcpdump -i eth1 showed me the error buried in the output:

kube1.ha > kube2.ha ICMP host kube1.ha unreachable - admin prohibited - length 68

The solution? I ran "iptables -F" on all of my machines. This seemed to clear up the issue.

So in the end, I think there are really 3 steps to debugging this (roughly sketched below):

  • First, make sure each machine creates docker containers with IPs on subnets that are correct and distinct.
  • Then try to ping between your containers from different hosts.
  • If the result is unreachable, run tcpdump and take a good look at the output. You should see something getting dropped between the hosts. iptables -F may be necessary (even if iptables isn't enabled, the rules can bite you).
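
Roughly, the commands involved look like this. This is only a sketch: eth1 and UDP port 8472 are assumptions based on my Vagrant/VirtualBox setup with flannel's vxlan backend, so substitute whatever interface actually carries traffic between your VMs:

# on each host: confirm docker0 sits on the subnet flannel leased, and that the two hosts differ
ip -4 addr show docker0

# while pinging from a container on the other host, watch the inter-VM interface
tcpdump -ni eth1 'icmp or udp port 8472'

# if you see "admin prohibited" rejects, inspect the firewall rules first; flushing them
# (iptables -F) worked for me, but it removes every rule, so prefer deleting only the offending ones
iptables -nvL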

Hope this is the right advice; if not, please leave a comment in this thread, otherwise I'll close the issue.

@jayunit100 jayunit100 changed the title How to determine when flannelized IPs can't see one another? How to determine why flannelized IPs can't see one another? Jun 22, 2015
@eyakubovich (Contributor) commented:

@jayunit100 Yes, this is a known issue with Vagrant, please see #98

However, also make sure that the network that's been configured for flannel does not overlap with the host's IPs. From

Jun 21 12:19:09 kube0.ha flanneld[19445]: I0621 12:19:09.810994   19445 main.go:205] Using 10.0.2.15 as external interface
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.199864   19445 subnet.go:83] Subnet lease acquired: 10.0.24.0/24

I see that your host's (well, VM's) IP is 10.0.2.15 and the flannel network is probably configured as 10.0.0.0/16. That puts 10.0.2.15 within the flannel range.
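
If that's the problem, re-point flannel at a range the hosts don't use and restart flanneld (and docker) on each machine so fresh leases get handed out. A rough sketch, with 10.100.0.0/16 chosen purely as an example of a non-overlapping range:

# see what range flannel is currently carving subnets out of
etcdctl get /coreos.com/network/config

# switch to a range that does not contain the hosts' own addresses (e.g. 10.0.2.15)
etcdctl set /coreos.com/network/config '{ "Network": "10.100.0.0/16", "Backend": { "Type": "vxlan" } }'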
