How to determine why flannelized IPs can't see one another? #216

Closed
jayunit100 opened this issue Jun 21, 2015 · 2 comments
@jayunit100 (Contributor) commented:

I've set up flannel 0.2 between some machines using a Vagrant + VirtualBox recipe with etcd and Docker 1.6. All the indicators I can think of seem to be green, but the containers aren't able to ping/reach one another. What other items should I check?

So far I've verified:

  1. The following checklist items seem to be working:
  • I can start flannel as the default networking backend on two independent VMs in VirtualBox,
  • it uses one etcd server, which is working perfectly,
  • each of the VMs is getting a non-172 default address (i.e. 10.x.y.z),
  • and the flannel network interface is indeed showing up in ip a (so is docker0); both appear to be on the same subnet.
  2. Additionally, the flanneld logs are not indicating any failures.

machine1:

Jun 21 12:19:09 kube0.ha systemd[1]: Starting Flanneld overlay address etcd agent...
Jun 21 12:19:09 kube0.ha flanneld[19445]: I0621 12:19:09.810365   19445 main.go:247] Installing signal handlers
Jun 21 12:19:09 kube0.ha flanneld[19445]: I0621 12:19:09.810399   19445 main.go:118] Determining IP address of default interface
Jun 21 12:19:09 kube0.ha flanneld[19445]: I0621 12:19:09.810994   19445 main.go:205] Using 10.0.2.15 as external interface
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.199864   19445 subnet.go:83] Subnet lease acquired: 10.0.24.0/24
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.207320   19445 main.go:215] VXLAN mode initialized
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.207750   19445 vxlan.go:115] Watching for L2/L3 misses
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.208097   19445 vxlan.go:121] Watching for new subnet leases
Jun 21 12:19:12 kube0.ha systemd[1]: Started Flanneld overlay address etcd agent.

machine 2:

Jun 21 12:18:33 kube1.ha systemd[1]: Starting Flanneld overlay address etcd agent...
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.116524   18148 main.go:247] Installing signal handlers
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.116555   18148 main.go:118] Determining IP address of default interface
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.120795   18148 main.go:205] Using 10.0.2.15 as external interface
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.177517   18148 subnet.go:83] Subnet lease acquired: 10.0.24.0/24
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.183422   18148 main.go:215] VXLAN mode initialized
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.183441   18148 vxlan.go:115] Watching for L2/L3 misses
Jun 21 12:18:33 kube1.ha flanneld[18148]: I0621 12:18:33.183453   18148 vxlan.go:121] Watching for new subnet leases
Jun 21 12:18:33 kube1.ha systemd[1]: Started Flanneld overlay address etcd agent.
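
One extra cross-check (assuming the common setup where flanneld writes its lease to /run/flannel/subnet.env; the path may differ on other installs) is to compare flannel's lease file with docker0's actual address:

# FLANNEL_SUBNET here should match the "Subnet lease acquired" line in the logs above
cat /run/flannel/subnet.env
# docker0's address should fall inside FLANNEL_SUBNET
ip -4 addr show docker0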

Everything seems okay so far... So, as the next step, I launched two containers.

[root@kube0 vagrant]# docker run -t -i docker.io/jayunit100/k8-petstore-redis-slave:r.2.8.19 /bin/sh
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
27: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:00:18:0b brd ff:ff:ff:ff:ff:ff
    inet 10.0.24.11/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe00:180b/64 scope link 
       valid_lft forever preferred_lft forever

The second container showed the same setup (its IP was 10.0.24.4, however), but the two containers couldn't reach one another:

# ping 10.0.24.11
PING 10.0.24.11 (10.0.24.11) 56(84) bytes of data.
From 10.0.24.4 icmp_seq=1 Destination Host Unreachable

and vice versa as well (10.0.24.11 couldn't ping .4).

How can I determine why these two flannel subnets aren't unified / able to reach one another? Is there a way I can test for membership in the flannel VXLAN, as opposed to just confirming that the IP is on the 10 subnet?
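
One thing I can at least inspect is the host-side VXLAN state with plain iproute2 commands (flannel's VXLAN device seems to be named flannel.1 here; adjust if yours differs):

# VNI, local address and UDP port of the VXLAN device flannel created
ip -d link show flannel.1
# forwarding-table entries programmed for remote hosts
bridge fdb show dev flannel.1
# L3 neighbour entries for remote flannel subnets
ip neigh show dev flannel.1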

My current thought is that maybe I can do a watch on etcd to see how the coreos.com/network entries are being written... but maybe there is a more natural way to debug flannel connectivity?
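
Concretely, I was imagining checks along these lines, assuming flannel's default etcd prefix of /coreos.com/network (the lease key name is taken from the logs above):

# the network config flannel was pointed at
etcdctl get /coreos.com/network/config
# one lease entry per host; two hosts should show two distinct subnets
etcdctl ls --recursive /coreos.com/network/subnets
# the data flannel stored for a particular lease (PublicIP, backend info)
etcdctl get /coreos.com/network/subnets/10.0.24.0-24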

@jayunit100 (Contributor, Author) commented:

I ultimately found that running tcpdump -i eth1 showed me the error buried in the output:

kube1.ha > kube2.ha ICMP host kube1.ha unreachable - admin prohibited - length 68

The solution? I ran "iptables -F" on all of my machines. This seemed to clear up the issue.

So in the end, I think there are really 3 steps to debugging this (roughly sketched below):

  • First, make sure each machine creates docker containers with IPs on subnets that are correct and distinct.
  • Then try to ping between your containers from different hosts.
  • If the result is unreachable, run tcpdump and take a good look at the output. You should see something getting dropped between the hosts. iptables -F may be necessary (even if iptables isn't enabled, the rules can bite you).
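
Roughly, the commands involved look like this. This is only a sketch: eth1 and UDP port 8472 are assumptions based on my Vagrant/VirtualBox setup with flannel's vxlan backend, so substitute whatever interface actually carries traffic between your VMs:

# on each host: confirm docker0 sits on the subnet flannel leased, and that the two hosts differ
ip -4 addr show docker0

# while pinging from a container on the other host, watch the inter-VM interface
tcpdump -ni eth1 'icmp or udp port 8472'

# if you see "admin prohibited" rejects, inspect the firewall rules first; flushing them
# (iptables -F) worked for me, but it removes every rule, so prefer deleting only the offending ones
iptables -nvL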

Hope this is the right advice; if not, please leave a comment in this thread, otherwise I'll close the issue.

@jayunit100 jayunit100 changed the title How to determine when flannelized IPs can't see one another? How to determine why flannelized IPs can't see one another? Jun 22, 2015
@eyakubovich (Contributor) commented:

@jayunit100 Yes, this is a known issue with Vagrant, please see #98

However, also make sure that the network that's been configured for flannel does not overlap with the host's IPs. From

Jun 21 12:19:09 kube0.ha flanneld[19445]: I0621 12:19:09.810994   19445 main.go:205] Using 10.0.2.15 as external interface
Jun 21 12:19:12 kube0.ha flanneld[19445]: I0621 12:19:12.199864   19445 subnet.go:83] Subnet lease acquired: 10.0.24.0/24

I see that your host's (well, VM's) IP is 10.0.2.15 and the flannel network is probably configured as 10.0.0.0/16. That puts 10.0.2.15 within the flannel range.
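
If that's the problem, re-point flannel at a range the hosts don't use and restart flanneld (and docker) on each machine so fresh leases get handed out. A rough sketch, with 10.100.0.0/16 chosen purely as an example of a non-overlapping range:

# see what range flannel is currently carving subnets out of
etcdctl get /coreos.com/network/config

# switch to a range that does not contain the hosts' own addresses (e.g. 10.0.2.15)
etcdctl set /coreos.com/network/config '{ "Network": "10.100.0.0/16", "Backend": { "Type": "vxlan" } }'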
