
Flannel stops working on EC2 #87

Closed
dennybritz opened this issue Nov 24, 2014 · 19 comments

@dennybritz

First of all, sorry for the high-level error report, but I'm not sure what is going wrong or how best to debug it. I am running an ~8-node CoreOS cluster (stable channel) on EC2. My etcd is running on a separate node, and I'm using flannel from the current master branch, built within a container.

Everything works fine for about 12 hours, but then routing stops working and packets are dropped. This is reproducible and happens every time I re-provision my cluster (restart flannel and docker). I was initially running the UDP backend and didn't find any error messages in the journal logs. I then switched to the vxlan backend and am seeing the following:

Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.506152 17341 vxlan.go:264] L3 miss: 10.10.109.8
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.506189 17341 device.go:228] calling NeighSet: 10.10.109.8, 12:7d:bc:75:19:1f
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.506235 17341 vxlan.go:275] AddL3 succeeded
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.507063 17341 vxlan.go:248] L2 miss: 12:7d:bc:75:19:1f
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.507078 17341 device.go:202] calling NeighAdd: 10.0.0.129, 12:7d:bc:75:19:1f
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.507115 17341 vxlan.go:259] AddL2 succeeded
Nov 24 11:08:01 kalm flanneld[17341]: I1124 11:08:01.843236 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Nov 24 11:08:12 kalm flanneld[17341]: I1124 11:08:12.576140 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Nov 24 11:08:13 kalm flanneld[17341]: I1124 11:08:13.578175 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Nov 24 11:08:14 kalm flanneld[17341]: I1124 11:08:14.580082 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Nov 24 11:08:36 kalm flanneld[17341]: I1124 11:08:36.658179 17341 vxlan.go:264] L3 miss: 10.10.109.8
Nov 24 11:08:36 kalm flanneld[17341]: I1124 11:08:36.658219 17341 device.go:228] calling NeighSet: 10.10.109.8, 12:7d:bc:75:19:1f
Nov 24 11:08:36 kalm flanneld[17341]: I1124 11:08:36.658292 17341 vxlan.go:275] AddL3 succeeded
Nov 24 11:09:04 kalm flanneld[17341]: I1124 11:09:04.946135 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8

Any ideas what could be causing this? Is it perhaps something specific to EC2?

@eyakubovich
Contributor

@dennybritz "Ignoring not a miss" should be benign and can be normal. It's logged because vxlan is new and I want the extra logging. I think the problem is somewhere else. You're running in a VPC, correct? When it stops working, are you still able to reach other machines via eth0, not flannel? Next, can you ping the flannel interface (flannel0 or flannel.1) on a local machine? Then try flannel0/flannel.1 of a remote host. If the remote flannel interface is not pingable, could you run tcpdump to capture traffic on UDP 8285 (if running the udp backend) or UDP 8472 (if running vxlan)? You can run tcpdump via toolbox (https://github.com/coreos/toolbox), which should be installed on your CoreOS machine. The tcpdump should show where the traffic is getting dropped.

@dennybritz
Author

@eyakubovich Yes, I'm running in a VPC.

It's logged because vxlan is new and I want the extra logging. I think the problem is somewhere else.

I see. However, the same log statements appear nowhere else in the logs and only start appearing around the same time flannel stops working.

Pinging eth0 on the remote machine works:

PING 10.0.0.126 (10.0.0.126) 56(84) bytes of data.
64 bytes from 10.0.0.126: icmp_seq=1 ttl=64 time=0.638 ms
64 bytes from 10.0.0.126: icmp_seq=2 ttl=64 time=0.370 ms

Pinging flannel0 (this is docker0, right?) on the local machine works for both machines:

PING 10.10.109.1 (10.10.109.1) 56(84) bytes of data.
64 bytes from 10.10.109.1: icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from 10.10.109.1: icmp_seq=2 ttl=64 time=0.035 ms

Pinging flannel.1 on the local machine works for both machines:

PING 10.10.109.0 (10.10.109.0) 56(84) bytes of data.
64 bytes from 10.10.109.0: icmp_seq=1 ttl=64 time=0.054 ms
64 bytes from 10.10.109.0: icmp_seq=2 ttl=64 time=0.040 ms

Pinging either docker0 or flannel.1 on the remote machine does not work.

This happens in tcpdump:

Sending machine:

sh-4.2# tcpdump -v 'port 8472'
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
02:56:02.327762 IP (tos 0x0, ttl 64, id 11321, offset 0, flags [none], proto UDP (17), length 134)
    ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 64, id 16107, offset 0, flags [DF], proto ICMP (1), length 84)
    ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 1, length 64
02:56:03.327960 IP (tos 0x0, ttl 64, id 11922, offset 0, flags [none], proto UDP (17), length 134)
    ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 64, id 16494, offset 0, flags [DF], proto ICMP (1), length 84)
    ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 2, length 64
02:56:04.327951 IP (tos 0x0, ttl 64, id 12315, offset 0, flags [none], proto UDP (17), length 134)
    ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 64, id 17120, offset 0, flags [DF], proto ICMP (1), length 84)
    ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 3, length 64
02:56:05.327949 IP (tos 0x0, ttl 64, id 12358, offset 0, flags [none], proto UDP (17), length 134)
    ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 64, id 17866, offset 0, flags [DF], proto ICMP (1), length 84)
    ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 4, length 64
02:56:06.327948 IP (tos 0x0, ttl 64, id 12663, offset 0, flags [none], proto UDP (17), length 134)
    ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 64, id 18023, offset 0, flags [DF], proto ICMP (1), length 84)
    ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 5, length 64
02:56:07.327952 IP (tos 0x0, ttl 64, id 12918, offset 0, flags [none], proto UDP (17), length 134)
    ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 64, id 18240, offset 0, flags [DF], proto ICMP (1), length 84)
    ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 6, length 64
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel

Receiving machine (packets from machines other than the sending machine are received as well):

sh-4.2# tcpdump 'port 8472'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
02:56:00.372779 IP ip-10-0-0-130.ec2.internal.36461 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-85-0.ec2.internal.55896 > ip-10-10-109-8.ec2.internal.amqp: Flags [P.], seq 439869726:439869734, ack 4278089106, win 218, options [nop,nop,TS val 684964124 ecr 684939401], length 8
02:56:00.372860 IP ip-10-0-0-129.ec2.internal.44026 > ip-10-0-0-130.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-109-8.ec2.internal.amqp > ip-10-10-85-0.ec2.internal.55896: Flags [.], ack 8, win 217, options [nop,nop,TS val 684954401 ecr 684964124], length 0
02:56:02.326245 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 1, length 64
02:56:03.326735 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 2, length 64
02:56:04.214092 IP ip-10-0-0-129.ec2.internal.44026 > ip-10-0-0-130.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-109-8.ec2.internal.amqp > ip-10-10-85-0.ec2.internal.55896: Flags [P.], seq 1:9, ack 8, win 217, options [nop,nop,TS val 684958242 ecr 684964124], length 8
02:56:04.214546 IP ip-10-0-0-130.ec2.internal.36461 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-85-0.ec2.internal.55896 > ip-10-10-109-8.ec2.internal.amqp: Flags [.], ack 9, win 218, options [nop,nop,TS val 684967966 ecr 684958242], length 0
02:56:04.326424 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 3, length 64
02:56:05.326529 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 4, length 64
02:56:06.326458 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 5, length 64
02:56:07.326805 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 6, length 64
02:56:07.761686 IP ip-10-0-0-128.ec2.internal.60215 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-78-0.ec2.internal.57891 > ip-10-10-109-8.ec2.internal.amqp: Flags [P.], seq 1085651941:1085651949, ack 4057287077, win 218, options [nop,nop,TS val 684972546 ecr 684946790], length 8
02:56:07.761783 IP ip-10-0-0-129.ec2.internal.34077 > ip-10-0-0-128.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-109-8.ec2.internal.amqp > ip-10-10-78-0.ec2.internal.57891: Flags [.], ack 8, win 217, options [nop,nop,TS val 684961790 ecr 684972546], length 0
02:56:08.326477 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 7, length 64
02:56:09.326445 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 8, length 64
02:56:10.326695 IP ip-10-0-0-126.ec2.internal.43647 > ip-10-0-0-129.ec2.internal.otv: OTV, flags [I] (0x08), overlay 0, instance 1
IP ip-10-10-1-0.ec2.internal > ip-10-10-109-1.ec2.internal: ICMP echo request, id 22316, seq 9, length 64
^C
15 packets captured
15 packets received by filter
0 packets dropped by kernel

So it seems like packets are arriving on the overlay network...

@dennybritz
Author

Based on the above, it seems like this is a one-way issue. If I swap the receiving and sending machines, I get:

 ping 10.10.1.1
PING 10.10.1.1 (10.10.1.1) 56(84) bytes of data.
From 10.10.109.0 icmp_seq=1 Destination Host Unreachable
From 10.10.109.0 icmp_seq=2 Destination Host Unreachable
From 10.10.109.0 icmp_seq=3 Destination Host Unreachable
From 10.10.109.0 icmp_seq=4 Destination Host Unreachable

And no ICMP packets are being captured on port 8472.

@eyakubovich
Contributor

@dennybritz and the tcpdump confirms that as well: we see the Echo Request make it from one host to the other, but no sign of an Echo Reply. Let's do a few tests on the box where nothing was captured. As a next step, let's see if the routes are still there: netstat -rn (we want to see a line that routes 10.10.0.0/16 via flannel.1).

Now, keep running ping 10.10.1.0, and we'll check whether there's an ARP entry that maps 10.10.1.0 to some MAC: run arp. Next, we want to see if vxlan's forwarding database has the route (VXLAN is an L2 technology, so it has a bridge-like forwarding database that maps MAC addresses to remote hosts' IPs). Run bridge fdb; we want to see an entry mapping the MAC from the arp output to 10.0.0.126.
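
For illustration, an FDB entry like the ones bridge fdb prints can be programmed from Go via netlink. This is a minimal sketch assuming the github.com/vishvananda/netlink package (which flannel builds on); the function name and values are illustrative, not flannel's actual device.go:

package main

import (
	"log"
	"net"
	"syscall"

	"github.com/vishvananda/netlink"
)

// addFDBEntry maps a remote vxlan interface's MAC to the remote host's
// underlay (eth0) IP, so encapsulated frames know which host to go to.
func addFDBEntry(devName string, mac net.HardwareAddr, remoteHost net.IP) error {
	link, err := netlink.LinkByName(devName) // e.g. "flannel.1"
	if err != nil {
		return err
	}
	return netlink.NeighSet(&netlink.Neigh{
		LinkIndex:    link.Attrs().Index,
		Family:       syscall.AF_BRIDGE,     // target the FDB, not the ARP table
		Flags:        netlink.NTF_SELF,      // entry belongs to the device itself
		State:        netlink.NUD_PERMANENT, // static entry
		IP:           remoteHost,            // underlay IP, e.g. 10.0.0.130
		HardwareAddr: mac,
	})
}

func main() {
	mac, _ := net.ParseMAC("06:c6:77:33:ac:46")
	if err := addFDBEntry("flannel.1", mac, net.ParseIP("10.0.0.130")); err != nil {
		log.Fatal(err)
	}
}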

Can you also paste the output of ifconfig and ip addr, just to make sure there's nothing odd there.

I'm also going to set up a cluster on EC2 with a VPC and run it over the next few days.

@eyakubovich
Contributor

Please also do a quick dmesg to make sure there's nothing odd showing up there.

@dennybritz
Author

@eyakubovich

ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet 10.10.109.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::5484:7aff:fefe:9799  prefixlen 64  scopeid 0x20<link>
        ether 56:84:7a:fe:97:99  txqueuelen 0  (Ethernet)
        RX packets 81863  bytes 34079868 (32.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 69889  bytes 28946691 (27.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 10.0.0.129  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::80e:65ff:fe61:cea4  prefixlen 64  scopeid 0x20<link>
        ether 0a:0e:65:61:ce:a4  txqueuelen 1000  (Ethernet)
        RX packets 24867028  bytes 17914452744 (16.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20723760  bytes 3348891261 (3.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8951
        inet 10.10.109.0  netmask 255.255.0.0  broadcast 0.0.0.0
        inet6 fe80::107d:bcff:fe75:191f  prefixlen 64  scopeid 0x20<link>
        ether 12:7d:bc:75:19:1f  txqueuelen 0  (Ethernet)
        RX packets 48029  bytes 26666114 (25.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 45038  bytes 28195557 (26.8 MiB)
        TX errors 0  dropped 2051 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 27187  bytes 5091098 (4.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 27187  bytes 5091098 (4.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth3834: flags=67<UP,BROADCAST,RUNNING>  mtu 8951
        inet6 fe80::7c84:40ff:fecc:1f16  prefixlen 64  scopeid 0x20<link>
        ether 7e:84:40:cc:1f:16  txqueuelen 1000  (Ethernet)
        RX packets 17864  bytes 5507830 (5.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 17913  bytes 1527268 (1.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth473d: flags=67<UP,BROADCAST,RUNNING>  mtu 8951
        inet6 fe80::d426:3dff:feca:d64d  prefixlen 64  scopeid 0x20<link>
        ether d6:26:3d:ca:d6:4d  txqueuelen 1000  (Ethernet)
        RX packets 14456  bytes 1480514 (1.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8313  bytes 548832 (535.9 KiB)
        TX errors 0  dropped 1 overruns 0  carrier 0  collisions 0

veth7d7f: flags=67<UP,BROADCAST,RUNNING>  mtu 8951
        inet6 fe80::7009:a4ff:fef9:efc2  prefixlen 64  scopeid 0x20<link>
        ether 72:09:a4:f9:ef:c2  txqueuelen 1000  (Ethernet)
        RX packets 9  bytes 738 (738.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 78  bytes 4404 (4.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP qlen 1000
    link/ether 0a:0e:65:61:ce:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.129/24 brd 10.0.0.255 scope global dynamic eth0
       valid_lft 3083sec preferred_lft 3083sec
    inet6 fe80::80e:65ff:fe61:cea4/64 scope link 
       valid_lft forever preferred_lft forever
105: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN 
    link/ether 12:7d:bc:75:19:1f brd ff:ff:ff:ff:ff:ff
    inet 10.10.109.0/16 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::107d:bcff:fe75:191f/64 scope link 
       valid_lft forever preferred_lft forever
110: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UP 
    link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff
    inet 10.10.109.1/24 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::5484:7aff:fefe:9799/64 scope link 
       valid_lft forever preferred_lft forever
120: veth7d7f: <BROADCAST,UP,LOWER_UP> mtu 8951 qdisc pfifo_fast master docker0 state UP qlen 1000
    link/ether 72:09:a4:f9:ef:c2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7009:a4ff:fef9:efc2/64 scope link 
       valid_lft forever preferred_lft forever
122: veth473d: <BROADCAST,UP,LOWER_UP> mtu 8951 qdisc pfifo_fast master docker0 state UP qlen 1000
    link/ether d6:26:3d:ca:d6:4d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d426:3dff:feca:d64d/64 scope link 
       valid_lft forever preferred_lft forever
124: veth3834: <BROADCAST,UP,LOWER_UP> mtu 8951 qdisc pfifo_fast master docker0 state UP qlen 1000
    link/ether 7e:84:40:cc:1f:16 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7c84:40ff:fecc:1f16/64 scope link 
       valid_lft forever preferred_lft forever
netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG        0 0          0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 eth0
10.0.0.1        0.0.0.0         255.255.255.255 UH        0 0          0 eth0
10.10.0.0       0.0.0.0         255.255.0.0     U         0 0          0 flannel.1
10.10.109.0     0.0.0.0         255.255.255.0   U         0 0          0 docker0

Here is the ARP table while running ping; it seems like there is no mapping?

ip-10-10-109-4.ec2.inte          (incomplete)                              docker0
ip-10-10-85-0.ec2.inter  ether   06:c6:77:33:ac:46   C                     flannel.1
ip-10-0-0-128.ec2.inter  ether   0a:0a:6b:b6:37:2a   C                     eth0
ip-10-10-109-5.ec2.inte  ether   d2:9f:58:80:9c:a3   C                     docker0
ip-10-10-109-8.ec2.inte  ether   16:eb:b7:b4:78:9a   C                     docker0
ip-10-0-0-130.ec2.inter  ether   0a:79:35:13:84:b0   C                     eth0
ip-10-10-109-3.ec2.inte  ether   22:31:58:f0:29:55   C                     docker0
ip-10-0-0-126.ec2.inter  ether   0a:d1:53:10:59:72   C                     eth0
ip-10-10-92-0.ec2.inter  ether   b6:15:9f:ed:a9:99   C                     flannel.1
ip-10-10-1-1.ec2.intern          (incomplete)                              flannel.1
ip-10-10-1-0.ec2.intern          (incomplete)                              flannel.1
ip-10-10-17-3.ec2.inter          (incomplete)                              flannel.1
ip-10-0-0-1.ec2.interna  ether   0a:09:1c:0d:72:c1   C                     eth0
ip-10-0-0-2.ec2.interna  ether   0a:09:1c:0d:72:c1   C                     eth0
ip-10-10-78-0.ec2.inter  ether   86:cb:0b:78:50:88   C                     flannel.1
ip-10-10-109-7.ec2.inte  ether   96:df:6b:c6:22:ba   C                     docker0
ip-10-0-0-167.ec2.inter  ether   0a:a1:99:3d:39:b7   C                     eth0
bridge fdb
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:61:ce:a4 dev eth0 self permanent
06:c6:77:33:ac:46 dev flannel.1 dst 10.0.0.130 self 
86:cb:0b:78:50:88 dev flannel.1 dst 10.0.0.128 self 
56:84:7a:fe:97:99 dev docker0 vlan 0 permanent
96:df:6b:c6:22:ba dev veth473d vlan 0 
72:09:a4:f9:ef:c2 dev veth7d7f vlan 0 permanent
d6:26:3d:ca:d6:4d dev veth473d vlan 0 permanent
7e:84:40:cc:1f:16 dev veth3834 vlan 0 permanent
16:eb:b7:b4:78:9a dev veth3834 vlan 0 
33:33:00:00:00:01 dev veth7d7f self permanent
01:00:5e:00:00:01 dev veth7d7f self permanent
33:33:ff:f9:ef:c2 dev veth7d7f self permanent
33:33:00:00:00:01 dev veth473d self permanent
01:00:5e:00:00:01 dev veth473d self permanent
33:33:ff:ca:d6:4d dev veth473d self permanent
33:33:00:00:00:01 dev veth3834 self permanent
01:00:5e:00:00:01 dev veth3834 self permanent
33:33:ff:cc:1f:16 dev veth3834 self permanent

This is the tail of dmesg:

[627839.752731] docker0: port 1(veth7d7f) entered forwarding state
[627839.758218] docker0: port 1(veth7d7f) entered forwarding state
[627854.816087] docker0: port 1(veth7d7f) entered forwarding state
[627867.690771] docker0: port 2(vetha248) entered disabled state
[627867.694547] device vetha248 left promiscuous mode
[627867.696624] docker0: port 2(vetha248) entered disabled state
[627930.186824] device veth473d entered promiscuous mode
[627930.192813] IPv6: ADDRCONF(NETDEV_UP): veth473d: link is not ready
[627930.201281] IPv6: ADDRCONF(NETDEV_CHANGE): veth473d: link becomes ready
[627930.206117] docker0: port 2(veth473d) entered forwarding state
[627930.210761] docker0: port 2(veth473d) entered forwarding state
[627945.248064] docker0: port 2(veth473d) entered forwarding state
[628006.978642] docker0: port 3(vethf5ca) entered disabled state
[628006.982581] device vethf5ca left promiscuous mode
[628006.984739] docker0: port 3(vethf5ca) entered disabled state
[628037.723530] device veth3834 entered promiscuous mode
[628037.729605] IPv6: ADDRCONF(NETDEV_UP): veth3834: link is not ready
[628037.737120] IPv6: ADDRCONF(NETDEV_CHANGE): veth3834: link becomes ready
[628037.741944] docker0: port 3(veth3834) entered forwarding state
[628037.746370] docker0: port 3(veth3834) entered forwarding state
[628052.768068] docker0: port 3(veth3834) entered forwarding state
[684719.043883] device veth21bd entered promiscuous mode
[684719.049382] IPv6: ADDRCONF(NETDEV_UP): veth21bd: link is not ready
[684719.058117] IPv6: ADDRCONF(NETDEV_CHANGE): veth21bd: link becomes ready
[684719.062250] docker0: port 4(veth21bd) entered forwarding state
[684719.066120] docker0: port 4(veth21bd) entered forwarding state
[684719.100427] docker0: port 4(veth21bd) entered disabled state
[684719.107517] device veth21bd left promiscuous mode
[684719.109867] docker0: port 4(veth21bd) entered disabled state
[684858.707589] Netfilter messages via NETLINK v0.30.
[684858.710216] device eth0 entered promiscuous mode
[684906.146435] device eth0 left promiscuous mode
[685098.401399] device eth0 entered promiscuous mode
[685114.527857] device eth0 left promiscuous mode
[685252.427875] device eth0 entered promiscuous mode
[685264.962632] device eth0 left promiscuous mode
[686772.549967] device eth0 entered promiscuous mode
[686784.037602] device eth0 left promiscuous mode
[687274.354654] device eth0 entered promiscuous mode
[687277.540947] device eth0 left promiscuous mode
[687446.522131] device eth0 entered promiscuous mode
[687537.721096] device eth0 left promiscuous mode
[691412.256376] device eth0 entered promiscuous mode

@eyakubovich
Contributor

@dennybritz exactly. I'm assuming you were pinging 10.10.1.0 or 10.10.1.1, and we see the ARP resolution in progress but never resolving:

ip-10-10-1-1.ec2.intern          (incomplete)                              flannel.1
ip-10-10-1-0.ec2.intern          (incomplete)                              flannel.1

That would be causing the problem. Now, can you look into the flannel logs? At that time there should be something like:

vxlan.go:264] L3 miss: 10.10.1.1
device.go:228] calling NeighSet: 10.10.1.1, xx:xx:xx:xx:xx:xx
vxlan.go:275] AddL3 succeeded (or failed)

ARP resolutions are forwarded up to flanneld, which inserts entries into the ARP table based on its knowledge of the subnet-to-host mappings in etcd.
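
That mechanism can be sketched in Go: on an L3 miss the kernel notifies flanneld, which answers by installing a neighbor (ARP) entry on the vxlan device. A minimal sketch assuming github.com/vishvananda/netlink; names and addresses are illustrative, not flannel's actual device.go:

package main

import (
	"log"
	"net"
	"syscall"

	"github.com/vishvananda/netlink"
)

// answerL3Miss installs an ARP entry on the vxlan device, resolving an
// overlay IP to the remote vxlan MAC -- the "calling NeighSet" step in
// the logs above.
func answerL3Miss(devName string, overlayIP net.IP, mac net.HardwareAddr) error {
	link, err := netlink.LinkByName(devName) // e.g. "flannel.1"
	if err != nil {
		return err
	}
	return netlink.NeighSet(&netlink.Neigh{
		LinkIndex:    link.Attrs().Index,
		State:        netlink.NUD_REACHABLE,
		Type:         syscall.RTN_UNICAST,
		IP:           overlayIP, // e.g. 10.10.1.1
		HardwareAddr: mac,
	})
}

func main() {
	mac, _ := net.ParseMAC("12:7d:bc:75:19:1f")
	if err := answerL3Miss("flannel.1", net.ParseIP("10.10.1.1"), mac); err != nil {
		log.Fatal(err)
	}
}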

@dennybritz
Author

@eyakubovich Yep, that's right, though I don't see any calling NeighSet: 10.10.1.1, xx:xx:xx:xx:xx:xx lines.

Nov 25 05:11:47 rabi flanneld[10068]: I1125 05:11:47.892862 10068 vxlan.go:264] L3 miss: 10.10.1.1
Nov 25 05:11:47 rabi flanneld[10068]: I1125 05:11:47.892912 10068 vxlan.go:268] Route for 10.10.1.1 not found
Nov 25 05:11:48 rabi flanneld[10068]: I1125 05:11:48.895367 10068 vxlan.go:264] L3 miss: 10.10.1.1
Nov 25 05:11:48 rabi flanneld[10068]: I1125 05:11:48.895418 10068 vxlan.go:268] Route for 10.10.1.1 not found
Nov 25 05:11:49 rabi flanneld[10068]: I1125 05:11:49.897387 10068 vxlan.go:264] L3 miss: 10.10.1.1
Nov 25 05:11:49 rabi flanneld[10068]: I1125 05:11:49.897435 10068 vxlan.go:268] Route for 10.10.1.1 not found

This is what my etcd data looks like, if that helps: https://gist.github.com/dennybritz/5ba9d14fd52fad51a700. Is PublicIP supposed to be 0.0.0.0?

@eyakubovich
Contributor

@dennybritz no, PublicIP needs to be the IP of your eth0, yet here it's 0.0.0.0 for all entries. But since flannel worked for a while, it had to be valid for some period of time. This is really odd.

@eyakubovich
Contributor

flannel acquires a lease for 24 hours (via etcd) and renews it with 1 hour to go. I wonder if something broke in the renewal logic so that it renews after 12 hours but sets PublicIP to 0.0.0.0. But then your TTL still shows over 10 hours remaining. I'm going to look into this.
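
To make the timing concrete, here is a minimal sketch of such a renew-with-margin loop (hypothetical names; not flannel's subnet.go):

package main

import (
	"log"
	"time"
)

const (
	leaseTTL    = 24 * time.Hour // lease acquired for 24h
	renewMargin = time.Hour      // renew when ~1h remains
)

// renewLease stands in for the etcd write that refreshes the lease TTL;
// crucially, it must re-send the original attributes (PublicIP etc.),
// not a fresh zero-valued copy.
func renewLease() (time.Time, error) {
	return time.Now().Add(leaseTTL), nil
}

// leaseLoop sleeps until renewMargin before expiration, then renews.
func leaseLoop(expiration time.Time) {
	for {
		time.Sleep(time.Until(expiration) - renewMargin)
		exp, err := renewLease()
		if err != nil {
			log.Printf("lease renewal failed, retrying: %v", err)
			time.Sleep(time.Minute)
			continue
		}
		expiration = exp
		log.Printf("Lease renewed, new expiration: %v", expiration)
	}
}

func main() {
	leaseLoop(time.Now().Add(leaseTTL))
}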

@dennybritz
Author

Thanks. I'm not sure if this is related, but my log is full of these messages (on all machines):

Nov 25 05:03:47 rabi flanneld[10068]: W1125 05:03:47.174537 10068 subnet.go:405] Watch of subnet leases failed because etcd index outside history window

However, these messages were appearing while flannel was working as well.

@eyakubovich
Contributor

@dennybritz If it does renew, it should print "Lease renewed, new expiration: ..." in the log. The renewal also propagates to all other nodes, which should print "Subnet added: <ip-address>".

@eyakubovich
Contributor

This happens if the etcd index advances too fast, but flannel has logic to recover (although I wonder if there could be a bug there). Is your cluster busy? In general, it shouldn't constantly fall behind. But this gives me some clues to investigate tomorrow.
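
That recovery path can be sketched generically: when the watch index falls outside etcd's history window, re-list the full state and resume watching from the fresh index. The interfaces below are hypothetical stand-ins, not flannel's actual subnet.go:

package main

import (
	"errors"
	"log"
)

// leaseWatcher is a hypothetical stand-in for flannel's etcd access.
type leaseWatcher interface {
	// List returns the current leases plus the etcd index to watch from.
	List() (leases []string, index uint64, err error)
	// Watch blocks until a change at or after index, returning the next index.
	Watch(index uint64) (event string, next uint64, err error)
}

var errIndexOutdated = errors.New("etcd index outside history window")

func watchLeases(w leaseWatcher) error {
	_, index, err := w.List()
	if err != nil {
		return err
	}
	for {
		ev, next, err := w.Watch(index)
		if errors.Is(err, errIndexOutdated) {
			// Fell behind: resync from a full listing instead of giving up.
			log.Print("watch index outside history window; resyncing")
			if _, index, err = w.List(); err != nil {
				return err
			}
			continue
		}
		if err != nil {
			return err
		}
		log.Printf("event: %s", ev)
		index = next
	}
}

func main() {
	// Wire a real etcd client into watchLeases to use this sketch.
	_ = watchLeases
}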

@dennybritz
Author

@eyakubovich Looking at the logs:

journalctl -u flannel.service | grep 'renewed'
Nov 19 03:42:17 rabi flanneld[2132]: I1119 03:42:17.692125 02132 subnet.go:415] Lease renewed, new expiration: 2014-11-20 03:42:17.690585721 +0000 UTC
Nov 20 02:42:17 rabi flanneld[2132]: I1120 02:42:17.696573 02132 subnet.go:415] Lease renewed, new expiration: 2014-11-21 02:42:17.694406103 +0000 UTC
Nov 21 06:46:59 rabi flanneld[22435]: I1121 06:46:59.382488 22435 subnet.go:442] Lease renewed, new expiration: 2014-11-22 06:46:59.379720337 +0000 UTC
Nov 22 09:39:30 rabi flanneld[28096]: I1122 09:39:30.420100 28096 subnet.go:442] Lease renewed, new expiration: 2014-11-23 09:39:30.419292114 +0000 UTC
Nov 23 09:51:50 rabi flanneld[22225]: I1123 09:51:50.133352 22225 subnet.go:442] Lease renewed, new expiration: 2014-11-24 09:51:50.133835176 +0000 UTC
Nov 24 15:18:17 rabi flanneld[10068]: I1124 15:18:17.699239 10068 subnet.go:442] Lease renewed, new expiration: 2014-11-25 15:18:17.694001804 +0000 UTC
journalctl -u flannel.service | grep 'Subnet added: 10.10.1.0'
Nov 23 16:18:43 rabi flanneld[10068]: I1123 16:18:43.897713 10068 vxlan.go:190] Subnet added: 10.10.1.0/24
Nov 24 15:18:43 rabi flanneld[10068]: I1124 15:18:43.894272 10068 vxlan.go:190] Subnet added: 10.10.1.0/24

I don't think my cluster is busy; I've mainly been testing things. These are t2.small instances that can't handle much, but looking at CPU/memory utilization, everything should be fine.

@eyakubovich
Contributor

@dennybritz The renewals seem to be happening no less than 23 hours apart. I'm not sure what could be happening after 12 hours, but something definitely messes up the etcd state. Will investigate and let you know.

@eyakubovich
Contributor

@dennybritz I found and fixed the bug (introduced during a refactor) that was zeroing out PublicIP during the lease renewal. Can you check if it stays up now?
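
For anyone hitting the same symptom: one way a renewal can zero out PublicIP is if the renew path marshals a freshly constructed attrs struct instead of the one saved at acquisition, so unset fields serialize as their zero values. A hedged reconstruction of that failure mode (illustrative types, not the actual flannel diff):

package main

import (
	"encoding/json"
	"fmt"
)

// ip4 mimics an IPv4 address stored as an integer that renders as a
// dotted quad in JSON, the format seen in the etcd gist above.
type ip4 uint32

func (i ip4) MarshalJSON() ([]byte, error) {
	s := fmt.Sprintf("%d.%d.%d.%d", byte(i>>24), byte(i>>16), byte(i>>8), byte(i))
	return json.Marshal(s)
}

type leaseAttrs struct {
	PublicIP    ip4
	BackendType string `json:",omitempty"`
}

func main() {
	// At acquisition, attrs carries the real eth0 address.
	acquired := leaseAttrs{PublicIP: 10<<24 | 129, BackendType: "vxlan"}

	// Buggy renewal: a zero-valued struct is marshaled instead, so
	// PublicIP is written back to etcd as "0.0.0.0".
	var renewed leaseAttrs

	a, _ := json.Marshal(acquired)
	r, _ := json.Marshal(renewed)
	fmt.Println(string(a)) // {"PublicIP":"10.0.0.129","BackendType":"vxlan"}
	fmt.Println(string(r)) // {"PublicIP":"0.0.0.0"}
}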

@dennybritz
Author

That's great news, thanks! I am trying it.

@dennybritz
Author

@eyakubovich It has been up for more than 24 hours now, so I think this is resolved!

@eyakubovich
Contributor

Great. Will consider this resolved.
