flannel cross node traffic does not work with latest systemd 242 due to a race #1155
Comments
Here's a quick write-up of the workaround (at least this worked in my lab):
After this, I rebooted my controllers and workers and flannel's overlay worked.
When setting up flannel interfaces, use MACAddressPolicy=none, so that the MAC Address used is the one set by flannel and not the one assigned by systemd. See flannel-io/flannel#1155 for more information.
When setting up flannel interfaces, use MACAddressPolicy=none, so that the MAC Address used is the initial one set by the kernel and not the one assigned by systemd. See flannel-io/flannel#1155 for more information.
When setting up flannel interfaces, use MACAddressPolicy=none, so that the MAC Address used is the initial one set by the kernel and not the one assigned by systemd. See flannel-io/flannel#1155 for more information. In #279 we tried adding the MACAddressPolicy=none setting to the existing 50-flannel.network file. But the change should have been in a .link file, not a .network file.
Looking at the cross-references here, I think more people are running into this. Perhaps it would be worth carrying the link unit in this repo, so that it's easier for people to notice and install it.
Just got bitten by this issue and spent several hours trying to understand why a single node couldn't communicate with the others, at least until flanneld was killed, after which it suddenly worked. Thanks for reporting this issue in detail, @mcastelino! Yeah, many will be bitten and pull their hair out over this...
I think there is a better solution than configuring systemd: the netlink library allows setting a hardware address when creating a link, which should be a sane workaround. systemd shouldn't touch links that already have an address assigned.
Let me prepare a PR.
systemd 242+ assigns MAC addresses for all virtual devices which don't have the address assigned already. That resulted in systemd overriding MAC addresses of flannel.* interfaces. The fix which prevents systemd from setting the address is to define the concrete MAC address when creating the link. Fixes: flannel-io#1155 Ref: k3s-io/k3s#4188 Signed-off-by: Michal Rostecki <[email protected]>
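The core idea of this commit, giving the VXLAN device a concrete MAC address at creation time so that systemd does not assign its own, can be sketched with the Go standard library alone. This is not flannel's actual code: newHardwareAddr is a hypothetical name, and the sketch assumes a random, locally administered, unicast address is an acceptable VTEP MAC.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net"
)

// newHardwareAddr (hypothetical helper) returns a random 48-bit MAC address
// that is unicast (I/G bit cleared) and locally administered (U/L bit set),
// so it cannot collide with a vendor-assigned address.
func newHardwareAddr() (net.HardwareAddr, error) {
	addr := make(net.HardwareAddr, 6)
	if _, err := rand.Read(addr); err != nil {
		return nil, fmt.Errorf("generating random MAC address: %w", err)
	}
	// Clear bit 0 (multicast) and set bit 1 (locally administered)
	// of the first octet.
	addr[0] = (addr[0] &^ 0x01) | 0x02
	return addr, nil
}

func main() {
	addr, err := newHardwareAddr()
	if err != nil {
		panic(err)
	}
	// The value is random, so the printed address differs on every run.
	fmt.Println(addr)
}
```

With github.com/vishvananda/netlink, the generated value would go into the link's LinkAttrs.HardwareAddr field before calling netlink.LinkAdd, so the device is created with an address already set; that wiring needs root and a real kernel, so it is left out of this sketch.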
Works for me.
systemd 242+ assigns MAC addresses for all virtual devices which don't have the address assigned already. That resulted in systemd overriding MAC addresses of flannel.* interfaces. The fix which prevents systemd from setting the address is to define the concrete MAC address when creating the link. Fixes: flannel-io#1155 Ref: k3s-io/k3s#4188 Signed-off-by: Michal Rostecki <[email protected]> (cherry picked from commit 0198d5d)
* vxlan: Generate MAC address before creating a link
  systemd 242+ assigns MAC addresses for all virtual devices which don't have the address assigned already. That resulted in systemd overriding MAC addresses of flannel.* interfaces. The fix which prevents systemd from setting the address is to define the concrete MAC address when creating the link.
  Fixes: flannel-io#1155
  Ref: k3s-io/k3s#4188
  Signed-off-by: Michal Rostecki <[email protected]>
  (cherry picked from commit 0198d5d)
* Concern only about flannel ip addresses
  Currently, flannel interface IP addresses are checked on startup when using the vxlan and ipip backends. If multiple addresses are found, startup fails fatally. If only one address is found and it is not the currently leased one, it is assumed to come from a previous lease and is removed. This criterion seems arbitrary both in how it is applied and in its timing. It may cause failures in situations where it might not be strictly necessary, for example if the node is running a DHCP client that assigns link-local addresses to all interfaces. It might also fail on unexpected flannel restarts that are completely unrelated to the external event that caused the unexpected modification of the flannel interface. This patch proposes to check only IP addresses within the flannel network and takes the simple approach of ignoring any other IP addresses, assuming these pose no problem to flannel operation. A discarded but more aggressive alternative would be to remove all addresses that are not the currently leased one.
  Fixes flannel-io#1060
  Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
  (cherry picked from commit 33a2fac)
* Fix flannel hang if lease expired
  (cherry picked from commit 78035d0)
* subnets: move forward the cursor to skip illegal subnet
  This PR fixes an issue where, when flannel gets an illegal subnet event while watching leases, it doesn't move the etcd cursor forward and gets stuck on the same invalid event forever.
  (cherry picked from commit 1a1b6f1)
* fix cherry-pick glitches and test failures
* disable udp backend tests since we don't actually have the udp backend in our fork
Co-authored-by: Michal Rostecki <[email protected]>
Co-authored-by: Jaime Caamaño Ruiz <[email protected]>
Co-authored-by: Chun Chen <[email protected]>
Co-authored-by: huangxuesen <[email protected]>
Expected Behavior
Cross-node pod traffic should work, and node-to-pod traffic should work across nodes.
Current Behavior
When running flannel with systemd 242+, there seems to be a race condition between flannel programming the MAC address of the flannel.1 interface and systemd programming the MAC address of the same virtual interface. This results in all cross-node traffic being dropped at layer 2 on the destination node because the destination VTEP MAC is incorrect.
With systemd 242, the default policy is set to MACAddressPolicy=persistent in /usr/lib/systemd/network/99-default.link.
When flannel brings up the interface, it programs the MAC address, and systemd then reprograms it.
In the trace below you will see
But the ARP tables on remote nodes are set up with a different MAC address: d6:02:e3:df:ea:7a vs. 5e:89:db:49:c6:a4.
Looking at the netlink traces, you can see the MAC address being changed twice: the first time by flannel, and the second time, to a different address, by systemd based on its default policy.
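To confirm this symptom on a live cluster, one could compare the VTEP MAC that remote nodes have recorded with the address the local interface actually carries. The Go sketch below does only that comparison; vtepMACMismatch is a hypothetical helper, and the two sample addresses are the ones from this report, without assuming which one is the recorded address and which one systemd assigned.

```go
package main

import (
	"bytes"
	"fmt"
	"net"
)

// vtepMACMismatch (hypothetical helper) reports whether the MAC address that
// remote nodes have recorded for this node's VTEP differs from the address
// the local flannel interface actually carries. net.ParseMAC accepts both
// upper- and lower-case hex and ':' or '-' separators, so formatting
// differences alone do not count as a mismatch.
func vtepMACMismatch(recorded, actual string) (bool, error) {
	r, err := net.ParseMAC(recorded)
	if err != nil {
		return false, fmt.Errorf("parsing recorded MAC %q: %w", recorded, err)
	}
	a, err := net.ParseMAC(actual)
	if err != nil {
		return false, fmt.Errorf("parsing interface MAC %q: %w", actual, err)
	}
	return !bytes.Equal(r, a), nil
}

func main() {
	mismatch, err := vtepMACMismatch("d6:02:e3:df:ea:7a", "5e:89:db:49:c6:a4")
	if err != nil {
		panic(err)
	}
	fmt.Println("mismatch:", mismatch) // prints "mismatch: true"
}
```

On a node, the actual address could be read with net.InterfaceByName("flannel.1") and its HardwareAddr field, while the recorded address would come from the remote node's neighbor and FDB entries (for example `ip neigh show dev flannel.1` or `bridge fdb show dev flannel.1`); collecting those is outside this sketch.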
Possible Solution
Setting MACAddressPolicy=none on the flannel* interface on each system hides the issue, but requires node-level changes, or
Steps to Reproduce (for bugs)
Context
Flannel and any flannel-based network plugins (e.g. Canal) stop working with systemd 242.
This will impact other distributions when they upgrade to systemd 242 and beyond.
Your Environment
Clearlinux 4.19.53-53.lts2018 with
systemd 242 (242)
+PAM +AUDIT -SELINUX +IMA -APPARMOR -SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 -IDN -PCRE2 default-hierarchy=legacy