Antrea does not run on Photon OS 3 #591

Closed
antoninbas opened this issue Apr 3, 2020 · 13 comments · Fixed by #640
Labels
kind/documentation Categorizes issue or PR as related to a documentation.

Comments

@antoninbas
Contributor

Describe the bug
When creating a single-node cluster with kubeadm on a Photon OS VM, Pod networking does not work. For example, pinging the local gw0 from any Pod fails. The Antrea agent logs show the following:

time="2020-04-03T01:13:07Z" level=info msg="Openflow Connection for new switch: 00:00:0a:a0:6f:8d:a6:4c"
I0403 01:13:07.114199       1 ofctrl_bridge.go:178] OFSwitch is connected: 00:00:0a:a0:6f:8d:a6:4c
time="2020-04-03T01:13:07Z" level=error msg="Received bundle error msg: [4 4 0 120 0 0 0 51 79 78 70 0 0 0 8 253 0 0 0 4 0 0 0 1 4 14 0 96 0 0 0 51 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 0 0 0 0 0 0 200 255 255 255 255 255 255 255 255 255 255 255 255 0 0 0 0 0 1 0 10 128 0 10 2 8 0 0 0 0 0 0 0 0 4 0 32 0 0 0 0 255 255 0 24 0 0 35 32 0 35 0 0 0 0 0 0 255 240 31 0 0 0 0 0]"
time="2020-04-03T01:13:07Z" level=error msg="Received bundle error msg: [4 4 0 128 0 0 0 59 79 78 70 0 0 0 8 253 0 0 0 4 0 0 0 1 4 14 0 104 0 0 0 59 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 105 0 0 0 0 0 0 190 255 255 255 255 255 255 255 255 255 255 255 255 0 0 0 0 0 1 0 22 128 0 10 2 8 0 0 1 211 8 0 0 0 33 0 0 0 33 0 0 0 4 0 32 0 0 0 0 255 255 0 24 0 0 35 32 0 35 0 1 0 0 0 0 255 240 110 0 0 0 0 0]"
time="2020-04-03T01:13:07Z" level=error msg="Received bundle error msg: [4 4 0 168 0 0 0 57 79 78 70 0 0 0 8 253 0 0 0 4 0 0 0 1 4 14 0 144 0 0 0 57 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 105 0 0 0 0 0 0 200 255 255 255 255 255 255 255 255 255 255 255 255 0 0 0 0 0 1 0 34 128 0 10 2 8 0 0 1 211 8 0 0 0 33 0 0 0 33 0 1 1 8 0 0 0 1 0 0 255 255 0 0 0 0 0 0 0 4 0 56 0 0 0 0 255 255 0 48 0 0 35 32 0 35 0 1 0 0 0 0 255 240 110 0 0 0 0 0 255 255 0 24 0 0 35 32 0 7 0 31 0 1 214 4 0 0 0 0 0 0 0 32]"

BTW, @wenyingd do you think these log messages can be displayed in a more user-friendly format :) ?
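
Until the agent prints something friendlier, the dumps can be read by hand. The sketch below assumes the bytes are an OpenFlow 1.3 experimenter message (the ONF bundle-add wrapper) carrying a flow-mod, which is how the offsets in the logs above appear to line up. The function name decode_bundle_error is made up for illustration and is not part of Antrea or OVS.

```shell
#!/bin/sh
# Decode the leading fields of a "Received bundle error msg" byte dump.
# $1: the space-separated decimal bytes found between the square brackets.
# Assumed layout: 8-byte OpenFlow header, 8-byte experimenter header,
# 8-byte bundle-add header, then the embedded flow-mod message.
decode_bundle_error() {
    printf '%s\n' "$1" | awk '
    # Big-endian integers starting at 1-based field index i.
    function u16(i) { return $i * 256 + $(i + 1) }
    function u32(i) { return u16(i) * 65536 + u16(i + 2) }
    {
        printf "outer: version=%d type=%d length=%d xid=%d\n", $1, $2, u16(3), u32(5)
        printf "experimenter=0x%08x exp_type=%d\n", u32(9), u32(13)
        printf "bundle_id=%d flags=%d\n", u32(17), u16(23)
        printf "inner: version=%d type=%d length=%d xid=%d\n", $25, $26, u16(27), u32(29)
        # The flow-mod body starts with a 16-byte cookie and cookie mask.
        printf "flow_mod: table=%d command=%d priority=%d\n", $49, $50, u16(55)
    }'
}
```

Fed the first 56 bytes of the first error above, the last line printed is `flow_mod: table=30 command=0 priority=200`, so the rejected message is a flow add for table 30.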

If I dump the flows, I can see that table 30 is empty, and this flow is therefore missing:

table=30, priority=200,ip actions=ct(table=31,zone=65520)

Trying to add the flow manually gives the following error:

root@photon-machine:/# ovs-ofctl add-flow br-int 'table=30,priority=200,ip,actions=ct(table=31,zone=65520)'
OFPT_ERROR (xid=0x8): NXBAC_CT_DATAPATH_SUPPORT
OFPT_FLOW_MOD (xid=0x8): ADD table:30 priority=200,ip actions=ct(table=31,zone=65520)

To Reproduce

Versions:
Antrea: v0.5.1

root@photon-machine [ ~ ]# modinfo openvswitch
filename:       /lib/modules/4.19.15-1.ph3-esx/kernel/net/openvswitch/openvswitch.ko.xz
alias:          net-pf-16-proto-16-family-ovs_ct_limit
alias:          net-pf-16-proto-16-family-ovs_meter
alias:          net-pf-16-proto-16-family-ovs_packet
alias:          net-pf-16-proto-16-family-ovs_flow
alias:          net-pf-16-proto-16-family-ovs_vport
alias:          net-pf-16-proto-16-family-ovs_datapath
license:        GPL
description:    Open vSwitch switching datapath
depends:        nf_conntrack,nf_nat,nf_conncount,nf_nat_ipv6,nf_nat_ipv4,nf_defrag_ipv6,nsh
intree:         Y
name:           openvswitch
vermagic:       4.19.15-1.ph3-esx SMP mod_unload
@antoninbas added the kind/bug label Apr 3, 2020
@antoninbas
Contributor Author

@jianjuns @abhiraut FYI

@antoninbas
Contributor Author

@wenyingd let me know if you need more information. I know we don't explicitly document that we support Photon OS, but the kernel looks recent to me and so I'm surprised that we see this error. If you want me to try to install something on my Photon OS VM, please let me know. Unfortunately I cannot give you SSH access since the VM is running locally on my laptop...

@wenyingd
Contributor

wenyingd commented Apr 3, 2020

@antoninbas It looks like Photon doesn't support the "ct" feature in OVS. Could you check the OVS kernel module version on the test VM? As far as I remember, the OVS kernel module version should be higher than 2.6.

@tnqn
Member

tnqn commented Apr 3, 2020

I remember @edwardbadboy found an issue where Photon OS didn't compile in support for multiple conntrack zones by default. This looks similar.

@antoninbas
Contributor Author

I just saw this: https://github.com/vmware/photon/blob/master/SPECS/linux/linux-esx.spec#L322

Maybe a slightly more recent version of Photon OS will work?

@tnqn
Member

tnqn commented Apr 3, 2020

Yes, I guess so.

@edwardbadboy
Contributor

Hi Antonin,

Would you check the following command output?

grep CONFIG_NF_CONNTRACK_ZONES /boot/config-$(uname -r)

See if it's CONFIG_NF_CONNTRACK_ZONES=y. If not, it's the cause of the failure.
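
For anyone who wants to script this check, here is a minimal sketch. The function name check_conntrack_zones and its return codes are my own choices for illustration; it also assumes the kernel config is available as a plain-text file, which may not hold on every distribution.

```shell
#!/bin/sh
# Report whether a kernel config file enables conntrack zones.
# $1: path to a kernel config file (defaults to the running kernel's).
# Returns 0 if enabled, 1 if disabled or unset, 2 if the file is unreadable.
check_conntrack_zones() {
    cfg="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$cfg" ]; then
        echo "unknown: cannot read $cfg"
        return 2
    fi
    # A disabled option appears as "# CONFIG_... is not set", so match "=y" exactly.
    if grep -q '^CONFIG_NF_CONNTRACK_ZONES=y$' "$cfg"; then
        echo "enabled"
        return 0
    fi
    echo "disabled"
    return 1
}
```

Running `check_conntrack_zones` with no argument inspects the config of the running kernel.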

It could be that conntrack zone support was not enabled when the kernel was compiled. Previously, when I tried Antrea on Photon OS, I recompiled the Photon kernel with that flag set to "y" ( edwardbadboy/photon@a6c3c10 )

I thought last time Jianjun said Photon developers agreed to turn on the switch by default. Let me check if the upstream Photon has that change. If not, I can submit the pull request to Photon upstream.

@edwardbadboy
Contributor

I just saw this: https://github.com/vmware/photon/blob/master/SPECS/linux/linux-esx.spec#L322

Maybe a slightly more recent version of Photon OS will work?

Seems they already made the change. Let's use a more recent Photon OS version then.

@antoninbas
Contributor Author

I ran tdnf upgrade linux-esx and it fixed that specific issue. Pod networking is still not working, so I'm looking into it.

@antoninbas assigned antoninbas and unassigned wenyingd Apr 3, 2020
@antoninbas
Contributor Author

Alright this was a combination of multiple things, but I managed to make it work:

  • tdnf upgrade linux-esx to upgrade the kernel
  • pick a correct cluster Pod CIDR (whoever reported the issue on Slack was using one that overlapped with the Service CIDR)
  • allow traffic on gw0: iptables -A INPUT -i gw0 -j ACCEPT. It seems that Photon OS has a strict firewall by default.
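
The second item can be checked before creating the cluster. Below is a rough sketch of an IPv4 overlap test; the function name cidr_overlap is hypothetical, and the CIDRs in the usage note are example values, not the ones from the Slack report.

```shell
#!/bin/sh
# Print "overlap" if two IPv4 CIDRs share any address, else "disjoint".
# $1, $2: CIDRs in a.b.c.d/len form. No input validation is done.
cidr_overlap() {
    awk -v a="$1" -v b="$2" '
    # Convert a dotted quad to a number.
    function ip2int(ip,    o) {
        split(ip, o, ".")
        return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    # First address of the block (host bits cleared).
    function first(cidr,    p, size) {
        split(cidr, p, "/")
        size = 2 ^ (32 - p[2])
        return int(ip2int(p[1]) / size) * size
    }
    # Last address of the block.
    function last(cidr,    p) {
        split(cidr, p, "/")
        return first(cidr) + 2 ^ (32 - p[2]) - 1
    }
    BEGIN {
        # Two ranges intersect when each one starts before the other ends.
        if (first(a) <= last(b) && first(b) <= last(a)) print "overlap"
        else print "disjoint"
    }'
}
```

For example, `cidr_overlap 10.96.0.0/16 10.96.0.0/12` prints `overlap`, while `cidr_overlap 10.244.0.0/16 10.96.0.0/12` prints `disjoint`.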

Maybe these things are worth documenting somewhere? @jianjuns

@tnqn
Member

tnqn commented Apr 3, 2020

Perhaps we could solve the 3rd with antrea-agent if it's common for other CNIs to add such rules for their traffic. Right now we only add rules to FORWARD chain.

@jianjuns
Contributor

jianjuns commented Apr 3, 2020

@antoninbas agreed we should document CONFIG_NF_CONNTRACK_ZONES and firewall rules. CONFIG_NF_CONNTRACK_ZONES is a known issue for Photon OS, and last time we pushed a change to enable it for the vSphere build.

@antoninbas added the kind/documentation label and removed the kind/bug label Apr 3, 2020
@abhiraut
Contributor

abhiraut commented Apr 4, 2020

Perhaps we could solve the 3rd with antrea-agent if it's common for other CNIs to add such rules for their traffic. Right now we only add rules to FORWARD chain.

Maybe check whether the INPUT chain policy is DROP, and only then apply the rule?
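
As a rough sketch of that idea in shell (not how antrea-agent would implement it; both function names are hypothetical):

```shell
#!/bin/sh
# Read `iptables -S INPUT` output on stdin; succeed if the chain policy is DROP.
input_policy_is_drop() {
    grep -q '^-P INPUT DROP$'
}

# Hypothetical wrapper: add the gw0 rule only when the policy is DROP
# and the rule is not already present. Needs root and the iptables binary.
allow_gw0_if_needed() {
    if iptables -S INPUT | input_policy_is_drop; then
        iptables -C INPUT -i gw0 -j ACCEPT 2>/dev/null ||
            iptables -A INPUT -i gw0 -j ACCEPT
    fi
}
```

This only looks at the chain policy. A firewall that keeps the ACCEPT policy but ends the chain with a catch-all DROP or REJECT rule would not be detected.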

antoninbas added a commit to antoninbas/antrea that referenced this issue Apr 21, 2020
antoninbas added a commit that referenced this issue Apr 24, 2020
@antoninbas added this to the Antrea v0.6.0 release milestone Apr 28, 2020
McCodeman pushed a commit to McCodeman/antrea that referenced this issue Jun 2, 2020
McCodeman pushed a commit that referenced this issue Jun 2, 2020
GraysonWu pushed a commit to GraysonWu/antrea that referenced this issue Sep 22, 2020

6 participants