
The network speed of the pod cannot reach the speed of the baremetal, only half the speed of the baremetal server #7926

Closed
ming12713 opened this issue Aug 14, 2023 · 13 comments

Comments

@ming12713

ming12713 commented Aug 14, 2023

Calico version: v3.26.1
Kubernetes version: v1.26.6
Calico installation spec:

spec:
  calicoNetwork:
    bgp: Enabled
    hostPorts: Enabled
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    linuxDataplane: Iptables
    multiInterfaceMode: None
    nodeAddressAutodetectionV4:
      firstFound: true
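
For context, the encapsulation and MTU that this spec actually produces can be checked on the cluster and on a node. A minimal sketch, assuming calicoctl is installed and the default VXLAN device name vxlan.calico; the NIC name is a placeholder:

  # Show the IP pool with its IPIP/VXLAN modes
  calicoctl get ippool -o wide

  # On a node: compare the MTU of the VXLAN device Calico created with the physical NIC
  ip -br link show vxlan.calico
  ip -br link show eth0    # replace eth0 with the real 10G NIC name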

iperf3 testing between pods

  1. iperf3 server pod running on airflow01 node with 10G NIC
  2. iperf3 client pod running on airflow02 node with 10G NIC
    (iperf3 screenshot attached; per the issue title, roughly half of the 10G line rate)

iperf3 testing between baremetal nodes

  1. iperf3 server running on airflow01 node with 10G NIC
  2. iperf3 client running on airflow02 node with 10G NIC
    (iperf3 screenshot attached; roughly the full 10G line rate, as confirmed below; the test commands are sketched after this list)
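
The exact iperf3 invocation is not shown in the report; a minimal sketch matching steps 1 and 2 above (the server address and options are assumptions, not taken from the issue):

  # Server side (pod on airflow01, or directly on the airflow01 host for the baremetal case)
  iperf3 -s

  # Client side (pod on airflow02, or directly on the airflow02 host), 30-second run
  iperf3 -c <server-pod-or-node-IP> -t 30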
@ming12713 ming12713 changed the title The network speed of the pod cannot reach the speed of the bare metalserver, only half the speed of the bare metalserver The network speed of the pod cannot reach the speed of the baremetal, only half the speed of the baremetal server Aug 14, 2023
@sridhartigera
Member

@ming12713 I am not sure if I understand this correctly. Baremetal has 10G NICs and iperf output is ~10Gbps. Am I missing something?

@ming12713
Author

@sridhartigera
Yes, the baremetal servers have 10G network cards, and when testing with iperf directly between the baremetal servers, the speed reaches 10G. However, when pods run on these servers using the Calico plugin, pod-to-pod network testing does not reach 10G. The two pods under test are located on different baremetal servers.

@lwr20
Member

lwr20 commented Aug 24, 2023

Halving of throughput generally indicates MTU issues. Throughput is often limited by the maximum packets per second, and an MTU problem in the path leads to fragmentation, which doubles the number of packets and therefore halves throughput.

See https://docs.tigera.io/calico/latest/networking/configuring/mtu
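
For VXLAN over a 9000-byte underlay, the Calico MTU would typically be the NIC MTU minus the 50-byte VXLAN overhead, i.e. 8950. A sketch of setting it explicitly through the operator Installation resource shown above (note that already-running pods keep their old veth MTU until they are restarted):

  # 9000 - 50 (VXLAN overhead) = 8950
  kubectl patch installation default --type merge \
    -p '{"spec":{"calicoNetwork":{"mtu":8950}}}'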

@ming12713
Author

@lwr20 thanks
I've set the MTU to 8950 and then used iperf to test bandwidth between pods on different nodes. However, the bandwidth still doesn't reach that of the baremetal servers.

(two iperf3 screenshots attached)

@lwr20
Member

lwr20 commented Aug 25, 2023

OK, and what's the MTU of the network interface between the nodes (eth0 or whatever)? And the MTU set on any routers between the nodes?
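
A quick way to answer that on each node (a sketch; the interface name is a placeholder):

  # MTU of all interfaces in brief form; check the 10G NIC and vxlan.calico
  ip -br link

  # Or for a single interface
  cat /sys/class/net/eth0/mtu    # replace eth0 with the real NIC name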

@ming12713
Author

@lwr20 thanks. The baremetal server network interface MTU is 9000; the nodes are connected through a 10G switch, and the switch is at its default MTU of 1500.
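
With the switch at its 1500 default, it is worth verifying whether 9000-byte frames really pass between the nodes. A sketch using ping with the don't-fragment flag (node IPs are placeholders):

  # 8972 = 9000 - 20 (IP header) - 8 (ICMP header), sent with DF set
  ping -M do -s 8972 -c 3 <other-node-IP>

  # For comparison, a probe that fits in a standard 1500 MTU
  ping -M do -s 1472 -c 3 <other-node-IP>

If the large probe fails while the small one succeeds, jumbo frames are not making it through the switch.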

@lwr20
Member

lwr20 commented Aug 29, 2023

That doesn't sound good - if the server has a 9000 MTU, the switch should also have MTU 9000.

But on the other hand, that's the same for both the baremetal and pod-to-pod cases, so it's clearly not the cause of this issue.

It sounds like you're using VXLAN encapsulation (since you mentioned the 8950 MTU setting in Calico). Do you need VXLAN encap at all in this scenario? ISTR there was a recent Linux kernel bug with VXLAN checksum offloading. Can you try without VXLAN to establish whether the problem is related to VXLAN or not?
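
One way to run that experiment with the operator-managed install shown above is to change the pool encapsulation in the Installation resource (a sketch; whether an existing pool can be edited in place depends on the Calico version, so treat this as an illustration of where the knob lives rather than an exact procedure):

  # Change spec.calicoNetwork.ipPools[0].encapsulation, e.g. from
  # VXLANCrossSubnet to IPIPCrossSubnet or None (unencapsulated BGP routing)
  kubectl edit installation default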

@ming12713
Author

ming12713 commented Aug 30, 2023

Yes, I use VXLANCrossSubnet encap; I think I've encountered the same bug you mentioned.
Related issue: #7974

@lwr20
Member

lwr20 commented Aug 30, 2023

I don't think that's the VXLAN checksum offload issue - that's a kernel crash, isn't it? The VXLAN checksum offload issue "just" causes dropped packets (I think).

Based on #4727 (comment), can you try setting featureDetectOverride: "ChecksumOffloadBroken=true" in the default FelixConfiguration and see if that fixes the issue, please?
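
A sketch of applying that override (assuming kubectl can reach the FelixConfiguration resource; calicoctl patch is the equivalent if not):

  kubectl patch felixconfiguration default --type merge \
    -p '{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}'

  # Verify the setting took effect
  kubectl get felixconfiguration default -o jsonpath='{.spec.featureDetectOverride}'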

@ming12713
Author

@lwr20
I changed the encapsulation from VXLANCrossSubnet to IPIPCrossSubnet, and after testing I found that the bandwidth still couldn't be fully utilized; IPIP performed slightly worse than VXLAN.
(two iperf3 screenshots attached)

@onesb23

onesb23 commented Sep 11, 2023

I don't think that's the VXLAN checksum offload issue, that's a kernel crash, isn't it? The VXLAN checksum offload issue "just" causes dropped packets (I think)

Based on #4727 (comment) Can you try setting featureDetectOverride: "ChecksumOffloadBroken=true" in the default FelixConfiguration and see if that fixes the issue please?

This fixed the Calico VXLAN underspeed issue for me.

@mazdakn
Member

mazdakn commented Sep 19, 2023

@ming12713 have you tried the fix that @lwr20 mentioned above?

@ming12713
Author

@ming12713 have you tried the fix that @lwr20 mentioned above?
No.
