AWS/OCP 4.5.13: Submariner + Globalnet: Pod with HostNetworking on GW Node to remoteService failing #995
Comments
On investigating this issue, it appears to be related to an MTU mismatch: when the remote cluster sends an ICMP "unreachable - need to fragment" packet, it does not seem to be handled properly. The following is the tcpdump on the Gateway node of cluster-west when the e2e test scenario is executed:
[tcpdump output omitted]
Setting /proc/sys/net/ipv4/tcp_mtu_probing to 1 (or 2) on the Gateway Node resolves this problem.
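For reference, a minimal Go sketch of flipping that proc entry (the helper name and standalone program are illustrative, not Submariner's actual code; writing under /proc/sys requires root in the host network namespace, as on the Gateway Node):

```go
package main

import (
	"fmt"
	"os"
)

// Values for /proc/sys/net/ipv4/tcp_mtu_probing:
//   0 - disabled
//   1 - enable Packetization-Layer PMTUD when an ICMP black hole is detected
//   2 - always enable Packetization-Layer PMTUD
const tcpMTUProbingPath = "/proc/sys/net/ipv4/tcp_mtu_probing"

// setTCPMTUProbing writes the requested probing mode to the proc entry.
func setTCPMTUProbing(mode int) error {
	return os.WriteFile(tcpMTUProbingPath, []byte(fmt.Sprintf("%d", mode)), 0o644)
}

func main() {
	// Mode 2 (always probe) is the value the follow-up fix below settles on.
	if err := setTCPMTUProbing(2); err != nil {
		fmt.Fprintf(os.Stderr, "failed to set tcp_mtu_probing: %v\n", err)
		os.Exit(1)
	}
}
```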
Cool, that seems like a better solution than mss clamping. Let me try it.
OK, it doesn't work for this case. I will continue with the MSS clamping.
On some platforms like AWS, when using Globalnet, it was seen that Path MTU discovery was not happening properly. Because of this, one of the e2e tests fails when the sourcePod is on the Gateway Node with HostNetworking enabled. This PR enables TCP Packetization-Layer Path MTU discovery when an ICMP black hole is detected, by configuring the appropriate proc entry. Also, we update the base MSS value to the RFC 4821 recommended value of 1024. This change is done only on the active Gateway node of the cluster. Fixes issue: submariner-io#995. Signed-off-by: Sridhar Gaddam <[email protected]>
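The second knob this commit mentions is the base MSS used for probing, which on Linux lives in /proc/sys/net/ipv4/tcp_base_mss. A minimal sketch of that write (again illustrative, not the actual PR code):

```go
package main

import (
	"os"
	"strconv"
)

// RFC 4821 recommends starting PL-PMTUD probing from a base MSS of 1024,
// a size likely to pass most paths without fragmentation.
const (
	tcpBaseMSSPath = "/proc/sys/net/ipv4/tcp_base_mss"
	rfc4821BaseMSS = 1024
)

func main() {
	value := []byte(strconv.Itoa(rfc4821BaseMSS))
	if err := os.WriteFile(tcpBaseMSSPath, value, 0o644); err != nil {
		panic(err)
	}
}
```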
In the previous fix to this issue, we enabled PL-PMTUD only when an ICMP black hole is detected (i.e., a tcp_mtu_probing value of 1), but during testing it was seen that MTU discovery sometimes takes time and e2e occasionally fails. In this PR, we enable PL-PMTUD always (i.e., a tcp_mtu_probing value of 2), after which the e2e tests pass consistently. Fixes issue: submariner-io#995. Signed-off-by: Sridhar Gaddam <[email protected]>
What happened:
In a Submariner Globalnet deployment, when the e2e tests are executed on AWS/OCP clusters, one of them consistently fails: a Pod with HostNetworking trying to connect to a remoteService cannot complete the exchange.
Interestingly, the output of the listener pod shows that the connector pod was indeed able to reach the listener, but it was unable to send the UUID string (which is sent as part of the e2e tests). This is consistent with broken Path MTU discovery: connection setup succeeds with small packets, while the data exchange stalls.
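To make that failure mode concrete, here is a rough sketch of the shape of such a check (the address and payload are placeholders; this is not the actual e2e code). The TCP handshake completes because its packets are small, while the data exchange can time out when oversized segments are silently black-holed:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Dialing succeeds: SYN/SYN-ACK packets are small and fit under the
	// effective path MTU, so the handshake completes.
	conn, err := net.DialTimeout("tcp", "remote-service.example:1234", 5*time.Second)
	if err != nil {
		fmt.Println("connect failed:", err)
		return
	}
	defer conn.Close()

	// The exchange can stall: when PMTUD is broken, segments larger than
	// the tunnel path MTU are dropped without any error reaching TCP, so
	// the write/read simply times out.
	conn.SetDeadline(time.Now().Add(5 * time.Second))
	if _, err := conn.Write([]byte("test-uuid-0000")); err != nil {
		fmt.Println("write failed:", err)
		return
	}

	buf := make([]byte, 256)
	n, err := conn.Read(buf)
	if err != nil {
		fmt.Println("no echo from listener:", err)
		return
	}
	fmt.Println("listener echoed:", string(buf[:n]))
}
```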
What you expected to happen:
e2e tests should pass consistently.
Anything else we need to know?:
Environment:
- subctl version: v0.8.0-rc0
- kubectl version: v1.18.3+47c0e71
- cat /etc/os-release: Alpine Linux
- uname -a: 4.18.0-193.23.1.el8_2.x86_64