=== RUN TestProxyLoadBalancerModeDSR
2024/09/19 15:31:32 Applying Antrea YAML
2024/09/19 15:31:34 Waiting for all Antrea DaemonSet Pods
2024/09/19 15:31:35 Checking CoreDNS deployment
fixtures.go:286: Creating 'testproxyloadbalancermodedsr-3i55riyi' K8s Namespace
=== RUN TestProxyLoadBalancerModeDSR/IPv4,withSessionAffinity
proxy_test.go:1182:
Error Trace: /home/runner/work/antrea/antrea/test/e2e/proxy_test.go:1182
/home/runner/work/antrea/antrea/test/e2e/proxy_test.go:1203
Error: Not equal:
expected: "1.1.1.1"
actual : "10.244.0.1"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-1.1.1.1
+10.244.0.1
Test: TestProxyLoadBalancerModeDSR/IPv4,withSessionAffinity
Messages: Client IP should be preserved with DSR mode
proxy_test.go:1187: Request #0 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #1 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #2 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #3 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #4 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #5 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #6 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #7 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #8 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1187: Request #9 from external-client-jjk3ndrv got hostname: agnhost-1
proxy_test.go:1182:
Error Trace: /home/runner/work/antrea/antrea/test/e2e/proxy_test.go:1182
/home/runner/work/antrea/antrea/test/e2e/proxy_test.go:1204
Error: Not equal:
expected: "10.244.0.58"
actual : "10.244.0.1"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-10.244.0.58
+10.244.0.1
Test: TestProxyLoadBalancerModeDSR/IPv4,withSessionAffinity
Messages: Client IP should be preserved with DSR mode
proxy_test.go:1187: Request #0 from internal-client got hostname: agnhost-1
proxy_test.go:1187: Request #1 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #2 from internal-client got hostname: agnhost-1
proxy_test.go:1187: Request #3 from internal-client got hostname: agnhost-1
proxy_test.go:1187: Request #4 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #5 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #6 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #7 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #8 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #9 from internal-client got hostname: agnhost-3
proxy_test.go:1197:
Error Trace: /home/runner/work/antrea/antrea/test/e2e/proxy_test.go:1197
/home/runner/work/antrea/antrea/test/e2e/proxy_test.go:1204
Error: "map[agnhost-1:{} agnhost-3:{}]" should have 1 item(s), but has 2
Test: TestProxyLoadBalancerModeDSR/IPv4,withSessionAffinity
Messages: Hostnames should be the same when session affinity is enabled
proxy_test.go:1217:
Error Trace: /home/runner/work/antrea/antrea/test/e2e/proxy_test.go:1217
Error: Should not be: "1.1.1.1"
Test: TestProxyLoadBalancerModeDSR/IPv4,withSessionAffinity
Messages: Client IP should not be preserved with NAT mode
fixtures.go:531: Deleting Pod 'external-client-jjk3ndrv'
=== RUN TestProxyLoadBalancerModeDSR/IPv4,withoutSessionAffinity
proxy_test.go:1187: Request #0 from external-client-ub9ixulw got hostname: agnhost-1
proxy_test.go:1187: Request #1 from external-client-ub9ixulw got hostname: agnhost-2
proxy_test.go:1187: Request #2 from external-client-ub9ixulw got hostname: agnhost-0
proxy_test.go:1187: Request #3 from external-client-ub9ixulw got hostname: agnhost-2
proxy_test.go:1187: Request #4 from external-client-ub9ixulw got hostname: agnhost-1
proxy_test.go:1187: Request #5 from external-client-ub9ixulw got hostname: agnhost-3
proxy_test.go:1187: Request #6 from external-client-ub9ixulw got hostname: agnhost-2
proxy_test.go:1187: Request #7 from external-client-ub9ixulw got hostname: agnhost-2
proxy_test.go:1187: Request #8 from external-client-ub9ixulw got hostname: agnhost-3
proxy_test.go:1187: Request #9 from external-client-ub9ixulw got hostname: agnhost-0
proxy_test.go:1187: Request #0 from internal-client got hostname: agnhost-1
proxy_test.go:1187: Request #1 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #2 from internal-client got hostname: agnhost-2
proxy_test.go:1187: Request #3 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #4 from internal-client got hostname: agnhost-0
proxy_test.go:1187: Request #5 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #6 from internal-client got hostname: agnhost-0
proxy_test.go:1187: Request #7 from internal-client got hostname: agnhost-3
proxy_test.go:1187: Request #8 from internal-client got hostname: agnhost-2
proxy_test.go:1187: Request #9 from internal-client got hostname: agnhost-1
fixtures.go:531: Deleting Pod 'external-client-ub9ixulw'
Analysis
Looking at the antrea-agent log, I suspect it's because Antrea's proxy runner was throttled by its rate limiter while kube-proxy's runner wasn't. The following may be what happened:
15:31:40.209: It received the Service creation, LB IP was not set yet.
15:31:40.230: The 1st sync finished.
I0919 15:31:40.209279 13 config.go:242] Calling handler.OnServiceAdd
I0919 15:31:40.230760 13 proxier.go:1000] syncProxyRules took 21.428631ms
I0919 15:31:40.230777 13 runner.go:220] antrea-agent-proxy: ran, next possible in 1s, periodic in 30s
15:31:40.240: It received the EndpointSlice creation. The 2nd sync started immediately because the burst is 2.
15:31:40.244: During the 2nd sync it received the Service update, which had LB IP set and scheduled the 3rd sync in 999ms, around 15:31:41.251.
15:31:40.252: The 2nd sync finished.
I0919 15:31:40.240839 13 config.go:333] "Calling handler.OnEndpointSliceAdd" endpointSlice="testproxyloadbalancermodedsr-3i55riyi/svc-dsr-b7mql"
I0919 15:31:40.244894 13 config.go:259] Calling handler.OnServiceUpdate
I0919 15:31:40.252834 13 proxier.go:1000] syncProxyRules took 11.562958ms
I0919 15:31:40.252903 13 runner.go:220] antrea-agent-proxy: ran, next possible in 1s, periodic in 30s
I0919 15:31:40.252914 13 runner.go:229] antrea-agent-proxy: 15.7µs since last run, possible in 999.9843ms, scheduled in 29.9999843s
I0919 15:31:40.252923 13 runner.go:236] antrea-agent-proxy: throttled, scheduling run in 999.9843ms
15:31:41.277: During the 3rd sync, it added the LB IP to the ipset. Until then, the requests were processed by kube-proxy's iptables rules, which should account for the error "Client IP should be preserved with DSR mode".
15:31:41.278: The 3rd sync finished.
I0919 15:31:41.277980 13 route_linux.go:1924] "Added external IP to ipset" IPSet="ANTREA-EXTERNAL-IP" IP="1.1.2.1"
I0919 15:31:41.278022 13 proxier.go:1000] syncProxyRules took 24.607467ms
I0919 15:31:41.278036 13 runner.go:220] antrea-agent-proxy: ran, next possible in 1s, periodic in 30s
15:31:41.727: It received the Service update, which changed the LoadBalancerMode to NAT mode and scheduled the 4th sync in 549ms, around 15:31:42.277.
I0919 15:31:41.727486 13 config.go:259] Calling handler.OnServiceUpdate
I0919 15:31:41.728362 13 runner.go:229] antrea-agent-proxy: 450.324741ms since last run, possible in 549.675259ms, scheduled in 29.549675259s
I0919 15:31:41.728376 13 runner.go:236] antrea-agent-proxy: throttled, scheduling run in 549.675259ms
15:31:41.796: It received the Service deletion. This change would also be handled by the 4th sync, causing the Service to be removed directly without ever behaving in NAT mode, which should account for the error "Client IP should not be preserved with NAT mode".
I0919 15:31:41.796053 13 config.go:369] "Calling handler.OnEndpointSliceDelete" endpointSlice="testproxyloadbalancermodedsr-3i55riyi/svc-dsr-b7mql"
I0919 15:31:41.796082 13 proxier.go:1089] "Processing EndpointSlice DELETE event" EndpointSlice="testproxyloadbalancermodedsr-3i55riyi/svc-dsr-b7mql"
I0919 15:31:41.796219 13 runner.go:229] antrea-agent-proxy: 518.181092ms since last run, possible in 481.818908ms, scheduled in 29.481818908s
I0919 15:31:41.796273 13 runner.go:236] antrea-agent-proxy: throttled, scheduling run in 481.818908ms
15:31:42.289: The 4th sync deleted the LB IP.
I0919 15:31:42.289073 13 route_linux.go:1960] "Deleted route for external IP" IP="1.1.2.1"
I0919 15:31:42.290635 13 route_linux.go:1966] "Deleted external IP from ipset" IPSet="ANTREA-EXTERNAL-IP" IP="1.1.2.1"
I0919 15:31:42.301581 13 proxier.go:1000] syncProxyRules took 22.762799ms
I0919 15:31:42.301593 13 runner.go:220] antrea-agent-proxy: ran, next possible in 1s, periodic in 30s
The code works as intended: the rate limiter prevents the proxy runner from executing too frequently in response to individual events. To avoid the test flakiness, we could add a 1s delay for the Service to be fully realized before sending requests. The sudden appearance and disappearance of the issue is likely caused by performance fluctuations of the GitHub runners: if both the EndpointSliceAdd and ServiceUpdate events arrive before the 1st sync finishes, they are handled together by the 2nd sync, and the test succeeds.
tnqn added the kind/bug and area/test/e2e labels and removed kind/bug on Sep 30, 2024.
Describe the bug
@antoninbas reported TestProxyLoadBalancerModeDSR/IPv4,withSessionAffinity has been failing occasionally in Kind CI:
https://github.com/antrea-io/antrea/actions/runs/10943236467
https://github.com/antrea-io/antrea/actions/runs/11018658226
The error is typically the one shown in the log above.