Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPL TestMultipleProtocols unit test failing #3847

Closed
antoninbas opened this issue May 31, 2022 · 8 comments
Closed

NPL TestMultipleProtocols unit test failing #3847

antoninbas opened this issue May 31, 2022 · 8 comments
Assignees
Labels
area/proxy/nodeportlocal Issues or PRs related to the NodePortLocal feature kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test.

Comments

@antoninbas
Copy link
Contributor

antoninbas commented May 31, 2022

Describe the bug

https://github.com/antrea-io/antrea/actions/runs/2416940934

2022-05-31T18:49:36.3028158Z I0531 18:48:57.192145   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3028978Z I0531 18:48:57.192453   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3029858Z I0531 18:48:57.294133   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3030771Z I0531 18:48:57.294157   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3031546Z I0531 18:48:57.294256   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3032348Z I0531 18:48:57.294545   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3032983Z I0531 18:48:57.294638   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3033950Z I0531 18:48:57.294808   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3034532Z I0531 18:48:58.297184   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3035074Z I0531 18:48:58.298544   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3035708Z I0531 18:48:58.298580   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3036499Z I0531 18:48:58.398612   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3037105Z I0531 18:48:58.398818   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3037661Z I0531 18:48:58.400170   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3038254Z I0531 18:48:58.400248   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3039314Z I0531 18:48:58.400426   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3040088Z I0531 18:48:58.400531   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3041531Z I0531 18:48:59.399874   15954 npl_controller.go:193] "Service is of type NodePort and cannot be used for NodePortLocal, the NodePortLocal annotation will have no effect" service="default/test-svc"
2022-05-31T18:49:36.3042237Z I0531 18:49:01.400835   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3042735Z I0531 18:49:01.403437   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3043266Z I0531 18:49:01.403481   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3043811Z I0531 18:49:01.403981   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3044501Z I0531 18:49:01.404041   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3045116Z I0531 18:49:01.404140   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3045700Z I0531 18:49:01.404233   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3055043Z I0531 18:49:01.503990   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3057311Z I0531 18:49:01.504140   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3057777Z I0531 18:49:04.507201   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3058190Z I0531 18:49:04.508468   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3058657Z I0531 18:49:04.508509   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3059120Z I0531 18:49:04.611039   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3059608Z I0531 18:49:04.611129   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3060145Z I0531 18:49:04.611272   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3060659Z I0531 18:49:04.611387   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3061167Z I0531 18:49:04.613287   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3061673Z I0531 18:49:04.613503   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3062121Z I0531 18:49:06.615143   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3062526Z I0531 18:49:06.616509   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3062969Z I0531 18:49:06.616547   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3063468Z I0531 18:49:06.717551   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3063972Z I0531 18:49:06.717689   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3064431Z I0531 18:49:06.717822   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3064925Z I0531 18:49:06.717879   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3065457Z I0531 18:49:06.717983   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3065951Z I0531 18:49:06.718141   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3066401Z I0531 18:49:09.719945   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3067114Z I0531 18:49:09.721938   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3068515Z I0531 18:49:09.721977   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3069004Z I0531 18:49:09.823180   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3069482Z I0531 18:49:09.823268   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3069998Z I0531 18:49:09.823391   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3070503Z I0531 18:49:09.823529   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3071102Z I0531 18:49:09.823620   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3071596Z I0531 18:49:09.823782   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3072055Z I0531 18:49:11.825376   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3072463Z I0531 18:49:11.826620   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3072889Z I0531 18:49:11.826665   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3073337Z I0531 18:49:11.927695   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3073835Z I0531 18:49:11.927796   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3074486Z I0531 18:49:11.927953   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3074979Z I0531 18:49:11.928104   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3075471Z I0531 18:49:11.929813   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3075958Z I0531 18:49:11.930351   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3076385Z I0531 18:49:13.931834   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3076775Z I0531 18:49:13.933362   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3077196Z I0531 18:49:13.933399   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3077624Z I0531 18:49:14.035099   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3078097Z I0531 18:49:14.035190   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3078595Z I0531 18:49:14.035311   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3079214Z I0531 18:49:14.035551   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3079713Z I0531 18:49:14.037789   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3080198Z I0531 18:49:14.037961   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3080630Z I0531 18:49:16.039211   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3081176Z I0531 18:49:16.040537   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3081612Z I0531 18:49:16.040585   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3082105Z I0531 18:49:16.141715   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3082599Z I0531 18:49:16.141880   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3083054Z I0531 18:49:16.141733   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3083824Z I0531 18:49:16.142028   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3084382Z I0531 18:49:16.142148   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3084894Z I0531 18:49:16.142258   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3085326Z I0531 18:49:17.142591   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3085737Z I0531 18:49:17.144361   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3086175Z I0531 18:49:17.144414   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3086618Z I0531 18:49:17.244828   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3087109Z I0531 18:49:17.244960   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3087732Z I0531 18:49:17.245116   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3088253Z I0531 18:49:17.245171   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3088757Z I0531 18:49:17.245260   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3089315Z I0531 18:49:17.245339   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3089857Z I0531 18:49:18.246037   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3090347Z I0531 18:49:18.247213   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3090962Z I0531 18:49:18.247308   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3091558Z I0531 18:49:18.348370   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3092142Z I0531 18:49:18.348604   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3092680Z I0531 18:49:18.348423   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3093296Z I0531 18:49:18.348787   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3094994Z I0531 18:49:18.348924   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3095642Z I0531 18:49:18.349080   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3096160Z I0531 18:49:19.349357   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3096828Z I0531 18:49:19.352131   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3097370Z I0531 18:49:19.352183   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3097953Z I0531 18:49:19.452928   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3098450Z I0531 18:49:19.452999   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3099019Z I0531 18:49:19.453167   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3099664Z I0531 18:49:19.453197   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3100257Z I0531 18:49:19.453299   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3101065Z I0531 18:49:19.453495   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3101592Z I0531 18:49:20.455749   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3102134Z I0531 18:49:20.458358   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3102890Z I0531 18:49:20.458402   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3103440Z I0531 18:49:20.559086   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3104026Z I0531 18:49:20.559172   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3104687Z I0531 18:49:20.559287   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3105827Z I0531 18:49:20.559419   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3106370Z I0531 18:49:20.561410   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3106950Z I0531 18:49:20.561516   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3107561Z I0531 18:49:21.562151   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3109615Z I0531 18:49:21.563629   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3110145Z I0531 18:49:21.563667   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3110716Z I0531 18:49:21.664437   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3111353Z I0531 18:49:21.664585   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3111910Z I0531 18:49:21.664887   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3112484Z I0531 18:49:21.665014   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3113093Z I0531 18:49:21.665230   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3113739Z I0531 18:49:21.665377   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3114385Z I0531 18:49:23.666408   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3114806Z I0531 18:49:23.667614   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3115306Z I0531 18:49:23.667654   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3117169Z I0531 18:49:23.768908   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3117904Z I0531 18:49:23.769036   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3118430Z I0531 18:49:23.769224   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3118985Z I0531 18:49:23.769282   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3119601Z I0531 18:49:23.769386   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3120224Z I0531 18:49:23.769509   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3120732Z I0531 18:49:25.770840   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3121331Z I0531 18:49:25.772080   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3121859Z I0531 18:49:25.772126   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3122472Z I0531 18:49:25.872084   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3123050Z I0531 18:49:25.872236   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3123539Z I0531 18:49:25.873631   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3124271Z I0531 18:49:25.873711   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3125028Z I0531 18:49:25.873829   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3125620Z I0531 18:49:25.873958   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3126162Z I0531 18:49:28.874347   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3127483Z --- FAIL: TestMultipleProtocols (3.10s)
2022-05-31T18:49:36.3128140Z     npl_agent_test.go:341: Successfully updated Pod: pod2
2022-05-31T18:49:36.3128707Z     npl_agent_test.go:335: Successfully updated Service: svc2
2022-05-31T18:49:36.3129157Z     annotations.go:77: 
2022-05-31T18:49:36.3129676Z         	Error Trace:	annotations.go:77
2022-05-31T18:49:36.3130352Z         	            				npl_agent_test.go:722
2022-05-31T18:49:36.3130839Z         	Error:      	Not equal: 
2022-05-31T18:49:36.3131359Z         	            	expected: 61001
2022-05-31T18:49:36.3131911Z         	            	actual  : 61002
2022-05-31T18:49:36.3132392Z         	Test:       	TestMultipleProtocols
2022-05-31T18:49:36.3162540Z         	Messages:   	NodePort mismatch in annotation
2022-05-31T18:49:36.3163059Z I0531 18:49:28.875565   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3163519Z I0531 18:49:28.875602   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3164019Z I0531 18:49:28.976057   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3164508Z I0531 18:49:28.976210   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3164967Z I0531 18:49:28.976505   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3165454Z I0531 18:49:28.976590   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3165992Z I0531 18:49:28.976699   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3166509Z I0531 18:49:28.976815   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3166947Z I0531 18:49:29.976935   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3167343Z I0531 18:49:29.978606   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3167776Z I0531 18:49:29.978664   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3168223Z I0531 18:49:30.079761   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3168712Z I0531 18:49:30.079874   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3169734Z I0531 18:49:30.080096   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3170319Z I0531 18:49:30.080218   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3170812Z I0531 18:49:30.082288   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3171346Z I0531 18:49:30.082394   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3171795Z I0531 18:49:31.083287   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3172200Z I0531 18:49:31.084716   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3172630Z I0531 18:49:31.086712   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3173073Z I0531 18:49:31.086874   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3173563Z I0531 18:49:31.087050   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3174404Z I0531 18:49:31.087189   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3174996Z I0531 18:49:31.087366   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3175484Z I0531 18:49:31.185019   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3175971Z I0531 18:49:31.185154   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3176404Z I0531 18:49:33.186286   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3176795Z I0531 18:49:33.187480   15954 npl_controller.go:127] Starting NPLController
2022-05-31T18:49:36.3177203Z I0531 18:49:33.187519   15954 shared_informer.go:255] Waiting for caches to sync for NPLController
2022-05-31T18:49:36.3177634Z I0531 18:49:33.288715   15954 shared_informer.go:262] Caches are synced for NPLController
2022-05-31T18:49:36.3178115Z I0531 18:49:33.288805   15954 npl_controller.go:563] "Will fetch Pods and generate NodePortLocal rules for these Pods"
2022-05-31T18:49:36.3178630Z I0531 18:49:33.288925   15954 npl_controller.go:617] "Waiting for initialization of NodePortLocal rules to complete"
2022-05-31T18:49:36.3179159Z E0531 18:49:33.289037   15954 port_table_linux.go:336] "Failed to restore iptables rules" err="iptables failure" backoff="2s"
2022-05-31T18:49:36.3179658Z I0531 18:49:33.289103   15954 shared_informer.go:255] Waiting for caches to sync for AntreaAgentNPLController
2022-05-31T18:49:36.3180395Z I0531 18:49:33.289207   15954 shared_informer.go:262] Caches are synced for AntreaAgentNPLController
2022-05-31T18:49:36.3181045Z I0531 18:49:35.289309   15954 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
2022-05-31T18:49:36.3181496Z I0531 18:49:36.289964   15954 npl_controller.go:123] Shutting down NPLController
2022-05-31T18:49:36.3181792Z FAIL

Versions:
main branch

@antoninbas antoninbas added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. area/proxy/nodeportlocal Issues or PRs related to the NodePortLocal feature labels May 31, 2022
@XinShuYang
Copy link
Contributor

XinShuYang commented Jun 1, 2022

I didn't find the same error after triggering test again. It's a little weird because we didn't change recent linux port allocation code and it should still assign the same nodeport to a new rule with existed podip+podport and new protocol. And I think maybe this nodeport consistency check can be removed after we support unified port allocation? @antoninbas

@XinShuYang
Copy link
Contributor

XinShuYang commented Jun 1, 2022

testSvc2.Spec.Ports = append(testSvc2.Spec.Ports, corev1.ServicePort{ Port: 81, Protocol: corev1.ProtocolUDP, TargetPort: intstr.IntOrString{ Type: intstr.Int, IntVal: 80, }, })

I think the Protocol should be ProtocolTCP here, currently we get "spec: {[{ udp 81 {0 80 } 0} { UDP 81 {0 80 } 0}] map[tcp:true udp:true]" from testSvc2 after svc update.
And the new annotation is still correct (This test has never failed in my test environment):
pod2ValueUpdate: [{80 10.10.10.10 61001 tcp [tcp]} {80 10.10.10.10 61001 udp [udp]}]

Not sure if this is the root cause and why it can cause randomly failure for TestMultipleProtocols.

@antoninbas
Copy link
Contributor Author

I am taking a look at this and trying to reproduce.
I agree that this is a strange failure (plus the test is written in a confusing way). My fear is that there is some race condition. I just saw the test fail again on main: https://github.com/antrea-io/antrea/actions/runs/2420323323

@wenyingd
Copy link
Contributor

wenyingd commented Jun 2, 2022

I am also looking at this issue, and I doubt it is related with the order of update Pod2's label and update Service2's label ( not sure about it). The current testing code is first to update Pod2's label, then Service2's label. When npl_controller processes Pod2's update, if the Service2 is not updated yet, it should first add TCP protocol ( selected by Service1 ), then delete UDP protocol ( selected by original Service2). But is Service2 is updatd in the mock apiserver cache, the process for npl_controller should be add UDP protocol only.

This might introduce a un-sure conditions, not sure if the issue is introduced by this.

@XinShuYang
Copy link
Contributor

XinShuYang commented Jun 2, 2022

My fear is that there is some race condition.

Wenying and I just found a possible case that may cause this issue. Although the pod label was updated earlier(updatePodOrFail , updateServiceOrFail), NPL workers still could finish service info update before pod info update in race condition. In this situation, the pod2 annotation was fully removed firstly, then added a new nodeport with both tcp and udp rules to pod2 by service update (final pod2 annotation is [{80 10.10.10.10 61002 tcp [tcp]} {80 10.10.10.10 61002 udp [udp]}]).

But why this issue was first observed after we merged recent NPL PR? I am not sure but maybe the longer independent annotation could change execution time for some npl controller workers.

I will add a sleep function between pod update and service update to ensure execution sequence, then rerun multi rounds unit test to see if there is will be the same error. @antoninbas

@antoninbas
Copy link
Contributor Author

@XinShuYang I was able to reproduce it in my fork with a simple workflow: https://github.com/antoninbas/antrea/blob/go-test/.github/workflows/go-test.yml. You may want to do something similar in your fork. It took 3 tries, and it seems to only happen with the -race flag and the code coverage flags.

I increased the log verbosity a bit for my test, and I saw the following logs:

I0602 04:57:31.572372    7357 npl_controller.go:619] "Initialization of NodePortLocal rules successful"
I0602 04:57:31.572624    7357 port_table_linux.go:79] "Looking for free Node port" podIP="192.168.32.1" podPort=80
I0602 04:57:31.572848    7357 port_table_linux.go:79] "Looking for free Node port" podIP="192.168.32.2" podPort=80
I0602 04:57:33.576557    7357 port_table_linux.go:79] "Looking for free Node port" podIP="192.168.32.2" podPort=80
I0602 04:57:34.575765    7357 npl_controller.go:123] Shutting down NPLController
--- FAIL: TestMultipleProtocols (3.11s)
    npl_agent_test.go:341: Successfully updated Pod: pod2
    npl_agent_test.go:335: Successfully updated Service: svc2
    annotations.go:77: 
        	Error Trace:	annotations.go:77
        	            				npl_agent_test.go:722
        	Error:      	Not equal: 
        	            	expected: 61001
        	            	actual  : 61002
        	Test:       	TestMultipleProtocols
        	Messages:   	NodePort mismatch in annotation

Notice that there is a 3rd Looking for free Node port statement, which is for pod2. This is why a 3rd NodePort is allocated 61002). I am pretty sure there is indeed a race somewhere. Is the above consistent with your explanation? I think your explanation makes sense given that PortSearchStart is not "reset" when a NodePort value is released. You may want to add more logs and see if you can reproduce. It would be good to print the full actual annotation for example. And also add a log when a mapping is allocated or released.

But why this issue was first observed after we merged recent NPL PR? I am not sure but maybe the longer independent annotation could change execution time for some npl controller workers.

Yes, the change in timings has led to this issue becoming apparent, there was probably always a race.

I will add a sleep function between pod update and service update to ensure execution sequence, then rerun multi rounds unit test to see if there is will be the same error.

If your explanation is correct, I guess it is an issue with the test case itself, and not with the NPL implementation. Even though it's surprising behavior, if there is a (short) window of time during which the Pod is no longer used for NPL, it makes sense for the port assignment to be released.

XinShuYang added a commit to XinShuYang/antrea that referenced this issue Jun 2, 2022
issue antrea-io#3847

Ensure the execution sequence to avoid unexpected annotation in race condition.

Signed-off-by: Shuyang Xin <[email protected]>
@XinShuYang
Copy link
Contributor

XinShuYang commented Jun 2, 2022

I was able to reproduce it in my fork with a simple workflow: https://github.com/antoninbas/antrea/blob/go-test/.github/workflows/go-test.yml.

Thanks, I will also try to reproduce the error and collect logs with your setup.

@XinShuYang
Copy link
Contributor

In failed case the annotation is:
pod2ValueUpdate,: [{80 10.10.10.10 61002 tcp [tcp]} {80 10.10.10.10 61002 udp [udp]}]

After adding a waiting window, the test has not failed again.

XinShuYang added a commit to XinShuYang/antrea that referenced this issue Jun 8, 2022
The root cause is service could update before pod label update in race condition.
In this case, the pod2 annotation was fully removed firstly.
Then NPL controller added a new nodeport with both tcp and udp rules to pod2 by service update.
As the result, the pod2 annotation was:
'[{80 10.10.10.10 61002 tcp [tcp]} {80 10.10.10.10 61002 udp [udp]}]'.

issue antrea-io#3847

The fix ensures the execution sequence to avoid unexpected annotation in race condition.

Annotation after fix:
'[{80 10.10.10.10 61001 tcp [tcp]} {80 10.10.10.10 61001 udp [udp]}]'

Signed-off-by: Shuyang Xin <[email protected]>
XinShuYang added a commit to XinShuYang/antrea that referenced this issue Jun 8, 2022
The failed case has updated both Service and Pod.
The issue happens when Service update is processed by NPL controller before Pod update.
In this case, the pod2 annotation was fully removed firstly.
Then NPL controller added a new nodeport with both tcp and udp rules to pod2 by service update.
As the result, the pod2 annotation was:
'[{80 10.10.10.10 61002 tcp [tcp]} {80 10.10.10.10 61002 udp [udp]}]'.

issue antrea-io#3847

The fix ensures the execution sequence to avoid unexpected annotation in race condition.

Annotation after fix:
'[{80 10.10.10.10 61001 tcp [tcp]} {80 10.10.10.10 61001 udp [udp]}]'

Signed-off-by: Shuyang Xin <[email protected]>
XinShuYang added a commit to XinShuYang/antrea that referenced this issue Jun 9, 2022
The failed case has updated both Service and Pod.
The issue happens when Service update is processed by NPL controller before Pod update.
In this case, the pod2 annotation was fully removed firstly.
Then NPL controller added a new nodeport with both tcp and udp rules to pod2 by service update.
As the result, the pod2 annotation was:
'[{80 10.10.10.10 61002 tcp [tcp]} {80 10.10.10.10 61002 udp [udp]}]'.

issue antrea-io#3847

The fix removes unnecessary label update for pod2 to avoid unexpected annotation in race condition.

Annotation after fix:
'[{80 10.10.10.10 61001 tcp [tcp]} {80 10.10.10.10 61001 udp [udp]}]'

Signed-off-by: Shuyang Xin <[email protected]>
@tnqn tnqn closed this as completed in 2736201 Jun 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy/nodeportlocal Issues or PRs related to the NodePortLocal feature kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test.
Projects
None yet
Development

No branches or pull requests

3 participants