NodePortLocal does not handle multi-protocol well (same target port) if the Node port is already in use for one protocol #2894
Labels
area/proxy/nodeportlocal
Issues or PRs related to the NodePortLocal feature
kind/bug
Categorizes issue or PR as related to a bug.
Describe the bug
To provide a good user experience and simplify the implementation, the NodePortLocal controller allocates a single Node port value for a given (PodIP, PodPort) tuple.
With a Service like this one:
For any backend Pod implementing this Service, the NPL controller will allocate one local Node port and traffic to that port will be forwarded to the Pod (iptables rules) for both TCP and UDP.
However, the implementation processes the protocols sequentially, as follows:
This means that if port 61000 (for example) is available for TCP but not UDP, it may be selected by the NPL controller for a given (PodIP, PodPort) if the TCP protocol is handled first. UDP processing will then fail with no possible recovery (unless the selected port eventually becomes available for UDP).
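The failure mode described above can be reproduced with a small standalone model. This is only a sketch and not the Antrea code: the `inUse` map, `key` and `sequentialAllocate` are hypothetical names, and the map is a stand-in for the real socket state on the Node. The port is picked while handling the first protocol and then reused as-is for the remaining ones.

```go
package main

import "fmt"

// inUse records which (port, protocol) pairs are already bound on the Node.
// Hypothetical stand-in for the real socket state.
type inUse map[string]bool

func key(port int, proto string) string { return fmt.Sprintf("%d/%s", port, proto) }

// sequentialAllocate models the behavior described in this issue: the Node
// port is chosen while handling the first protocol, then reused unchanged
// for every following protocol.
func sequentialAllocate(used inUse, start, end int, protos []string) (int, error) {
	chosen := 0
	for i, proto := range protos {
		if i == 0 {
			// Pick the first port that is free for the first protocol only.
			for p := start; p <= end; p++ {
				if !used[key(p, proto)] {
					chosen = p
					break
				}
			}
			if chosen == 0 {
				return 0, fmt.Errorf("no free port for %s", proto)
			}
			continue
		}
		// Later protocols never trigger a new port selection.
		if used[key(chosen, proto)] {
			return chosen, fmt.Errorf("port %d already in use for %s", chosen, proto)
		}
	}
	return chosen, nil
}

func main() {
	// 61000 is taken for UDP (e.g. by another process) but free for TCP.
	used := inUse{key(61000, "udp"): true}
	port, err := sequentialAllocate(used, 61000, 61010, []string{"tcp", "udp"})
	fmt.Println(port, err)
}
```

In this model the call returns port 61000 together with an error for UDP, even though 61001 is free for both protocols.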
In this scenario, we have considered the case where NPL needs to be configured for both TCP and UDP "at the same time" (the Service is created with 2 ports using the same targetPort, one for TCP and one for UDP). The same issue exists if the Service is created with a single port, and later updated to support an additional protocol.
To Reproduce
Create a UDP server on the worker Node where the Pods will be scheduled, and bind it to the first port in the configured NPL range: nc -u -l 61000
Use the following YAML (you may need to force the Pod to be scheduled to the given worker Node, if your cluster has more than one):
Look at the antrea-agent logs on that Node:
Expected
The NPL controller should select the next available port (61001) in order to support both TCP and UDP.
Actual behavior
The NPL controller selects 61000 because it is available for TCP, and then fails to configure forwarding for UDP, with no recovery.
Versions:
Antrea TOT (v1.4.0-dev)
Additional context
I think the best solution would be to reserve the port for all supported protocols (TCP + UDP, and later SCTP) and move on to a different Node port if one protocol is not available.
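A rough sketch of that idea, not a patch against the actual NPL code: every name below (`opener`, `openPort`, `simulatedOpener`, `reserveAll`) is hypothetical, and the sketch assumes a port can be reserved by binding a socket on it for each protocol. A candidate port is accepted only if it can be opened for all protocols; otherwise whatever was opened is released and the next port is tried. SCTP is left out.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// opener reserves a port for one protocol and returns a handle that
// releases the reservation when closed.
type opener func(port int, proto string) (io.Closer, error)

// openPort reserves a port by binding a real socket on it (assumption:
// holding a bound socket is how the reservation is kept).
func openPort(port int, proto string) (io.Closer, error) {
	addr := fmt.Sprintf(":%d", port)
	switch proto {
	case "tcp":
		return net.Listen("tcp", addr)
	case "udp":
		return net.ListenPacket("udp", addr)
	}
	return nil, fmt.Errorf("unsupported protocol %q", proto)
}

type noopCloser struct{}

func (noopCloser) Close() error { return nil }

// simulatedOpener fails for the "port/proto" keys listed in busy, so the
// example runs deterministically without touching real sockets.
func simulatedOpener(busy map[string]bool) opener {
	return func(port int, proto string) (io.Closer, error) {
		if busy[fmt.Sprintf("%d/%s", port, proto)] {
			return nil, fmt.Errorf("port %d in use for %s", port, proto)
		}
		return noopCloser{}, nil
	}
}

// reserveAll returns the first port in [start, end] that can be opened for
// every protocol, along with the handles that hold the reservation.
func reserveAll(open opener, start, end int, protos []string) (int, []io.Closer, error) {
	for port := start; port <= end; port++ {
		held := make([]io.Closer, 0, len(protos))
		ok := true
		for _, proto := range protos {
			c, err := open(port, proto)
			if err != nil {
				ok = false
				break
			}
			held = append(held, c)
		}
		if ok {
			return port, held, nil
		}
		// One protocol was unavailable: release the partial reservation
		// and move on to the next Node port.
		for _, c := range held {
			c.Close()
		}
	}
	return 0, nil, fmt.Errorf("no port in %d-%d is free for all of %v", start, end, protos)
}

func main() {
	// Same situation as in the reproduction steps: 61000 is taken for UDP.
	busy := map[string]bool{"61000/udp": true}
	port, held, err := reserveAll(simulatedOpener(busy), 61000, 61010, []string{"tcp", "udp"})
	if err != nil {
		fmt.Println("allocation failed:", err)
		return
	}
	for _, c := range held {
		defer c.Close()
	}
	fmt.Println("reserved Node port", port)
}
```

Passing `openPort` instead of `simulatedOpener(busy)` would run the same loop against real sockets on the host. The trade-off is that a port is skipped entirely as soon as any one protocol is taken on it, which keeps the single Node port per (PodIP, PodPort) property.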