Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NodePortLocal does not handle multi-protocol well (same target port) if the Node ports is already in use for one protocol #2894

Closed
antoninbas opened this issue Oct 15, 2021 · 0 comments · Fixed by #2903
Assignees
Labels
area/proxy/nodeportlocal Issues or PRs related to the NodePortLocal feature kind/bug Categorizes issue or PR as related to a bug.

Comments

@antoninbas
Copy link
Contributor

Describe the bug
To provide a good user experience and simplify the implementation, the NodePortLocal controller allocates a single Node port value for a give (PodIP, PodPort) tuple.
With a Service like this one:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    nodeportlocal.antrea.io/enabled: "true"
spec:
  ports:
  - name: web
    port: 80
    protocol: TCP
    targetPort: 80
  - name: web-udp
    port: 80
    protocol: UDP
    targetPort: 80
  selector:
    app: nginx

For any backend Pod implementing this Service, the NPL controller will allocate one local Node port and traffic to that port will be forwarded to the Pod (iptables rules) for both TCP and UDP.
However, the implementation processes the protocols sequentially, as follows:

  • if no Node port was allocated for the given (PodIP, PodPort) tuple, find an available port and reserve it by binding to it (according to the current protocol)
  • otherwise, if a Node port has already been allocated, bind to it for the new protocol or fail

This means that if port 61000 (for example) is available for TCP but not UDP, it may be selected by the NPL controller for a given (PodIP, PodPort) if the TCP protocol is handled first. UDP processing will then fail with no possible recovery (unless the selected port eventually becomes available for UDP).

In this scenario, we have considered the case where NPL needs to be configured for both TCP and UDP "at the same time" (the Service is created with 2 ports using the same targetPort, one for TCP and one for UDP). The same issue exists if the Service is created with a single port, and later updated to support an additional protocol.

To Reproduce
Create a UDP server on the worker Node where the Pods will be scheduled, and bind to the first one in the configured NPL range: nc -u -l 61000.
Use the following YAML (you may need to force the Pod to be scheduled to the given worker Node, if your cluster has more than one):

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    nodeportlocal.antrea.io/enabled: "true"
spec:
  ports:
  - name: web
    port: 80
    protocol: UDP
    targetPort: 80
  - name: web-udp
    port: 80
    protocol: UDP
    targetPort: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx

Look at the antrea-agent logs on that Node:

E1014 23:55:48.061852       1 npl_controller.go:366] Error syncing Pod default/nginx-6799fc88d8-xbflx, requeuing. Error: failed to add rule for Pod default/nginx-6799fc88d8-xbflx: Local port 61000 was previously allocated for 10.10.1.15:80 but is not available for protocol udp: listen udp4 :61000: bind: address already in use

Expected
The NPL controller should select the next available port (61001) in order to support both TCP and UDP.

Actual behavior
The NPL controller selects 61000 because it is available for TCP, and then fails to configure forwarding for UDP, with no receovery.

Versions:
Antrea TOT (v1.4.0-dev)

Additional context
I think the best solution would be to reserve the port for all supported protocols (TCP + UDP, and later SCTP) and move on to a different Node port if one protocol is not available.

@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. area/proxy/nodeportlocal Issues or PRs related to the NodePortLocal feature labels Oct 15, 2021
@antoninbas antoninbas added this to the Antrea v1.4 release milestone Oct 15, 2021
@antoninbas antoninbas self-assigned this Oct 19, 2021
antoninbas added a commit to antoninbas/antrea that referenced this issue Oct 19, 2021
There is an edge case, which is not handled well by the current NPL
controller implementation: if a given Pod port needs to be exposed for
both TCP and UDP, and the first available Node port is only available
for TCP, it will be still be selected and NPL rule installation will
succeed for TCP but not for UDP.

We have 2 options:

1. reserve the port for both protocols ahead of time, even if the port
is only needed for one protocol initially.
2. support using different Node ports for different protocols, even when
the Pod port is the same.

In this patch, we go with option 1) to preserve the "nice" property that
a give Pod port maps to a unique Node port.

Fixes antrea-io#2894

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this issue Oct 19, 2021
There is an edge case, which is not handled well by the current NPL
controller implementation: if a given Pod port needs to be exposed for
both TCP and UDP, and the first available Node port is only available
for TCP, it will be still be selected and NPL rule installation will
succeed for TCP but not for UDP.

We have 2 options:

1. reserve the port for both protocols ahead of time, even if the port
is only needed for one protocol initially.
2. support using different Node ports for different protocols, even when
the Pod port is the same.

In this patch, we go with option 1) to preserve the "nice" property that
a give Pod port maps to a unique Node port.

Fixes antrea-io#2894

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this issue Oct 20, 2021
There is an edge case, which is not handled well by the current NPL
controller implementation: if a given Pod port needs to be exposed for
both TCP and UDP, and the first available Node port is only available
for TCP, it will be still be selected and NPL rule installation will
succeed for TCP but not for UDP.

We have 2 options:

1. reserve the port for both protocols ahead of time, even if the port
is only needed for one protocol initially.
2. support using different Node ports for different protocols, even when
the Pod port is the same.

In this patch, we go with option 1) to preserve the "nice" property that
a give Pod port maps to a unique Node port.

Fixes antrea-io#2894

Signed-off-by: Antonin Bas <[email protected]>
tnqn pushed a commit that referenced this issue Oct 21, 2021
There is an edge case, which is not handled well by the current NPL
controller implementation: if a given Pod port needs to be exposed for
both TCP and UDP, and the first available Node port is only available
for TCP, it will be still be selected and NPL rule installation will
succeed for TCP but not for UDP.

We have 2 options:

1. reserve the port for both protocols ahead of time, even if the port
is only needed for one protocol initially.
2. support using different Node ports for different protocols, even when
the Pod port is the same.

In this patch, we go with option 1) to preserve the "nice" property that
a give Pod port maps to a unique Node port.

Fixes #2894

Signed-off-by: Antonin Bas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy/nodeportlocal Issues or PRs related to the NodePortLocal feature kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant