Fix access to hostNetwork port on NodeIP when egress-selector-mode=agent #6829
With egress-selector-mode: agent (the default), and egress.X.io/cluster: true node labels (also a default), if any Pods are running with hostNetwork: true (and therefore use the Node IP), then the API Server will attempt to use agent tunnels to communicate with those Pods, but tunnel authorization will fail, resulting in connection failures.
This commit fixes the issue by adjusting the API Server such that the egress.X.io/cluster: true node labels are ignored with egress-selector-mode: agent. This ensures that agent tunnels are only used when connecting to kubelet ports on Node IPs, and will not be used when connecting to hostNetwork Pod ports on Node IPs.
This commit fixes the issue by removing the egress.X.io/cluster: true node labels unless egress-selector-mode is cluster or pod. When egress-selector-mode is agent, this ensures that agent tunnels are only used when connecting to kubelet ports on Node IPs, and will not be used when connecting to hostNetwork Pod ports on Node IPs.
Linked Issue
More details
This is loosely related to the following issues: #5637 rancher/rke2#3016
When deciding whether to tunnel connections, the server (prior to this commit):
k3s/pkg/daemons/control/tunnel.go
Lines 256 to 257 in b411864
k3s/pkg/daemons/control/tunnel.go
Lines 243 to 247 in b411864
k3s/pkg/daemons/control/tunnel.go
Lines 227 to 230 in b411864
egress-selector-mode is disabled:
k3s/pkg/daemons/control/tunnel.go
Lines 111 to 112 in b411864
If egress-selector-mode is agent then this list is populated with Node IPs:
k3s/pkg/daemons/control/tunnel.go
Line 130 in b411864
If egress-selector-mode is pod or cluster then this list is populated with both Node and Pod IPs:
k3s/pkg/daemons/control/tunnel.go
Line 155 in b411864
k3s/pkg/daemons/control/tunnel.go
Lines 233 to 235 in b411864
egress.X.io/cluster: true:
k3s/pkg/daemons/control/tunnel.go
Line 237 in b411864
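The server-side decision can be sketched roughly as follows. This is a simplified illustration, not the code in k3s/pkg/daemons/control/tunnel.go: the names Mode, node, shouldTunnel, and dropLabelsForMode are hypothetical, the kubelet port is assumed to be the default 10250, the behavior for disabled mode is an assumption, and Pod IPs are left out to keep the example short.

```go
package main

import "fmt"

// Mode is a simplified stand-in for the egress-selector-mode setting.
type Mode string

const (
	ModeDisabled Mode = "disabled"
	ModeAgent    Mode = "agent"
	ModeCluster  Mode = "cluster"
	ModePod      Mode = "pod"
)

// kubeletPort is an assumption: the default kubelet port.
const kubeletPort = "10250"

// node is a hypothetical record of what the server knows about one node.
type node struct {
	ip          string
	egressLabel bool // true when the node carries egress.X.io/cluster: true
}

// shouldTunnel reports whether the server would dial host:port through an
// agent tunnel. Assumption: in disabled mode the address list is empty, so
// nothing is tunneled.
func shouldTunnel(mode Mode, nodes []node, host, port string) bool {
	if mode == ModeDisabled {
		return false
	}
	for _, n := range nodes {
		if n.ip != host {
			continue
		}
		if port == kubeletPort {
			return true // kubelet traffic on a Node IP is tunneled
		}
		return n.egressLabel // any other port depends on the node label
	}
	return false
}

// dropLabelsForMode models the fix described in this PR: the label is kept
// only when the mode is cluster or pod. The input slice is not modified.
func dropLabelsForMode(mode Mode, nodes []node) []node {
	out := make([]node, len(nodes))
	copy(out, nodes)
	if mode != ModeCluster && mode != ModePod {
		for i := range out {
			out[i].egressLabel = false
		}
	}
	return out
}

func main() {
	// 8443 is a made-up hostNetwork Pod port used only for illustration.
	nodes := []node{{ip: "10.0.0.2", egressLabel: true}}
	fmt.Println(shouldTunnel(ModeAgent, nodes, "10.0.0.2", "8443")) // true: tunneled before the fix

	fixed := dropLabelsForMode(ModeAgent, nodes)
	fmt.Println(shouldTunnel(ModeAgent, fixed, "10.0.0.2", "8443"))      // false: direct connection
	fmt.Println(shouldTunnel(ModeAgent, fixed, "10.0.0.2", kubeletPort)) // true: kubelet still tunneled
}
```

Under these assumptions, the label is what causes non-kubelet ports on a Node IP to be tunneled in agent mode, which is the case the agent later refuses.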
When deciding whether to accept a tunnel connection, the server:
k3s/pkg/agent/tunnel/tunnel.go
Lines 343 to 344 in 76729d8
k3s/pkg/agent/tunnel/tunnel.go
Lines 346 to 347 in 76729d8
egress-selector-mode is disabled or agent:
k3s/pkg/agent/tunnel/tunnel.go
Line 107 in 76729d8
If egress-selector-mode is cluster then this list is populated with Node IPs and Cluster CIDRs:
k3s/pkg/agent/tunnel/tunnel.go
Line 174 in 76729d8
If egress-selector-mode is pod then this list is populated with Node and Pod IPs. If the destination matches a Node IP, then an additional check is made on the destination port, and the connection is accepted only if the destination port belongs to a Pod that has hostNetwork: true:
k3s/pkg/agent/tunnel/tunnel.go
Line 190 in 76729d8
k3s/pkg/agent/tunnel/tunnel.go
Lines 349 to 350 in 76729d8
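The acceptance side can be sketched in the same spirit. Again this is only an illustration under stated assumptions, not the code in k3s/pkg/agent/tunnel/tunnel.go: the agent type, newAgent, and authorized are hypothetical names, the kubelet port is assumed to be 10250, and it is assumed that kubelet connections to the node's own IP are accepted in every mode while disabled and agent modes accept nothing else.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Mode is a simplified stand-in for the egress-selector-mode setting.
type Mode string

const (
	ModeDisabled Mode = "disabled"
	ModeAgent    Mode = "agent"
	ModeCluster  Mode = "cluster"
	ModePod      Mode = "pod"
)

// kubeletPort is an assumption: the default kubelet port.
const kubeletPort = "10250"

// agent is a hypothetical model of the state used to authorize tunnel dials.
type agent struct {
	mode         Mode
	nodeIP       netip.Addr
	clusterCIDRs []netip.Prefix
	podIPs       map[netip.Addr]bool
	hostNetPorts map[string]bool // ports owned by Pods with hostNetwork: true
}

// newAgent builds an agent for one node IP and a set of hostNetwork ports.
func newAgent(mode Mode, nodeIP string, hostNetPorts ...string) *agent {
	a := &agent{
		mode:         mode,
		nodeIP:       netip.MustParseAddr(nodeIP),
		podIPs:       map[netip.Addr]bool{},
		hostNetPorts: map[string]bool{},
	}
	for _, p := range hostNetPorts {
		a.hostNetPorts[p] = true
	}
	return a
}

// authorized reports whether a tunneled connection to host:port is accepted.
func (a *agent) authorized(host, port string) bool {
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return false
	}
	// Assumption: the kubelet on this node is reachable in every mode.
	if addr == a.nodeIP && port == kubeletPort {
		return true
	}
	switch a.mode {
	case ModeCluster:
		// Node IPs and Cluster CIDRs are allowed without a port check.
		if addr == a.nodeIP {
			return true
		}
		for _, cidr := range a.clusterCIDRs {
			if cidr.Contains(addr) {
				return true
			}
		}
	case ModePod:
		// A Node IP is accepted only for ports owned by hostNetwork Pods.
		if addr == a.nodeIP {
			return a.hostNetPorts[port]
		}
		return a.podIPs[addr]
	}
	// disabled and agent modes: nothing beyond the kubelet port.
	return false
}

func main() {
	// 8443 is a made-up hostNetwork Pod port used only for illustration.
	a := newAgent(ModeAgent, "10.0.0.2", "8443")
	fmt.Println(a.authorized("10.0.0.2", kubeletPort)) // true
	fmt.Println(a.authorized("10.0.0.2", "8443"))      // false: the failure described in this PR

	a.mode = ModePod
	fmt.Println(a.authorized("10.0.0.2", "8443")) // true: port belongs to a hostNetwork Pod
}
```

If the sketch reflects the real behavior, the failure arises when the server chooses to tunnel a hostNetwork port in agent mode while the agent, holding an empty address list, only accepts the kubelet port.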
In my particular case
I'm running RKE2 with the ingress configured with hostNetwork: true. (This was configured to permit IPv6 access to the ingress from outside the cluster before we had IPv6 working within the cluster. It may no longer be necessary now that we have IPv6 working within the cluster.)
At some point a few months ago, ingress creation/configuration became unreliable. It would eventually succeed if we retried it enough times, but it would usually fail with the following error:
(We now know that it would succeed if the API Server happened to connect to the ingress on the local node, but would fail if the API Server connected to the ingress on any other node.)
The rke2 server log on the source node included:
The rke2 server log on the destination node included:
User-Facing Change