
ServiceLB cannot be accessed via loopback when service ExternalTrafficPolicy=Local #7561

Closed
proligde opened this issue May 16, 2023 · 19 comments

@proligde

proligde commented May 16, 2023

K3s Version:
Migrating from v1.25.6+k3s1 to v1.25.7+k3s1

Node(s) CPU architecture, OS, and Version:
1 Node, x86, reproducible on different OS like Ubuntu 22.04 and Ubuntu on WSL

Cluster Configuration:
1-node local development cluster

Describe the bug:
We're using k3s as our local development environment platform, routing FQDNs to our dev machines using /etc/hosts entries pointing to 127.0.0.1.
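For example, an entry looks like this (the hostname is just a placeholder):

echo "127.0.0.1 myproject.dev.local" | sudo tee -a /etc/hosts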

This worked perfectly fine for years, until recently. I had to dig for quite a while before I was able to pinpoint it to upgrading k3s from v1.25.6+k3s1 to v1.25.7+k3s1: it stops working on 1.25.7 and works again after downgrading to 1.25.6. The problem also exists on the current 1.27.1.

Steps To Reproduce:
Install k3s and ingress controller like so and access it using 127.0.0.1

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.25.7+k3s1 K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="--disable=traefik" sh -

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx

helm install -n kube-system --version '^4.0.0' --set 'controller.watchIngressWithoutClass=true' --set 'controller.service.externalTrafficPolicy=Local' nginx-ingress ingress-nginx/ingress-nginx

curl http://127.0.0.1

Expected behavior:
A 404 Message returned by nginx

Actual behavior:
Network timeout

However, the ports/services are in fact working perfectly fine via the node's LAN address (192.168....). They're just not reachable via 127.0.0.1 anymore, and I can't figure out why.

Additional info

  • I checked the corresponding changelog at https://github.com/k3s-io/k3s/releases/tag/v1.25.7%2Bk3s1 and found some changes regarding the servicelb, but at least to me none of them explained the behavior I'm seeing.

  • I exported the iptables rules while running 1.25.6 and while running 1.25.7 and tried to compare them, but I'm afraid my knowledge of iptables is not sufficient to assess whether the cause of the problem lies there.

Thanks a lot for any help in advance - Max

@brandond
Member

I suspect that it's related to the flannel version bump. I see that @rbrtbnfgl has assigned this to himself so I'll let him reply once he's had a chance to do some investigation.

@rbrtbnfgl
Contributor

rbrtbnfgl commented May 16, 2023

Yes, I want to do further investigation on it. The only difference between the Kube-router and flannel versions should be the iptables rule order, but those rules should only affect pod traffic, and this issue seems to be related to traffic from the node to localhost.

@rbrtbnfgl
Contributor

rbrtbnfgl commented May 23, 2023

I think the issue is related to externalTrafficPolicy=Local; if it's configured to Cluster it works. I have to check what changed between the two versions that caused Local to stop working with localhost.
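As a quick check on a repro setup, the policy can be flipped on the service and the curl repeated (the service name is assumed from the helm install above):

kubectl -n kube-system patch svc nginx-ingress-ingress-nginx-controller --type merge -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
curl -v http://127.0.0.1/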

@rbrtbnfgl
Contributor

I don't know if this is the right fix or if this behaviour is the expected one. It could be related to this: https://github.com/k3s-io/k3s/blob/master/pkg/cloudprovider/servicelb.go#L556
Maybe localhost also needs to be added as a destination IP.

@brandond
Member

We haven't changed anything about how traffic gets into the svclb pods; that still relies on NodePort pods - the kubelet and portmap CNI plugin handle all of that.

@rbrtbnfgl
Contributor

rbrtbnfgl commented May 24, 2023

The behaviour of the CNI is the same on both versions: the packets directed to localhost:80 are masqueraded with the IP of the svclb-nginx-ingress-ingress-nginx-controller pod, and the packets arrive at that pod but are dropped by it.
Inspecting that pod, I noticed that it internally installs some iptables rules to redirect the traffic to the nginx-ingress-ingress-nginx-controller pod.
On 1.25.6

Defaulted container "lb-tcp-80" out of: lb-tcp-80, lb-tcp-443
+ trap exit TERM INT
+ echo 0.0.0.0/0
+ grep -Eq :
+ iptables -t filter -I FORWARD -s 0.0.0.0/0 -p TCP --dport 80 -j ACCEPT
+ echo 10.43.117.1
+ grep -Eq :
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '==' 1 ]
+ iptables -t filter -A FORWARD -d 10.43.117.1/32 -p TCP --dport 80 -j DROP
+ iptables -t nat -I PREROUTING '!' -s 10.43.117.1/32 -p TCP --dport 80 -j DNAT --to 10.43.117.1:80
+ iptables -t nat -I POSTROUTING -d 10.43.117.1/32 -p TCP -j MASQUERADE
+ '[' '!' -e /pause ]
+ mkfifo /pause

On 1.25.7

Defaulted container "lb-tcp-80" out of: lb-tcp-80, lb-tcp-443
+ trap exit TERM INT
+ echo 0.0.0.0/0
+ grep -Eq :
+ iptables -t filter -I FORWARD -s 0.0.0.0/0 -p TCP --dport 80 -j ACCEPT
+ echo 10.1.1.4
+ grep -Eq :
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '==' 1 ]
+ iptables -t filter -A FORWARD -d 10.1.1.4/32 -p TCP --dport 32545 -j DROP
+ iptables -t nat -I PREROUTING '!' -s 10.1.1.4/32 -p TCP --dport 80 -j DNAT --to 10.1.1.4:32545
+ iptables -t nat -I POSTROUTING -d 10.1.1.4/32 -p TCP -j MASQUERADE
+ '[' '!' -e /pause ]
+ mkfifo /pause

10.1.1.4 is the node IP. If I check the iptables rules matched internally by the pod in the 1.25.7 case, the drop rule is matched, namely:
iptables -t filter -A FORWARD -d 10.1.1.4/32 -p TCP --dport 32545 -j DROP
That rule is matched here because the port has changed, so the traffic no longer matches the default ACCEPT rule that uses --dport 80.
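For reference, the traces above come from the lb containers' startup logs and can be pulled with kubectl (the pod name suffix is a placeholder):

kubectl -n kube-system logs svclb-nginx-ingress-ingress-nginx-controller-xxxxx
kubectl -n kube-system logs svclb-nginx-ingress-ingress-nginx-controller-xxxxx -c lb-tcp-443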

@brandond
Member

brandond commented May 24, 2023

Yes, the difference comes from the ExternalTrafficPolicy, which was added in https://github.com/k3s-io/k3s/pull/6726/files#diff-38e1c51632f2d12566706b6d7f22cf82e1441ea19e27f787ff15e4b7b92dc197R566-R592

If the ExternalTrafficPolicy is set to local, the svclb pods target the host IP and NodePort to ensure that traffic only goes to local pods. If it is set to anything else, it targets the cluster service address and port.
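A quick way to see which destination a given lb container was configured with is to dump its environment (the pod name is a placeholder; the exact variable names may differ between versions):

kubectl -n kube-system exec svclb-nginx-ingress-ingress-nginx-controller-xxxxx -c lb-tcp-80 -- env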

Why would the drop rule only match if the connection is to the loopback address, instead of the node IP? Is it bypassing a port rewrite rule on the way in?

@rbrtbnfgl
Contributor

When the IP is the node IP, the kube-proxy rules masquerade the packets with the ingress pod IP and not the svclb one.

@brandond
Member

brandond commented May 24, 2023

Hmm. The intent of the FORWARD rules is to enforce LoadBalancerSourceRanges; I guess that ends up blocking local access that doesn't go through kube-proxy.
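For context, those source ranges come from the Service spec itself; setting them looks roughly like this (service name and CIDR are placeholders):

kubectl -n kube-system patch svc nginx-ingress-ingress-nginx-controller --type merge -p '{"spec":{"loadBalancerSourceRanges":["192.168.0.0/16"]}}'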

What's the source of the traffic in this case? Is it from the loopback address?

@rbrtbnfgl
Contributor

The source address is the IP of cni0

@brandond brandond moved this from New to Working in K3s Development May 25, 2023
@brandond brandond added this to the v1.27.3+k3s1 milestone May 25, 2023
@brandond brandond self-assigned this May 25, 2023
@brandond
Member

brandond commented May 25, 2023

Hmm OK. So in the case of targeting the local pod's node port for local traffic policy, the forward rules we put in place to enforce source ranges end up blocking access via localhost. I'll have to do some thinking on how to handle that.

Is it always the cni0 IP as the source, across all CNIs? or is this a flannel-specific thing?

@rbrtbnfgl
Contributor

It should be related to the portmap CNI. It changes the destination IP to the IP of the pod, and the Linux routing table decides to use the IP of cni0, which belongs to the same network as the pods.
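That routing decision can be seen on the node with ip route get (the pod IP here is a placeholder):

ip route get 10.42.0.10
# typically prints something like: 10.42.0.10 dev cni0 src 10.42.0.1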

@brandond
Member

brandond commented May 26, 2023

I just realized that I was overcomplicating the path here, and the core issue is just that the source and destination port are not the same when the ExternalTrafficPolicy is set to local, and I used the wrong port in the allow rule.

There appears to be a community PR to fix this at k3s-io/klipper-lb#54
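Roughly, the mismatch is that the ACCEPT rule keys on the original port while the DNAT rewrites the destination to the NodePort, so the allow rule would need to match the rewritten port instead, something like the following (ports taken from the trace above; the actual change is in the linked PR):

iptables -t filter -I FORWARD -s 0.0.0.0/0 -p TCP --dport 32545 -j ACCEPT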

@brandond brandond moved this from Working to To Test in K3s Development May 26, 2023
@brandond brandond moved this from To Test to Peer Review in K3s Development May 26, 2023
@brandond brandond changed the title ServiceLB/Ingress ports 80 and 443 timeout on host's 127.0.0.1 in k3s versions > v1.25.6+k3s1 ServiceLB cannot be accessed via loopback when service ExternalTrafficPolicy=Local May 31, 2023
@fmoral2 fmoral2 self-assigned this Jun 1, 2023
@fmoral2
Contributor

fmoral2 commented Jun 12, 2023

Validated on Version:

- k3s version v1.27.2+k3s-55db9b18 (55db9b18)

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
Ubuntu

Cluster Configuration:
1 node

Config.yaml:

token: secret
write-kubeconfig-mode: 644
selinux: true
cluster-init: true

Steps to Repro the issue:

  1. Install k3s at the previous version
  2. Deploy a workload that should be reachable on local port 8080
  3. Check the connection

Steps to Validate the fix:

  1. Install k3s from the latest commit
  2. Deploy a workload that should be reachable on local port 8080
  3. Check that the connection succeeds

Issue Reproduction:

~$ k3s -v
    k3s version v1.27.2+k3s-0485a56f (0485a56f)


~$ k apply
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ingresstest-deploy
      labels:
        app: ingresstest
    spec:
      selector:
        matchLabels:
          app: ingresstest
      template:
        metadata:
          labels:
            app: ingresstest
        spec:
          containers:
          - name: ingresstest
            image: ranchertest/mytestcontainer:unprivileged
            imagePullPolicy: Always
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: ingresstest-ingress-svc
      labels:
        app: ingresstest
    spec:
      externalTrafficPolicy: Local
      type: LoadBalancer
      ports:
      - port: 8080
        targetPort: 8080
        protocol: TCP
        name: http
      selector:
        app: ingresstest
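To confirm the service got an external IP before curling:

~$ k get svc ingresstest-ingress-svc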


~$ curl http://127.0.0.1:8080
    curl: (28) Failed to connect to 127.0.0.1 port 8080 after 129254 ms: Connection timed out




Issue Validation:



 ~$ k3s -v
    k3s version v1.27.2+k3s-55db9b18 (55db9b18)


~$ curl http://127.0.0.1:8080
    <!DOCTYPE html>
    <html>
    <head>
        <title>Welcome to nginx!</title>
        <style>
            html { color-scheme: light dark; }
            body { width: 35em; margin: 0 auto;
                font-family: Tahoma, Verdana, Arial, sans-serif; }
        </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
        working. Further configuration is required.</p>

    <p>For online documentation and support please refer to
        <a href="http://nginx.org/">nginx.org</a>.<br/>
        Commercial support is available at
        <a href="http://nginx.com/">nginx.com</a>.</p>

    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>



~$  k describe pod svclb-ingresstest-ingress-svc-467619a8-6sm9j -n kube-system | grep "klipper"
    Image:          rancher/klipper-lb:v0.4.4
    Image ID:       docker.io/rancher/klipper-lb@sha256:d6780e97ac25454b56f88410b236d52572518040f11d0db5c6baaac0d2fcf860
  Normal  Pulled     5m18s  kubelet            Container image "rancher/klipper-lb:v0.4.4" already present on machine

@fmoral2 fmoral2 closed this as completed Jun 12, 2023
@coopstah13

Can this be backported?

@brandond
Member

Backported to what? It was already backported to all branches that were active at the time the issue was closed. You can find the backport issues linked above.

@coopstah13

Sorry, I see that now. I read the last comment about it being verified on 1.27 and assumed that was the only place a fix had landed.

@coopstah13

I'm still facing this issue on 1.25.15, however.

@brandond
Member

brandond commented Nov 16, 2023

Please open a new issue and fill out the issue template, describing specifically what you're running into. The issue described here was resolved in #7718 - I suspect you're running into something different.

@k3s-io k3s-io locked as resolved and limited conversation to collaborators Nov 16, 2023