
Pod Connectivity Issue: Using Port Names Instead of Port Numbers in Network Policies #7391

Closed
irrandon opened this issue May 2, 2023 · 7 comments
irrandon commented May 2, 2023

Environmental Info:

K3s Version:

[root@cloud-k3s-node1 ~]# k3s -v
k3s version v1.26.4+k3s1 (8d0255af)
go version go1.19.8

Node(s) CPU architecture, OS, and Version:

[root@cloud-k3s-node1 ~]# uname -a
Linux cloud-k3s-node1 4.18.0-425.19.2.el8_7.x86_64 #1 SMP Tue Apr 4 05:30:47 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

  • 1 Server 2 Agents
  • Firewalld is disabled
  • Air gapped environment
  • --prefer-bundled-bin flag is set

Describe the bug:

Sometimes it is not possible to establish a connection to one of the replicas of a deployment when there is more than one replica and network policies are active. The issue occurs only when the network policy specifies the port by name (in our case http); when the port number is specified instead, the connection works without any problems. The behavior is intermittent and does not always happen. We checked the iptables rules but found no relevant differences between the two cases.
The issue only occurs with multiple replicas; with a single pod it is not observed.
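The iptables comparison mentioned above can be repeated per node with something like the following. This is a rough sketch: K3s's embedded network policy controller is kube-router, and the `KUBE-NWPLCY` chain prefix is an assumption that may vary across versions.

```shell
# Dump the network-policy-related rules on each node for comparison.
# The KUBE-NWPLCY chain prefix is kube-router's convention (an
# assumption here -- verify with `iptables-save | less` on your version).
sudo iptables-save | grep -i 'NWPLCY' > "/tmp/nwplcy-$(hostname).txt"

# Collect the dumps on one machine and diff the two cases
# (port name vs. port number), e.g.:
#   diff /tmp/nwplcy-port-name.txt /tmp/nwplcy-port-number.txt
```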

Steps To Reproduce:

Custom systemd unit server:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server --https-listen-port 6443 --data-dir /data/k3s --prefer-bundled-bin

k3s.service.env:

K3S_TOKEN="myToken"

Custom systemd unit agent:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    agent --data-dir /data/k3s --prefer-bundled-bin

k3s.service.env:

K3S_URL="https://server.local:6443"
K3S_TOKEN="myToken"

  • Disable firewalld
  • Install K3s with the custom systemd units above
  • Create a deployment with at least two replicas
  • Create a network policy like this:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-myApp-web
  namespace: myNamespace
spec:
  egress:
  - ports:
    - port: 5432
      protocol: TCP
    to:
    - ipBlock:
        cidr: 10.10.10.10/32
  - ports:
    - port: 25
      protocol: TCP
    to:
    - ipBlock:
        cidr: 10.10.10.11/32
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          app.kubernetes.io/name: traefik
    ports:
    - port: http
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: web
      app.kubernetes.io/instance: web
      app.kubernetes.io/name: myApp
  policyTypes:
  - Egress
  - Ingress
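For comparison, the variant that behaves reliably replaces the named port with its number. If the container's http port is 80 (an assumption; use whatever containerPort the name maps to), the ingress section becomes:

```yaml
# Same ingress rule with a numeric port instead of the name "http".
# Port 80 is an assumption -- substitute the containerPort that the
# name resolves to in your pod spec.
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
    podSelector:
      matchLabels:
        app.kubernetes.io/name: traefik
  ports:
  - port: 80
    protocol: TCP
```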

Expected behavior:

A connection can always be established, even with more than one replica, when the port name is used in the network policies.

Actual behavior:

With more than one replica, it is sometimes not possible to establish a connection with one of them.
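A quick way to see which replica is affected is to probe each pod directly by IP, so a single failing backend is not masked by the service's load balancing. The label selector, namespace, and the busybox test pod are illustrative; adjust to your deployment.

```shell
# Probe every replica of the web deployment by pod IP.
# Assumes a "busybox" test pod (with wget) in the same namespace.
for ip in $(kubectl -n myNamespace get pods \
    -l app.kubernetes.io/name=myApp \
    -o jsonpath='{.items[*].status.podIP}'); do
  kubectl -n myNamespace exec deploy/busybox -- \
    wget -qO- --timeout=3 "http://$ip/" >/dev/null \
    && echo "$ip: ok" || echo "$ip: FAILED"
done
```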

@rbrtbnfgl
Contributor

It seems strange that it fails only when the port is specified by name.
I'll take a look at this to check whether it's K3s- or kube-router-related.

@rbrtbnfgl rbrtbnfgl self-assigned this May 2, 2023
@rbrtbnfgl rbrtbnfgl moved this from New to Next Up in K3s Development May 2, 2023
@rbrtbnfgl
Contributor

Could you give more info about your setup? As I read the policy, you are only allowing traffic from Traefik to the web service (I imagine your two-replica deployment is the web service). Also, the allowed egress traffic goes to whom?

@irrandon
Author

irrandon commented May 4, 2023

Yes, the application is a web service that requires access to a PostgreSQL and a Postfix instance located outside of our cluster. To enable this access, we have created two egress rules specifically for these instances.

@rbrtbnfgl
Contributor

How do you access the service? Did you create an Ingress resource?

@irrandon
Author

irrandon commented May 4, 2023

Yes, there is an Ingress resource. But it doesn't only occur with Traefik.

@rbrtbnfgl
Contributor

Are you contacting the web service from a pod or from a node? Are you using the Traefik service IP?

@est-suse
Contributor

Validated on branch with commit / version
Environment Details

Validated on the master branch (1.27) at commit 10fb39ae60abfc60592497903ba734c7e9a17bdf
k3s version v1.27.1+k3s-10fb39ae (10fb39ae)
go version go1.20.3

Infrastructure

[x] Cloud
Hosted
Node(s) CPU architecture, OS, and Version:

Linux ip-172-31-33-29 5.15.0-1033-aws #37~20.04.1-Ubuntu SMP Fri Mar 17 11:39:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Cluster Configuration:

1 server

Config.yaml:

write-kubeconfig-mode: 644
token: test

Testing Steps:

1. Copy config.yaml
2. sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
3. curl -sfL https://get.k3s.io | INSTALL_K3S_COMMIT=10fb39ae60abfc60592497903ba734c7e9a17bdf sh -

Replication Results:

Run the script:

#!/bin/bash
set -euo pipefail

kubectl delete sts/wordpress || true
kubectl delete deployment/busybox || true
kubectl delete service/wordpress-service || true
kubectl delete networkpolicy/wordpress-network-policy || true

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: wordpress
spec:
  serviceName: wordpress-service
  replicas: 2
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
      - name: wordpress
        image: wordpress
        ports:
        - name: http
          containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        command:
        - sleep
        - infinity
---
apiVersion: v1
kind: Service
metadata:
  name: wordpress-service
spec:
  ports:
    - name: http
      port: 80
      targetPort: http
  clusterIP: None
  selector:
    app: wordpress
EOF

kubectl rollout status sts/wordpress
kubectl rollout status deployment/busybox
! (
    kubectl exec -it deployment/busybox -- wget -O - wordpress-0.wordpress-service.default.svc.cluster.local >/dev/null || echo "failed"
    kubectl exec -it deployment/busybox -- wget -O - wordpress-1.wordpress-service.default.svc.cluster.local >/dev/null || echo "failed"
) | grep failed

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: wordpress-network-policy
spec:
  podSelector:
    matchLabels:
      app: wordpress
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: busybox
    ports:
    - port: http
EOF

sleep 30
! (
    kubectl exec -it deployment/busybox -- wget -O - wordpress-0.wordpress-service.default.svc.cluster.local >/dev/null || echo "failed"
    kubectl exec -it deployment/busybox -- wget -O - wordpress-1.wordpress-service.default.svc.cluster.local >/dev/null || echo "failed"
) | grep failed
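The `! ( … ) | grep failed` idiom used in the script succeeds only when none of the probes printed "failed". A minimal standalone illustration of the mechanics:

```shell
# grep exits non-zero when nothing matches, and the leading !
# inverts the pipeline's status, so the check passes (exit 0)
# only when the grouped commands print no line containing "failed".
check() {
  ! ( echo "$1" ) | grep failed
}

check "ok"     && echo "first: passed"          # no match -> success
check "failed" || echo "second: caught failure" # match -> failure
```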

Validation Results:

Validated that the script ran successfully:

statefulset.apps/wordpress created
deployment.apps/busybox created
service/wordpress-service created
Waiting for 2 pods to be ready...
Waiting for 1 pods to be ready...
partitioned roll out complete: 2 new pods have been updated...
deployment "busybox" successfully rolled out
networkpolicy.networking.k8s.io/wordpress-network-policy created


NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-957fdf8bc-dhx5f   1/1     Running     0          7m40s
kube-system   coredns-77ccd57875-q2q2j                 1/1     Running     0          7m40s
kube-system   helm-install-traefik-crd-fbc7g           0/1     Completed   0          7m41s
kube-system   helm-install-traefik-pdqv7               0/1     Completed   1          7m41s
kube-system   svclb-traefik-53b1fa17-5dlqt             2/2     Running     0          7m27s
kube-system   traefik-64f55bb67d-ktncm                 1/1     Running     0          7m28s
kube-system   metrics-server-54dc485875-f9k8z          1/1     Running     0          7m40s
default       busybox-f7db5bc95-tc5ws                  1/1     Running     0          119s
default       wordpress-0                              1/1     Running     0          119s
default       wordpress-1                              1/1     Running     0          108s

@github-project-automation github-project-automation bot moved this from To Test to Done Issue in K3s Development May 15, 2023