PODs stuck in error "Failed to create pod sandbox" #452

Closed
dkomchenko opened this issue Oct 22, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@dkomchenko

Hi! I'm trying to install Cozystack, and after running this command:
kubectl apply -f https://github.com/aenix-io/cozystack/raw/v0.17.1/manifests/cozystack-installer.yaml
some pods get stuck with this error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "<...>": plugin type="kube-ovn" failed (add): RPC failed; Post "http://dummy/api/v1/add": dial unix /run/openvswitch/kube-ovn-daemon.sock: connect: no such file or directory
The kube-ovn-cni pods are in back-off restarting with the error: Liveness probe failed: dial tcp <node ip>:10665: connect: connection refused

@dosubot dosubot bot added the bug Something isn't working label Oct 22, 2024
@ozhankaraman

ozhankaraman commented Oct 22, 2024

Yes, same issue for me. It looks like Kube-OVN and Cilium are having a small war there.

~/Projects/cozystack/cluster2 ❯ k describe pod coredns-cc8bf9fd8-ng9bz -n kube-system
Name:                 coredns-cc8bf9fd8-ng9bz
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 srv3/192.168.0.130
Start Time:           Tue, 22 Oct 2024 16:50:13 +0000
Labels:               k8s-app=kube-dns
                      pod-template-hash=cc8bf9fd8
Annotations:          ovn.kubernetes.io/allocated: true
                      ovn.kubernetes.io/cidr: 10.244.0.0/16
                      ovn.kubernetes.io/gateway: 10.244.0.1
                      ovn.kubernetes.io/ip_address: 10.244.0.14
                      ovn.kubernetes.io/logical_router: ovn-cluster
                      ovn.kubernetes.io/logical_switch: ovn-default
                      ovn.kubernetes.io/mac_address: 00:00:00:7E:9B:14
                      ovn.kubernetes.io/pod_nic_type: veth-pair
                      ovn.kubernetes.io/routed: true
~/Projects/cozystack/cluster2 ❯ k get nodes
NAME   STATUS   ROLES           AGE   VERSION
srv1   Ready    control-plane   17m   v1.30.1
srv2   Ready    control-plane   17m   v1.30.1
srv3   Ready    control-plane   15m   v1.30.1
~/Projects/cozystack/cluster2 ❯ k get pods -A
NAMESPACE      NAME                                          READY   STATUS              RESTARTS      AGE
cozy-cilium    cilium-dm2s4                                  1/1     Running             0             8m48s
cozy-cilium    cilium-operator-7dff7c98fb-hm6t2              1/1     Running             0             8m48s
cozy-cilium    cilium-operator-7dff7c98fb-tkjbp              1/1     Running             0             8m48s
cozy-cilium    cilium-r7zxp                                  1/1     Running             0             8m48s
cozy-cilium    cilium-wz7qk                                  1/1     Running             0             8m48s
cozy-fluxcd    flux-operator-5d78d48d76-mtb4k                1/1     Running             0             9m4s
cozy-fluxcd    helm-controller-864bb7dbcf-txd9q              0/1     ContainerCreating   0             8m54s
cozy-fluxcd    image-automation-controller-f5bcdfb85-9qstw   0/1     ContainerCreating   0             8m54s
cozy-fluxcd    image-reflector-controller-64d77dcbdf-jdfmr   0/1     ContainerCreating   0             8m54s
cozy-fluxcd    kustomize-controller-69c69c4d7f-w8p9h         0/1     ContainerCreating   0             8m54s
cozy-fluxcd    notification-controller-57dd7757d5-nbx28      0/1     ContainerCreating   0             8m53s
cozy-fluxcd    source-controller-8cdf75cd9-54vf6             0/1     ContainerCreating   0             8m53s
cozy-kubeovn   kube-ovn-cni-gk4pk                            0/1     CrashLoopBackOff    6 (69s ago)   8m45s
cozy-kubeovn   kube-ovn-cni-pl27x                            0/1     CrashLoopBackOff    6 (62s ago)   8m45s
cozy-kubeovn   kube-ovn-cni-q8z25                            0/1     CrashLoopBackOff    6 (62s ago)   8m45s
cozy-kubeovn   kube-ovn-controller-546b5498b4-2bckx          1/1     Running             0             8m45s
cozy-kubeovn   kube-ovn-controller-546b5498b4-7rqxw          1/1     Running             0             8m45s
cozy-kubeovn   kube-ovn-controller-546b5498b4-xlt2m          1/1     Running             0             8m45s
cozy-kubeovn   kube-ovn-monitor-6d954df6db-ffr66             1/1     Running             0             8m45s
cozy-kubeovn   kube-ovn-pinger-7d64j                         0/1     ContainerCreating   0             8m44s
cozy-kubeovn   kube-ovn-pinger-nrx87                         0/1     ContainerCreating   0             8m44s
cozy-kubeovn   kube-ovn-pinger-zhbv8                         0/1     ContainerCreating   0             8m44s
cozy-kubeovn   ovn-central-69b6f4b4dd-fxnwk                  1/1     Running             0             8m45s
cozy-kubeovn   ovn-central-69b6f4b4dd-t5pc5                  1/1     Running             0             8m45s
cozy-kubeovn   ovn-central-69b6f4b4dd-vv948                  1/1     Running             0             8m45s
cozy-kubeovn   ovs-ovn-9m8k9                                 1/1     Running             0             8m45s
cozy-kubeovn   ovs-ovn-fkszj                                 1/1     Running             0             8m45s
cozy-kubeovn   ovs-ovn-pzggz                                 1/1     Running             0             8m45s
cozy-system    cozystack-69894646bd-xdtgs                    2/2     Running             0             9m13s
kube-system    coredns-cc8bf9fd8-cd6hd                       0/1     ContainerCreating   0             5m54s
kube-system    coredns-cc8bf9fd8-ng9bz                       0/1     ContainerCreating   0             4m52s
kube-system    kube-apiserver-srv1                           1/1     Running             0             17m
kube-system    kube-apiserver-srv2                           1/1     Running             0             16m
kube-system    kube-apiserver-srv3                           1/1     Running             0             15m
kube-system    kube-controller-manager-srv1                  1/1     Running             2 (18m ago)   16m
kube-system    kube-controller-manager-srv2                  1/1     Running             0             16m
kube-system    kube-controller-manager-srv3                  1/1     Running             0             15m
kube-system    kube-scheduler-srv1                           1/1     Running             2 (18m ago)   16m
kube-system    kube-scheduler-srv2                           1/1     Running             0             16m
kube-system    kube-scheduler-srv3                           1/1     Running             0             15m

@ozhankaraman

By the way, I'm using Cozystack 0.17.1.

@gecube
Collaborator

gecube commented Oct 22, 2024

Hey! Please provide more details for debugging.

  1. What Cozystack settings did you use?
  2. What are the nodes' IP addresses and the gateway address? Which service / pod CIDRs did you choose?
  3. Do you see anything strange in the logs of the failing pods?
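If it helps, the information requested above can be collected with something like the following sketch. The `cozy-kubeovn` namespace comes from the pod listing earlier in this thread; the `app=kube-ovn-cni` label selector is an assumption about Kube-OVN's manifests.

```shell
# Sketch: gather node IPs and logs from the failing kube-ovn-cni pods.
# NS comes from the "k get pods -A" output above; the app=kube-ovn-cni
# label is an assumption about Kube-OVN's DaemonSet labels.
NS="cozy-kubeovn"

if command -v kubectl >/dev/null 2>&1; then
  kubectl get nodes -o wide             # node internal IPs and versions
  kubectl -n "$NS" get pods -o wide     # which node each CNI pod runs on
  # Logs from the last crashed container of each kube-ovn-cni pod:
  for pod in $(kubectl -n "$NS" get pods -l app=kube-ovn-cni -o name); do
    kubectl -n "$NS" logs "$pod" --previous --tail=50 || true
  done
else
  echo "kubectl not found; run this on a machine with cluster access"
fi
```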

@ozhankaraman

My settings are below:

~/Projects/cozystack/cluster2 ❯ cat cozystack-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cozystack
  namespace: cozy-system
data:
  bundle-name: "paas-full"
  ipv4-pod-cidr: "10.244.0.0/16"
  ipv4-pod-gateway: "10.244.0.1"
  ipv4-svc-cidr: "10.96.0.0/16"
  ipv4-join-cidr: "100.64.0.0/16"
~/Projects/cozystack/cluster2 ❯ cat patch-controlplane.yaml
cluster:
  allowSchedulingOnControlPlanes: true
  controllerManager:
    extraArgs:
      bind-address: 0.0.0.0
  scheduler:
    extraArgs:
      bind-address: 0.0.0.0
  apiServer:
    certSANs:
    - 127.0.0.1
  proxy:
    disabled: true
  discovery:
    enabled: false
  etcd:
    advertisedSubnets:
    - 192.168.0.0/24
~/Projects/cozystack/cluster2 ❯ cat patch.yaml
machine:
  kubelet:
    nodeIP:
      validSubnets:
      - 192.168.0.0/24
    extraConfig:
      maxPods: 512
  kernel:
    modules:
    - name: openvswitch
    - name: drbd
      parameters:
        - usermode_helper=disabled
    - name: zfs
    - name: spl
  install:
    image: ghcr.io/aenix-io/cozystack/talos:v1.8.1
  files:
  - content: |
      [plugins]
        [plugins."io.containerd.grpc.v1.cri"]
          device_ownership_from_security_context = true
        [plugins."io.containerd.cri.v1.runtime"]
          device_ownership_from_security_context = true
    path: /etc/cri/conf.d/20-customization.part
    op: create

cluster:
  network:
    cni:
      name: none
    dnsDomain: cozy.local
    podSubnets:
    - 10.244.0.0/16
    serviceSubnets:
    - 10.96.0.0/16 

K8s node IPs:

192.168.0.134 - srv1
192.168.0.197 - srv2
192.168.0.130 - srv3

Two weeks ago I installed 0.16.2 with a similar setup and it was fine.

@kevin880202

kevin880202 commented Oct 23, 2024

I have the same issue while installing Cozystack 0.17.1.
Downgrading the nodes to Talos v1.8.0 solves my problem.
The other configs are the same as in the getting-started doc.

@dkomchenko
Author

dkomchenko commented Oct 23, 2024

My patch.yaml:

machine:
  kubelet:
    nodeIP:
      validSubnets:
      - 10.33.4.1/22
    extraConfig:
      maxPods: 512
  kernel:
    modules:
    - name: openvswitch
    - name: drbd
      parameters:
        - usermode_helper=disabled
    - name: zfs
    - name: spl
  install:
    image: ghcr.io/aenix-io/cozystack/talos:v1.8.1
  files:
  - content: |
      [plugins]
        [plugins."io.containerd.grpc.v1.cri"]
          device_ownership_from_security_context = true
        [plugins."io.containerd.cri.v1.runtime"]
          device_ownership_from_security_context = true
    path: /etc/cri/conf.d/20-customization.part
    op: create
cluster:
  network:
    cni:
      name: none
    dnsDomain: cozy.local
    podSubnets:
    - 10.244.0.0/16
    serviceSubnets:
    - 10.96.0.0/16

My patch-controlplane.yaml:

cluster:
  allowSchedulingOnControlPlanes: true
  controllerManager:
    extraArgs:
      bind-address: 0.0.0.0
  scheduler:
    extraArgs:
      bind-address: 0.0.0.0
  apiServer:
    certSANs:
    - 127.0.0.1
  proxy:
    disabled: true
  discovery:
    enabled: false
  etcd:
    advertisedSubnets:
    - 10.33.4.1/22

Node IPs:
srv1 - 10.33.5.138
srv2 - 10.33.5.87
srv3 - 10.33.4.22
gw 10.33.4.1

@kvaps
Member

kvaps commented Oct 23, 2024

Could you check why the kube-ovn-cni pods are in CrashLoopBackOff?

Please check the logs and events with kubectl describe pod.
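Concretely, something like this sketch, using one of the crashlooping pod names from the listing earlier in the thread:

```shell
# Sketch: inspect events and previous-run logs for one crashlooping
# kube-ovn-cni pod. The pod name is taken from the "k get pods -A"
# listing above and will differ on other clusters.
NS="cozy-kubeovn"
POD="kube-ovn-cni-gk4pk"

if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" describe pod "$POD"             # events, probe failures
  kubectl -n "$NS" logs "$POD" --previous || true  # logs from the crashed run
else
  echo "kubectl not found; run this on a machine with cluster access"
fi
```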

@kvaps
Member

kvaps commented Oct 23, 2024

It seems this is related to issue #449, which was fixed in v0.17.1.

Unfortunately, we didn't update the cozystack-installer.yaml manifest, so you're probably still running v0.17.0.

The manifest was updated in #453 (merged this morning). Please try this version, and if it doesn’t work, feel free to reopen the issue.
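A quick way to re-apply the updated installer and confirm which version is actually deployed; the deployment name `cozystack` in `cozy-system` is inferred from the pod listing earlier in this thread, so treat it as an assumption:

```shell
# Sketch: re-apply the updated installer manifest and check the image
# tag that is actually deployed. The "cozystack" deployment name in
# "cozy-system" is inferred from the pod listing above (assumption).
MANIFEST="https://github.com/aenix-io/cozystack/raw/v0.17.1/manifests/cozystack-installer.yaml"

if command -v kubectl >/dev/null 2>&1; then
  kubectl apply -f "$MANIFEST"
  kubectl -n cozy-system get deploy cozystack \
    -o jsonpath='{.spec.template.spec.containers[*].image}'  # look for a v0.17.1 tag
else
  echo "kubectl not found; run this on a machine with cluster access"
fi
```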

@kvaps kvaps closed this as completed Oct 23, 2024