[BUG] New Cluster stuck on ContainerCreating #1449

Open
shanilhirani opened this issue Jun 6, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@shanilhirani

shanilhirani commented Jun 6, 2024

What did you do

  • How was the cluster created?

    • k3d cluster create mycluster
  • What did you do afterwards?

    • Attempted to deploy a nginx container

What did you expect to happen

I expected the nginx deployment to start and become ready for use. Instead, mycluster does not start correctly: its pods are stuck in a ContainerCreating state and appear to be failing to pull container images.

NOTE: This issue DOES NOT OCCUR when using K3d 5.6.0, as I have rolled back to this version and the cluster bootstraps fine.
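
For reference, a minimal reproduction along these lines (the exact nginx deployment commands weren't included above, so the deployment step below is an assumption; any image pull should trigger the same failure):

k3d cluster create mycluster
kubectl create deployment nginx --image=nginx   # assumed test deployment
kubectl get pods -w                             # pods never leave ContainerCreating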

Screenshots or terminal output

 k3d cluster create mycluster
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-mycluster'              
INFO[0000] Created image volume k3d-mycluster-images    
INFO[0000] Starting new tools node...                   
INFO[0000] Starting node 'k3d-mycluster-tools'          
INFO[0001] Creating node 'k3d-mycluster-server-0'       
INFO[0001] Creating LoadBalancer 'k3d-mycluster-serverlb' 
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] Starting new tools node...                   
INFO[0001] Starting node 'k3d-mycluster-tools'          
INFO[0002] Starting cluster 'mycluster'                 
INFO[0002] Starting servers...                          
INFO[0002] Starting node 'k3d-mycluster-server-0'       
INFO[0006] All agents already running.                  
INFO[0006] Starting helpers...                          
INFO[0006] Starting node 'k3d-mycluster-serverlb'       
INFO[0012] Injecting records for hostAliases (incl. host.k3d.internal) and for 3 network members into CoreDNS configmap... 
INFO[0015] Cluster 'mycluster' created successfully!    
INFO[0015] You can now use it like this:                
kubectl cluster-info
kubectl cluster-info
Kubernetes control plane is running at https://0.0.0.0:49584
CoreDNS is running at https://0.0.0.0:49584/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://0.0.0.0:49584/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy
k get nodes
NAME                     STATUS   ROLES                  AGE     VERSION
k3d-mycluster-server-0   Ready    control-plane,master   2m10s   v1.28.8+k3s1
k get pods --all-namespaces
NAMESPACE     NAME                                      READY   STATUS              RESTARTS   AGE
kube-system   helm-install-traefik-crd-svjd2            0/1     ContainerCreating   0          8m46s
kube-system   helm-install-traefik-tbc2t                0/1     ContainerCreating   0          8m46s
kube-system   coredns-6799fbcd5-8mqf4                   0/1     ContainerCreating   0          8m46s
kube-system   metrics-server-54fd9b65b-4fqhg            0/1     ContainerCreating   0          8m46s
kube-system   local-path-provisioner-6c86858495-25nvr   0/1     ContainerCreating   0          8m46s
k events --all-namespaces
NAMESPACE     LAST SEEN              TYPE      REASON                           OBJECT                                         MESSAGE
default       14m                    Normal    Starting                         Node/k3d-mycluster-server-0                    Starting kubelet.
default       14m                    Warning   InvalidDiskCapacity              Node/k3d-mycluster-server-0                    invalid capacity 0 on image filesystem
default       14m (x2 over 14m)      Normal    NodeHasSufficientMemory          Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 status is now: NodeHasSufficientMemory
default       14m (x2 over 14m)      Normal    NodeHasNoDiskPressure            Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 status is now: NodeHasNoDiskPressure
default       14m (x2 over 14m)      Normal    NodeHasSufficientPID             Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 status is now: NodeHasSufficientPID
default       14m                    Normal    NodeAllocatableEnforced          Node/k3d-mycluster-server-0                    Updated Node Allocatable limit across pods
default       14m                    Normal    NodeReady                        Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 status is now: NodeReady
kube-system   14m                    Normal    ApplyingManifest                 Addon/auth-delegator                           Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/auth-delegator.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/ccm                                      Applying manifest at "/var/lib/rancher/k3s/server/manifests/ccm.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/ccm                                      Applied manifest at "/var/lib/rancher/k3s/server/manifests/ccm.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/local-storage                            Applying manifest at "/var/lib/rancher/k3s/server/manifests/local-storage.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/local-storage                            Applied manifest at "/var/lib/rancher/k3s/server/manifests/local-storage.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/aggregated-metrics-reader                Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/aggregated-metrics-reader.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/aggregated-metrics-reader                Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/aggregated-metrics-reader.yaml"
default       14m                    Normal    NodePasswordValidationComplete   Node/k3d-mycluster-server-0                    Deferred node password secret validation complete
kube-system   14m                    Normal    AppliedManifest                  Addon/auth-delegator                           Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/auth-delegator.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/auth-reader                              Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/auth-reader.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/auth-reader                              Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/auth-reader.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/metrics-apiservice                       Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-apiservice.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/metrics-apiservice                       Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-apiservice.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/metrics-server-deployment                Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-deployment.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/metrics-server-deployment                Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-deployment.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/metrics-server-service                   Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-service.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/metrics-server-service                   Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-service.yaml"
default       14m                    Normal    Starting                         Node/k3d-mycluster-server-0                    
kube-system   14m                    Normal    ApplyingManifest                 Addon/resource-reader                          Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/resource-reader.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/resource-reader                          Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/resource-reader.yaml"
default       14m                    Normal    Synced                           Node/k3d-mycluster-server-0                    Node synced successfully
kube-system   14m                    Normal    ApplyingManifest                 Addon/rolebindings                             Applying manifest at "/var/lib/rancher/k3s/server/manifests/rolebindings.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/rolebindings                             Applied manifest at "/var/lib/rancher/k3s/server/manifests/rolebindings.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/runtimes                                 Applying manifest at "/var/lib/rancher/k3s/server/manifests/runtimes.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/runtimes                                 Applied manifest at "/var/lib/rancher/k3s/server/manifests/runtimes.yaml"
kube-system   14m (x3 over 14m)      Normal    ApplyJob                         HelmChart/traefik-crd                          Applying HelmChart using Job kube-system/helm-install-traefik-crd
kube-system   14m (x4 over 14m)      Normal    ApplyJob                         HelmChart/traefik                              Applying HelmChart using Job kube-system/helm-install-traefik
kube-system   14m                    Normal    ApplyingManifest                 Addon/traefik                                  Applying manifest at "/var/lib/rancher/k3s/server/manifests/traefik.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/traefik                                  Applied manifest at "/var/lib/rancher/k3s/server/manifests/traefik.yaml"
default       14m                    Normal    RegisteredNode                   Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 event: Registered Node k3d-mycluster-server-0 in Controller
kube-system   14m                    Normal    ScalingReplicaSet                Deployment/coredns                             Scaled up replica set coredns-6799fbcd5 to 1
kube-system   14m                    Normal    SuccessfulCreate                 ReplicaSet/coredns-6799fbcd5                   Created pod: coredns-6799fbcd5-8mqf4
kube-system   14m                    Normal    SuccessfulCreate                 ReplicaSet/local-path-provisioner-6c86858495   Created pod: local-path-provisioner-6c86858495-25nvr
kube-system   14m                    Normal    SuccessfulCreate                 ReplicaSet/metrics-server-54fd9b65b            Created pod: metrics-server-54fd9b65b-4fqhg
kube-system   14m                    Normal    ScalingReplicaSet                Deployment/metrics-server                      Scaled up replica set metrics-server-54fd9b65b to 1
kube-system   14m                    Normal    SuccessfulCreate                 Job/helm-install-traefik-crd                   Created pod: helm-install-traefik-crd-svjd2
kube-system   14m                    Normal    ScalingReplicaSet                Deployment/local-path-provisioner              Scaled up replica set local-path-provisioner-6c86858495 to 1
kube-system   14m                    Normal    SuccessfulCreate                 Job/helm-install-traefik                       Created pod: helm-install-traefik-tbc2t
kube-system   14m                    Warning   FailedScheduling                 Pod/coredns-6799fbcd5-8mqf4                    0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
kube-system   14m                    Warning   FailedScheduling                 Pod/local-path-provisioner-6c86858495-25nvr    0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
kube-system   14m                    Warning   FailedScheduling                 Pod/metrics-server-54fd9b65b-4fqhg             0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
kube-system   14m                    Normal    Scheduled                        Pod/helm-install-traefik-crd-svjd2             Successfully assigned kube-system/helm-install-traefik-crd-svjd2 to k3d-mycluster-server-0
kube-system   14m                    Normal    Scheduled                        Pod/helm-install-traefik-tbc2t                 Successfully assigned kube-system/helm-install-traefik-tbc2t to k3d-mycluster-server-0
kube-system   14m                    Normal    Scheduled                        Pod/coredns-6799fbcd5-8mqf4                    Successfully assigned kube-system/coredns-6799fbcd5-8mqf4 to k3d-mycluster-server-0
kube-system   14m                    Normal    Scheduled                        Pod/metrics-server-54fd9b65b-4fqhg             Successfully assigned kube-system/metrics-server-54fd9b65b-4fqhg to k3d-mycluster-server-0
kube-system   14m                    Normal    Scheduled                        Pod/local-path-provisioner-6c86858495-25nvr    Successfully assigned kube-system/local-path-provisioner-6c86858495-25nvr to k3d-mycluster-server-0
kube-system   14m (x2 over 14m)      Normal    ApplyingManifest                 Addon/coredns                                  Applying manifest at "/var/lib/rancher/k3s/server/manifests/coredns.yaml"
kube-system   14m (x2 over 14m)      Normal    AppliedManifest                  Addon/coredns                                  Applied manifest at "/var/lib/rancher/k3s/server/manifests/coredns.yaml"
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/helm-install-traefik-crd-svjd2             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/metrics-server-54fd9b65b-4fqhg             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/local-path-provisioner-6c86858495-25nvr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/helm-install-traefik-tbc2t                 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/coredns-6799fbcd5-8mqf4                    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
❯ k describe pods --all-namespaces
Name:             helm-install-traefik-crd-svjd2
Namespace:        kube-system
Priority:         0
Service Account:  helm-traefik-crd
Node:             k3d-mycluster-server-0/172.18.0.3
Start Time:       Thu, 06 Jun 2024 13:25:04 +0100
Labels:           batch.kubernetes.io/controller-uid=784304da-3a35-4ea0-a851-0b8b4ef1faad
                  batch.kubernetes.io/job-name=helm-install-traefik-crd
                  controller-uid=784304da-3a35-4ea0-a851-0b8b4ef1faad
                  helmcharts.helm.cattle.io/chart=traefik-crd
                  job-name=helm-install-traefik-crd
Annotations:      helmcharts.helm.cattle.io/configHash: SHA256=E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
Status:           Pending
SeccompProfile:   RuntimeDefault
IP:               
IPs:              <none>
Controlled By:    Job/helm-install-traefik-crd
Containers:
  helm:
    Container ID:  
    Image:         rancher/klipper-helm:v0.8.3-build20240228
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      install
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      NAME:                   traefik-crd
      VERSION:                
      REPO:                   
      HELM_DRIVER:            secret
      CHART_NAMESPACE:        kube-system
      CHART:                  https://%{KUBERNETES_API}%/static/charts/traefik-crd-25.0.2+up25.0.0.tgz
      HELM_VERSION:           
      TARGET_NAMESPACE:       kube-system
      AUTH_PASS_CREDENTIALS:  false
      NO_PROXY:               .svc,.cluster.local,10.42.0.0/16,10.43.0.0/16
      FAILURE_POLICY:         reinstall
    Mounts:
      /chart from content (rw)
      /config from values (rw)
      /home/klipper-helm/.cache from klipper-cache (rw)
      /home/klipper-helm/.config from klipper-config (rw)
      /home/klipper-helm/.helm from klipper-helm (rw)
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rsvf7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  klipper-helm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  klipper-cache:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  klipper-config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  values:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  chart-values-traefik-crd
    Optional:    false
  content:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-content-traefik-crd
    Optional:  false
  kube-api-access-rsvf7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/helm-install-traefik-crd-svjd2 to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again


Name:             helm-install-traefik-tbc2t
Namespace:        kube-system
Priority:         0
Service Account:  helm-traefik
Node:             k3d-mycluster-server-0/172.18.0.3
Start Time:       Thu, 06 Jun 2024 13:25:04 +0100
Labels:           batch.kubernetes.io/controller-uid=da083364-1afc-4baf-8e36-08abc3161832
                  batch.kubernetes.io/job-name=helm-install-traefik
                  controller-uid=da083364-1afc-4baf-8e36-08abc3161832
                  helmcharts.helm.cattle.io/chart=traefik
                  job-name=helm-install-traefik
Annotations:      helmcharts.helm.cattle.io/configHash: SHA256=2C8876269AFB411F60BCDA289A1957C0126147D80F1B0AC6BD2C43C10FE296E9
Status:           Pending
SeccompProfile:   RuntimeDefault
IP:               
IPs:              <none>
Controlled By:    Job/helm-install-traefik
Containers:
  helm:
    Container ID:  
    Image:         rancher/klipper-helm:v0.8.3-build20240228
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      install
      --set-string
      global.systemDefaultRegistry=
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      NAME:                   traefik
      VERSION:                
      REPO:                   
      HELM_DRIVER:            secret
      CHART_NAMESPACE:        kube-system
      CHART:                  https://%{KUBERNETES_API}%/static/charts/traefik-25.0.2+up25.0.0.tgz
      HELM_VERSION:           
      TARGET_NAMESPACE:       kube-system
      AUTH_PASS_CREDENTIALS:  false
      NO_PROXY:               .svc,.cluster.local,10.42.0.0/16,10.43.0.0/16
      FAILURE_POLICY:         reinstall
    Mounts:
      /chart from content (rw)
      /config from values (rw)
      /home/klipper-helm/.cache from klipper-cache (rw)
      /home/klipper-helm/.config from klipper-config (rw)
      /home/klipper-helm/.helm from klipper-helm (rw)
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zl645 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  klipper-helm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  klipper-cache:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  klipper-config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  values:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  chart-values-traefik
    Optional:    false
  content:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-content-traefik
    Optional:  false
  kube-api-access-zl645:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/helm-install-traefik-tbc2t to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again


Name:                 coredns-6799fbcd5-8mqf4
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 k3d-mycluster-server-0/172.18.0.3
Start Time:           Thu, 06 Jun 2024 13:25:05 +0100
Labels:               k8s-app=kube-dns
                      pod-template-hash=6799fbcd5
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-6799fbcd5
Containers:
  coredns:
    Container ID:  
    Image:         rancher/mirrored-coredns-coredns:1.10.1
    Image ID:      
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /etc/coredns/custom from custom-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c4xng (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  custom-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns-custom
    Optional:  true
  kube-api-access-c4xng:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               kubernetes.io/os=linux
Tolerations:                  CriticalAddonsOnly op=Exists
                              node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                              node-role.kubernetes.io/master:NoSchedule op=Exists
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        20m                default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/coredns-6799fbcd5-8mqf4 to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again


Name:                 metrics-server-54fd9b65b-4fqhg
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      metrics-server
Node:                 k3d-mycluster-server-0/172.18.0.3
Start Time:           Thu, 06 Jun 2024 13:25:05 +0100
Labels:               k8s-app=metrics-server
                      pod-template-hash=54fd9b65b
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/metrics-server-54fd9b65b
Containers:
  metrics-server:
    Container ID:  
    Image:         rancher/mirrored-metrics-server:v0.7.0
    Image ID:      
    Port:          10250/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=10250
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      --kubelet-use-node-status-port
      --metric-resolution=15s
      --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get https://:https/livez delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get https://:https/readyz delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vqc8w (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-vqc8w:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        20m                default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/metrics-server-54fd9b65b-4fqhg to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again


Name:                 local-path-provisioner-6c86858495-25nvr
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      local-path-provisioner-service-account
Node:                 k3d-mycluster-server-0/172.18.0.3
Start Time:           Thu, 06 Jun 2024 13:25:05 +0100
Labels:               app=local-path-provisioner
                      pod-template-hash=6c86858495
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/local-path-provisioner-6c86858495
Containers:
  local-path-provisioner:
    Container ID:  
    Image:         rancher/local-path-provisioner:v0.0.26
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      local-path-provisioner
      start
      --config
      /etc/config/config.json
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      POD_NAMESPACE:  kube-system (v1:metadata.namespace)
    Mounts:
      /etc/config/ from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hg64p (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      local-path-config
    Optional:  false
  kube-api-access-hg64p:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        20m                default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/local-path-provisioner-6c86858495-25nvr to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again

Which OS & Architecture

  • output of k3d runtime-info
arch: aarch64
cgroupdriver: cgroupfs
cgroupversion: "2"
endpoint: /var/run/docker.sock
filesystem: extfs
infoname: colima
name: docker
os: Ubuntu 24.04 LTS
ostype: linux
version: 26.1.1

Which version of k3d

  • output of k3d version
k3d version v5.6.3
k3s version v1.28.8-k3s1 (default)

Which version of docker

  • output of docker version and docker info
docker version
Client: Docker Engine - Community
 Version:           26.1.3
 API version:       1.45
 Go version:        go1.22.3
 Git commit:        b72abbb6f0
 Built:             Thu May 16 07:47:24 2024
 OS/Arch:           darwin/arm64
 Context:           colima

Server: Docker Engine - Community
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:47 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Client: Docker Engine - Community
 Version:    26.1.3
 Context:    colima
 Debug Mode: false
 Plugins:
  compose: Docker Compose (Docker Inc.)
    Version:  2.27.1
    Path:     /Users/$USER/.docker/cli-plugins/docker-compose

Server:
 Containers: 3
  Running: 3
  Paused: 0
  Stopped: 0
 Images: 8
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-31-generic
 Operating System: Ubuntu 24.04 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 2
 Total Memory: 1.91GiB
 Name: colima
 ID: a09eda6a-75aa-4810-960d-0718469dc07d
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: $USER
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
@shanilhirani shanilhirani added the bug Something isn't working label Jun 6, 2024
@shanilhirani shanilhirani changed the title [BUG] ContainerCreating stuck on k3d 5.6.3 [BUG] New Cluster stuck on ContainerCreating Jun 6, 2024
@crobby

crobby commented Jun 6, 2024

fwiw, I'm seeing this same issue starting today.
It was working correctly 2 days ago.

@crobby

crobby commented Jun 6, 2024

fwiw, I'm seeing this same issue starting today. It was working correctly 2 days ago.

In my case, this was solved by disconnecting from my VPN. The docker container logs pointed me toward a networking issue, which it seems to be for me.
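
For anyone else triaging this, a rough sketch of the checks that pointed at the networking/DNS problem (node name assumes the default k3d cluster create mycluster naming):

# look for DNS/registry errors in the server node's logs
docker logs k3d-mycluster-server-0 2>&1 | grep -iE 'dns|pull|registry'

# check whether the node can resolve the registry at all
docker exec -it k3d-mycluster-server-0 nslookup registry-1.docker.io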

@shanilhirani
Author

Yeah, it's not a network issue for me, as I've seen this on two devices. Simply rolling back works without any other changes, which makes this issue hard to pin down.

I've tried changing the k3s rancher image to see if that helps, but the behaviour doesn't change.

k3d 5.6.3 seems to be doing something strange with how it maps DNS inside the container.

@adriaanm

adriaanm commented Jun 7, 2024

Same here. 5.6.0 works but 5.6.2 does not (nor does 5.6.3). It seems to be using the wrong nameserver in /etc/resolv.conf inside the k3d container:

❯ k3d --version
k3d version v5.6.2
k3s version v1.28.8-k3s1 (default)

~/g/sandbox dev*
❯ colima ssh
me@colima:/Users/me/g/sandbox$ docker exec -it k3d-local-server-0 sh
/ # cat /etc/resolv.conf
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 192.168.5.2
search fritz.box
options ndots:0

# Based on host file: '/run/systemd/resolve/resolv.conf' (internal resolver)
# ExtServers: [192.168.5.2]
# Overrides: []
# Option ndots from: internal
/ # nslookup google.com
;; connection timed out; no servers could be reached

Rolling back to 5.6.0 (note how the nameserver is rewritten to 127.0.0.11):

❯ k3d --version
k3d version v5.6.0
k3s version v1.27.4-k3s1 (default)

❯ colima ssh
me@colima:/Users/me/g/sandbox$ docker exec -it k3d-local-server-0 sh
/ # cat /etc/resolv.conf
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 127.0.0.11
search fritz.box
options ndots:0

# Based on host file: '/run/systemd/resolve/resolv.conf' (internal resolver)
# ExtServers: [192.168.5.2]
# Overrides: []
# Option ndots from: internal
/ # nslookup google.com
Server:		127.0.0.11
Address:	127.0.0.11:53

Non-authoritative answer:

Non-authoritative answer:
Name:	google.com
Address: 142.250.203.110

@adriaanm

adriaanm commented Jun 7, 2024

Fixed for me by disabling the dns fix when creating the cluster: K3D_FIX_DNS=0 k3d cluster create local

@shanilhirani
Author

Fixed for me by disabling the dns fix when creating the cluster: K3D_FIX_DNS=0 k3d cluster create local

@adriaanm - This suggested workaround seems to have worked.

kubectl get pods --all-namespaces
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-6c86858495-kzdnd   1/1     Running     0          60s
kube-system   coredns-6799fbcd5-mcv7r                   1/1     Running     0          60s
kube-system   helm-install-traefik-crd-jgxlg            0/1     Completed   0          60s
kube-system   svclb-traefik-dc19675c-hm7d6              2/2     Running     0          35s
kube-system   helm-install-traefik-247hw                0/1     Completed   1          60s
kube-system   metrics-server-54fd9b65b-whff8            1/1     Running     0          60s
kube-system   traefik-f4564c4f4-k9b8v                   1/1     Running     0          35s

It would be good if this was documented somewhere.

@2fxprogeeme

Hi,

have a look at #1445; it might explain why K3D_FIX_DNS=0 works around this problem.

@nelyodev

I used the K3D_FIX_DNS=0 workaround, but after stopping and restarting the cluster once it no longer seemed to work. I didn't want to lose my experimental cluster, so I dug into /etc/resolv.conf and found a wrong IP in there (it was my colima VM's address). I just replaced it with my real nameserver. macOS / colima here, btw.
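
A rough sketch of that manual fix (node name and nameserver are placeholders; note that the resolv.conf header shown above says Docker stops managing the file once it has been edited):

# see what the node currently resolves with
docker exec -it k3d-mycluster-server-0 cat /etc/resolv.conf

# replace it with a reachable nameserver (1.1.1.1 is just an example)
docker exec -it k3d-mycluster-server-0 sh -c 'echo "nameserver 1.1.1.1" > /etc/resolv.conf'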

@bhavanki

I'm seeing the same issue with colima on macOS. Adding K3D_FIX_DNS=0 resolves the problem. Notably, the problem doesn't happen when I use Docker Desktop; things are fine there with or without disabling the DNS fix.

When the DNS fix is enabled under colima, /etc/resolv.conf uses 192.168.5.2 for the nameserver, which is the loopback address for the lima host. This seems like it should work, and on other non-k3d containers it actually does, but on a k3d agent or server container it's unreachable. I see that the fix script makes changes to iptables, so maybe the cause lies there.

When the DNS fix is disabled (K3D_FIX_DNS=0) under colima, /etc/resolv.conf uses 127.0.0.11 for the nameserver, which is the Docker daemon's resolver, and lookups work fine.
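
A quick way to check which resolver path a node is on (assuming the default node name from k3d cluster create mycluster):

# with the DNS fix enabled (default under colima): expect the lima gateway, e.g. 192.168.5.2
docker exec -it k3d-mycluster-server-0 cat /etc/resolv.conf

# with the fix disabled: expect Docker's embedded resolver, 127.0.0.11, and working lookups
K3D_FIX_DNS=0 k3d cluster create mycluster
docker exec -it k3d-mycluster-server-0 sh -c 'cat /etc/resolv.conf; nslookup registry-1.docker.io'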
