
Cluster Created with kind Fails to Mount containerd HostPath #83

Closed
aauren opened this issue Apr 5, 2023 · 17 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


aauren commented Apr 5, 2023

What steps did you take and what happened:
When creating a kubemark cluster using kind and CAPD, the kubemark pods stay in ContainerCreating status, and the pod description shows a FailedMount warning:

Warning  FailedMount  10s (x6 over 26s)  kubelet            MountVolume.SetUp failed for volume "containerd-sock" : hostPath type check failed: unix:///run/containerd/containerd.sock is not a socket file

Going into the kind worker node container in Docker shows that the file exists and is a socket:

% kubectl get pods -o wide -n default
NAME                                 READY   STATUS              RESTARTS   AGE     IP       NODE              NOMINATED NODE   READINESS GATES
kube-node-mgmt-kubemark-md-0-62h9n   0/1     ContainerCreating   0          5m12s   <none>   kubemark-worker   <none>           <none>
kube-node-mgmt-kubemark-md-0-kks4r   0/1     ContainerCreating   0          5m12s   <none>   kubemark-worker   <none>           <none>
kube-node-mgmt-kubemark-md-0-mkt6m   0/1     ContainerCreating   0          5m12s   <none>   kubemark-worker   <none>           <none>
kube-node-mgmt-kubemark-md-0-wpdhp   0/1     ContainerCreating   0          5m12s   <none>   kubemark-worker   <none>           <none>

% docker exec -it kubemark-worker /bin/bash

root@kubemark-worker:/# ls -l /run/containerd/containerd.sock
srw-rw---- 1 root root 0 Apr  5 20:15 /run/containerd/containerd.sock
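
For comparison, a quick check of the literal path from the error above inside the same container (a sketch; the unix:// prefix is not part of any filesystem path, so this is expected to fail):

% docker exec -it kubemark-worker ls -l 'unix:///run/containerd/containerd.sock'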

Steps to Reproduce:

  1. Add kubemark provider to clusterctl.yaml:
providers:
- name: "kubemark"
  url: "https://github.com/kubernetes-sigs/cluster-api-provider-kubemark/releases/v0.5.0/infrastructure-components.yaml"
  type: "InfrastructureProvider"
  2. Create kind cluster config:
% cat kubemark.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: kubemark
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /var/run/docker.sock
    hostPath: /var/run/docker.sock
- role: worker
  extraMounts:
  - containerPath: /var/run/docker.sock
    hostPath: /var/run/docker.sock
  3. Create kind cluster using config:
kind create cluster --config kubemark.yaml
  4. Initialize CAPI cluster:
export CLUSTER_TOPOLOGY=true
clusterctl init --infrastructure kubemark,docker
  5. Wait until CAPI cluster pods are fully deployed
  6. Create kubeadm cluster and apply it:
export SERVICE_CIDR=["172.17.0.0/16"]
export POD_CIDR=["192.168.122.0/24"]
clusterctl generate cluster kube-node-mgmt --infrastructure kubemark --flavor capd --kubernetes-version 1.26.3 --control-plane-machine-count=1 --worker-machine-count=4 | kubectl apply -f-
  7. Wait for the CAPD controller to launch containers and for pods to be created in the default namespace, then watch them stall at ContainerCreating
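
A minimal way to surface the FailedMount warning (assuming the default namespace, as in the pod listing above):

% kubectl get events -n default --field-selector reason=FailedMount
% kubectl describe pod -n default kube-node-mgmt-kubemark-md-0-62h9n | grep -A 3 FailedMount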

What did you expect to happen:

I expected the kubemark / CAPD cluster to come up and the pods to enter the Running state.

Anything else you would like to add:

I tried using minikube instead of kind to create the cluster and ran into the same issue with the containerd socket not mounting.

I was originally testing against Kubernetes 1.23.x, but I found the original issue where CAPD was switched to the unix:/// style socket specification in the HostMount, and since it mentioned problems with 1.24.x versions of k8s, I switched to 1.26.3. No matter what I try, though, I can't seem to get past this error: kubernetes-sigs/cluster-api#6155

I'm using Docker version: 23.0.1

Environment:

  • cluster-api version:
% clusterctl version
clusterctl version: &version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.1", GitCommit:"39d87e91080088327c738c43f39e46a7f557d03b", GitTreeState:"clean", BuildDate:"2023-04-04T17:31:43Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
  • cluster-api-provider-kubemark version: v0.5.0
  • Kubernetes version: (use kubectl version):
% kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.26.3
Kustomize Version: v4.5.7
Server Version: v1.26.3
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.2

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 5, 2023

aauren commented Apr 5, 2023

I've noticed that if I manually edit the generated clusterctl cluster config before applying it, and remove the type field and the unix:// scheme from the HostMount, I can get past this issue:

Before:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: KubemarkMachineTemplate
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: kube-node-mgmt
  name: kube-node-mgmt-kubemark-md-0
  namespace: default
spec:
  template:
    spec:
      extraMounts:
      - containerPath: unix:///run/containerd/containerd.sock
        hostPath: unix:///run/containerd/containerd.sock
        name: containerd-sock
        type: Socket

After:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: KubemarkMachineTemplate
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: kube-node-mgmt
  name: kube-node-mgmt-kubemark-md-0
  namespace: default
spec:
  template:
    spec:
      extraMounts:
      - containerPath: /run/containerd/containerd.sock
        hostPath: /run/containerd/containerd.sock
        name: containerd-sock

This is a separate issue, but I also found that I was not able to use Kubernetes v1.26.3, because kubemark didn't have an image built for that version published here: https://quay.io/repository/elmiko/kubemark?tab=tags. My previous commands ended up with ErrImagePull.

Instead, I needed to find a version supported by both kind and kubemark by cross-referencing their image repositories, and I ended up settling on v1.25.3 as a baseline between the two systems.
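
A quick sketch for listing the published kubemark tags (this assumes the public quay.io tag-listing API and requires jq):

% curl -s 'https://quay.io/api/v1/repository/elmiko/kubemark/tag/?limit=100' | jq -r '.tags[].name'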


elmiko commented Apr 5, 2023

interesting, i have not hit this yet, but perhaps we need to update those templates for the current versions?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 4, 2023

aauren commented Jul 5, 2023

I know that this is at least still a problem for me. I currently run all of the manifests that clusterctl generate creates through a sed pipe that removes the type: Socket line before piping them into kubectl apply -f -.
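
Roughly, that pipeline looks like this (the sed expression here is an approximation; the generate flags match the ones earlier in this issue):

% clusterctl generate cluster kube-node-mgmt --infrastructure kubemark --flavor capd --kubernetes-version 1.25.3 --control-plane-machine-count=1 --worker-machine-count=4 | sed '/type: Socket/d' | kubectl apply -f -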

This should be a super simple fix; I'm guessing we just need to remove the line here: https://github.com/kubernetes-sigs/cluster-api-provider-kubemark/blob/main/templates/cluster-template-capd.yaml#L120

However, it would be good to know that someone else is able to confirm this problem and that it isn't just something different about my local environment before changing something like this.


aauren commented Jul 5, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2023

elmiko commented Jul 21, 2023

@aauren i'm assuming that this is still an issue for you?

i have been updating the make test-e2e target (which deploys a kubemark cluster) on ubuntu 22.04 and i'm not seeing this issue arise. i did confirm that the type: Socket field exists in my manifests. i will try testing with that line removed; perhaps it is superfluous.


aauren commented Jul 21, 2023

Yup. Still an issue for me. I'm very open to the idea that something is just off about my setup. However, if it is superfluous and you're willing to remove it, that would help me a lot also.


elmiko commented Jul 24, 2023

i'll give it a try without the type field and see what happens; if we can remove it then i'm not opposed. although i am curious how your ubuntu setup differs from mine. i've been doing default ubuntu server 22.04 installs and then running an ansible script to configure the server. i have a feeling that maybe it's just a local config issue, but i'm not an ubuntu expert.


aauren commented Jul 24, 2023

So I can tell you the process I have been using with kubemark:

  1. Add provider to ~/.cluster-api/clusterctl.yaml:
providers:
- name: "kubemark"
  url: "https://github.com/kubernetes-sigs/cluster-api-provider-kubemark/releases/v0.5.0/infrastructure-components.yaml"
  type: "InfrastructureProvider"
  2. Create kind cluster with config:
$ cat kubemark.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: kubemark
nodes:
- role: control-plane
  # The below add in a mount for configuring passing the docker socket into the containers
  extraMounts:
  - containerPath: /var/run/docker.sock
    hostPath: /var/run/docker.sock
- role: worker
  # The below add in a mount for configuring passing the docker socket into the containers
  extraMounts:
  - containerPath: /var/run/docker.sock
    hostPath: /var/run/docker.sock

$ kind create cluster --config kubemark.yaml
Creating cluster "kubemark" ...
 ✓ Ensuring node image (kindest/node:v1.26.3) 🖼
 ✓ Preparing nodes 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-kubemark"
You can now use your cluster with:

kubectl cluster-info --context kind-kubemark

Thanks for using kind! 😊
  3. Initialize CAPI:
$ export CLUSTER_TOPOLOGY=true
$ clusterctl init --infrastructure kubemark,docker
Fetching providers
Installing cert-manager Version="v1.11.0"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v1.4.1" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v1.4.1" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v1.4.1" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-kubemark" Version="v0.5.0" TargetNamespace="capk-system"
Installing Provider="infrastructure-docker" Version="v1.4.1" TargetNamespace="capd-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl generate cluster [name] --kubernetes-version [version] | kubectl apply -f -
  4. Generate and apply the kubemark CAPI config (this is the part that fails without sed'ing the socket):
$ export SERVICE_CIDR=["172.17.0.0/16"]
$ export POD_CIDR=["192.168.122.0/24"]
$ clusterctl generate cluster kube-node-mgmt --infrastructure kubemark --flavor capd --kubernetes-version 1.25.3 --control-plane-machine-count=1 --worker-machine-count=4 | kubectl apply -f-
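
After the apply, the stall can be watched with standard kubectl commands, for example:

$ kubectl get machines -A
$ kubectl get pods -n default -o wide -w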


elmiko commented Jul 25, 2023

thanks @aauren , i will try to reproduce from your instructions.


elmiko commented Sep 1, 2023

i've been working on reproducing this, and i do get a similar result when trying things as you have them listed here. but when i remove the type: Socket i still get an error; the kubemark pods in the management cluster show this:

  - containerID: containerd://bfebceedc7bd541071f029a6d22105004ff39715c24a219afa18dcdaa6cd3d9f
    image: quay.io/cluster-api-provider-kubemark/kubemark:v1.25.3
    imageID: quay.io/cluster-api-provider-kubemark/kubemark@sha256:84d402c851014092eafecb880afa2863cbaa87362edfcf76b03a4c436dc8422d
    lastState:
      terminated:
        containerID: containerd://bfebceedc7bd541071f029a6d22105004ff39715c24a219afa18dcdaa6cd3d9f
        exitCode: 128
        finishedAt: "2023-09-01T19:02:47Z"
        message: 'failed to create containerd task: failed to create shim task: OCI
          runtime create failed: runc create failed: unable to start container process:
          error during container init: error mounting "/run/containerd/io.containerd.runtime.v2.task/k8s.io/bfebceedc7bd541071f029a6d22105004ff39715c24a219afa18dcdaa6cd3d9f/unix:/run/containerd/containerd.sock"
          to rootfs at "/unix:/run/containerd/containerd.sock": stat /run/containerd/io.containerd.runtime.v2.task/k8s.io/bfebceedc7bd541071f029a6d22105004ff39715c24a219afa18dcdaa6cd3d9f/unix:/run/containerd/containerd.sock:
          no such file or directory: unknown'
        reason: StartError
        startedAt: "1970-01-01T00:00:00Z"
    name: hollow-node
    ready: false
    restartCount: 3
    started: false
    state:
      waiting:
        message: back-off 40s restarting failed container=hollow-node pod=kube-node-mgmt-kubemark-md-0-4t56v_default(c0f464c2-4df9-4da6-aa09-5a4450bd5341)
        reason: CrashLoopBackOff

fwiw, i'm using capi 1.4.6 and kubernetes 1.25.3


elmiko commented Sep 1, 2023

ok, i think i've found the root cause here. for me, it's not the type: Socket but the actual file paths.

i modified my cluster yaml to contain this for the KubemarkMachineTemplate

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: KubemarkMachineTemplate
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: kube-node-mgmt
  name: kube-node-mgmt-kubemark-md-0
  namespace: default
spec:
  template:
    spec:
      extraMounts:
      - containerPath: /run/containerd/containerd.sock
        hostPath: /run/containerd/containerd.sock
        name: containerd-sock
        type: Socket

could you try out that configuration @aauren ?


elmiko commented Sep 1, 2023

i think this is fixed in the 0.6.0 release, but i'm hitting a different issue there now


elmiko commented Sep 1, 2023

i've created #97 to capture the followup work here.


elmiko commented Sep 1, 2023

@aauren please give the 0.6.0 release a try with these instructions, i've fixed up the release artifacts and pushed a tagged image to the registry. it's working for me locally now.
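
a minimal sketch for pointing an existing clusterctl config at 0.6.0 (assuming the v0.5.0 provider entry from earlier in this thread and the same release layout for v0.6.0; re-run clusterctl init against a fresh management cluster afterwards, or use clusterctl upgrade on an existing one):

$ sed -i 's#releases/v0.5.0/#releases/v0.6.0/#' ~/.cluster-api/clusterctl.yaml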


aauren commented Sep 26, 2023

Hey @elmiko! Sorry that it took me so long to get around to testing this one.

I can confirm that 0.6.0 fixes the issue that I was having with sockets!

Thanks for fixing this up for me! Cheers!

@aauren aauren closed this as completed Sep 26, 2023

elmiko commented Sep 29, 2023

great to hear @aauren !
