
Problem during driver installation: "path /var/lib/kubelet is mounted on / but it is not a shared mount" #335

Closed
ptitvert opened this issue Aug 10, 2021 · 5 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@ptitvert

What happened:

We are trying to deploy this driver in our k8s environment, but it does not work.
We get the following error:

  Normal   Created  59m (x156 over 14h)   kubelet  Created container smb
  Warning  Failed   59m (x156 over 14h)   kubelet  Error: failed to start container "smb": Error response from daemon: path /var/lib/kubelet is mounted on / but it is not a shared mount
  Warning  BackOff  33m (x3669 over 14h)  kubelet  Back-off restarting failed container
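
For reference, the propagation of the mount backing /var/lib/kubelet can be checked directly on a node with something like the following (a sketch; since the error says the path is mounted on /, the result reflects the root filesystem):

# Show the mount that contains /var/lib/kubelet and its propagation flag.
# The error above implies it reports something other than "shared" on these nodes.
findmnt -o TARGET,PROPAGATION --target /var/lib/kubelet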

What you expected to happen:

I would expect the container smb to be started, as stated in the installation guide.

How to reproduce it:

We have followed the installation guide:

https://github.com/kubernetes-csi/csi-driver-smb/blob/master/docs/install-csi-driver-v1.2.0.m

Then:

kubectl -n kube-system get pod -o wide --watch -l app=csi-smb-controller

The output:

NAME                                 READY   STATUS    RESTARTS   AGE   IP           NODE                                   NOMINATED NODE   READINESS GATES
csi-smb-controller-c74858679-474m5   3/3     Running   0          98m   1.2.3.4   6c63deaf-6720-4b93-86f7-578e2f41021c   <none>           <none>
csi-smb-controller-c74858679-btbwd   3/3     Running   0          91m   1.2.3.4   dce37ac7-bfd9-4996-9c8d-a31002fec280   <none>           <none>

Now the node pods:

kubectl -n kube-system get pod -o wide --watch -l app=csi-smb-node

And the output:

NAME                 READY   STATUS              RESTARTS   AGE   IP           NODE                                   NOMINATED NODE   READINESS GATES
csi-smb-node-4tfgk   2/3     CrashLoopBackOff    172        14h   1.2.3.10    6c63deaf-6720-4b93-86f7-578e2f41021c   <none>           <none>
csi-smb-node-7zp6r   2/3     CrashLoopBackOff    172        14h   1.2.3.11    dce37ac7-bfd9-4996-9c8d-a31002fec280   <none>           <none>
csi-smb-node-8gzwl   2/3     RunContainerError   155        14h   1.2.3.12   0c743158-5d9d-404f-94b4-372146e8a5e8   <none>           <none>
csi-smb-node-dn9pl   2/3     CrashLoopBackOff    153        14h   1.2.3.13   aa3ffe51-7a2e-4ee7-a6b3-f69002407f89   <none>           <none>
csi-smb-node-dz5vc   2/3     CrashLoopBackOff    162        14h   1.2.3.14    7df31584-42ae-4f4f-a146-fb597baab05d   <none>           <none>
csi-smb-node-tl7zb   2/3     CrashLoopBackOff    170        14h   1.2.3.15    1a9e7673-b7f8-472d-a002-c6ed88bbca86   <none>           <none>
csi-smb-node-w5cgc   2/3     CrashLoopBackOff    157        14h   1.2.3.16    12eda9df-bb3e-40ed-b6bd-356dc476694a   <none>           <none>
csi-smb-node-xj7zz   2/3     CrashLoopBackOff    165        14h   1.2.3.17   d0c628ce-0e92-48cd-b14c-09779dfd0333   <none>           <none>

And if I check one of the crashed pods:

Name:                 csi-smb-node-7zp6r
Namespace:            output-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 dce37ac7-bfd9-4996-9c8d-a31002fec280/1.2.3.10
Start Time:           Mon, 09 Aug 2021 23:22:01 +0200
Labels:               app=csi-smb-node
                      controller-revision-hash=7fb95cd7bd
                      pod-template-generation=2
Annotations:          kubernetes.io/psp: cnbb-privileged
Status:               Running
IP:                   1.2.3.10
IPs:
  IP:           1.2.3.10
Controlled By:  DaemonSet/csi-smb-node
Containers:
  liveness-probe:
    Container ID:  docker://d09a153149749650fa43a395dc52aaa827682adc2154a0955c835179bc59d273
    Image:         k8s-grc-docker-remote.artifactory.example.com/sig-storage/livenessprobe:v2.3.0
    Image ID:      docker-pullable://k8s-grc-docker-remote.artifactory.example.com/sig-storage/livenessprobe@sha256:1b7c978a792a8fa4e96244e8059bd71bb49b07e2e5a897fb0c867bdc6db20d5d
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/csi/csi.sock
      --probe-timeout=3s
      --health-port=29643
      --v=2
    State:          Running
      Started:      Mon, 09 Aug 2021 23:22:03 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from csi-smb-controller-sa-token-szcqh (ro)
  node-driver-registrar:
    Container ID:  docker://38eb4a83553a2783870bcc9ca4b84a68a80b4475c63f5613f1a3a739059cce33
    Image:         k8s-grc-docker-remote.artifactory.example.com/sig-storage/csi-node-driver-registrar:v2.2.0
    Image ID:      docker-pullable://k8s-grc-docker-remote.artifactory.example.com/sig-storage/csi-node-driver-registrar@sha256:2dee3fe5fe861bb66c3a4ac51114f3447a4cd35870e0f2e2b558c7a400d89589
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
      --v=2
    State:          Running
      Started:      Mon, 09 Aug 2021 23:22:05 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:
      ADDRESS:               /csi/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/smb.csi.k8s.io/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from csi-smb-controller-sa-token-szcqh (ro)
  smb:
    Container ID:  docker://95e22c0234770dd7a6e080e42fd189273650306ac1c55663f0c8dfcc5ad162b8
    Image:         artifactory-mirror.example.com/k8s/csi/smb-csi:v1.2.0
    Image ID:      docker-pullable://artifactory-mirror.example.com/k8s/csi/smb-csi@sha256:dedf9b4fbf860e0933210583ee4b6b41b0c2c551bf296370873689ee60df2644
    Port:          29643/TCP
    Host Port:     29643/TCP
    Args:
      --v=5
      --endpoint=$(CSI_ENDPOINT)
      --nodeid=$(KUBE_NODE_NAME)
      --metrics-address=0.0.0.0:29645
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      path /var/lib/kubelet is mounted on / but it is not a shared mount
      Exit Code:    128
      Started:      Tue, 10 Aug 2021 13:42:36 +0200
      Finished:     Tue, 10 Aug 2021 13:42:36 +0200
    Ready:          False
    Restart Count:  173
    Limits:
      cpu:     400m
      memory:  200Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Liveness:  http-get http://:healthz/healthz delay=30s timeout=10s period=30s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:    unix:///csi/csi.sock
      KUBE_NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /csi from socket-dir (rw)
      /var/lib/kubelet/ from mountpoint-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from csi-smb-controller-sa-token-szcqh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/smb.csi.k8s.io
    HostPathType:  DirectoryOrCreate
  mountpoint-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/
    HostPathType:  DirectoryOrCreate
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  DirectoryOrCreate
  csi-smb-controller-sa-token-szcqh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  csi-smb-controller-sa-token-szcqh
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   Pulled   21m (x169 over 14h)    kubelet  Container image "artifactory-mirror.example.com/k8s/csi/smb-csi:v1.2.0" already present on machine
  Warning  BackOff  116s (x3994 over 14h)  kubelet  Back-off restarting failed container

Anything else we need to know?:

We are using VMware PKS for the Kubernetes cluster.

Environment:

  • CSI Driver version: V1.2
  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.12", GitCommit:"7cd5e9086de8ae25d6a1514d0c87bac67ca4a481", GitTreeState:"clean", BuildDate:"2020-11-12T09:18:55Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9+vmware.1", GitCommit:"f856d899461199c512c21d0fdc67d49cc70a8963", GitTreeState:"clean", BuildDate:"2021-03-19T23:57:11Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release): Ubuntu 16.04 LTS
  • Kernel (e.g. uname -a):

Linux docbase-deployment-8584bc6979-9fpbb 4.15.0-142-generic #146~16.04.1-Ubuntu SMP Tue Apr 13 09:27:15 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

@andyzhangx
Member

Could you add /bin/mount --make-shared /var/lib/kubelet to your agent node config? That should solve the issue.

The full kubelet configuration on AKS looks like this:

[Unit]
Description=Kubelet
ConditionPathExists=/usr/local/bin/kubelet


[Service]
Restart=always
EnvironmentFile=/etc/default/kubelet
SuccessExitStatus=143
ExecStartPre=/bin/bash /opt/azure/containers/kubelet.sh
ExecStartPre=/bin/mkdir -p /var/lib/kubelet
ExecStartPre=/bin/mkdir -p /var/lib/cni
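# The next two lines are what prevent the "not a shared mount" error:
# bind-mounting /var/lib/kubelet onto itself makes it its own mount point,
# so that --make-shared applies to it rather than to the root filesystem.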
ExecStartPre=/bin/bash -c "if [ $(mount | grep \"/var/lib/kubelet\" | wc -l) -le 0 ] ; then /bin/mount --bind /var/lib/kubelet /var/lib/kubelet ; fi"
ExecStartPre=/bin/mount --make-shared /var/lib/kubelet

ExecStartPre=-/sbin/ebtables -t nat --list
ExecStartPre=-/sbin/iptables -t nat --numeric --list

ExecStart=/usr/local/bin/kubelet \
        --enable-server \
        --node-labels="${KUBELET_NODE_LABELS}" \
        --v=2 --container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock \
        --volume-plugin-dir=/etc/kubernetes/volumeplugins \
        --kubeconfig /var/lib/kubelet/kubeconfig \
        --bootstrap-kubeconfig /var/lib/kubelet/bootstrap-kubeconfig \
        $KUBELET_FLAGS \
        $KUBELET_REGISTER_NODE $KUBELET_REGISTER_WITH_TAINTS

[Install]
WantedBy=multi-user.target
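If you just want to verify or apply this on a running node first, something along these lines should work (a sketch, assuming root access on the agent node; the change only lasts until the next reboot, so the kubelet unit still needs the ExecStartPre lines above):

# Check the propagation of the mount backing /var/lib/kubelet ("shared" is required).
findmnt -o TARGET,PROPAGATION --target /var/lib/kubelet

# Make /var/lib/kubelet its own mount point, then mark it as shared.
mount --bind /var/lib/kubelet /var/lib/kubelet
mount --make-shared /var/lib/kubelet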

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
