
Unable to mount volumes : timeout expired waiting for volumes to attach/mount #60

Open
feresberbeche opened this issue Jun 1, 2018 · 2 comments


feresberbeche commented Jun 1, 2018

Is this a request for help?: Yes


Is this a BUG REPORT or FEATURE REQUEST? Bug report

Version of Helm and Kubernetes:

kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"} 
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-18T23:58:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"} 
helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}

Which chart: ceph-helm

What happened:

Unable to mount volumes for pod "mypod_default(e68c8e3e-6578-11e8-87c4-e83935e84dc8)": timeout expired waiting for volumes to attach/mount for pod "default"/"mypod". list of unattached/unmounted volumes=[vol1]

How to reproduce it (as minimally and precisely as possible):
http://docs.ceph.com/docs/master/start/kube-helm/
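For reference, a PVC and pod along these lines (following the ceph-helm quickstart; the busybox image and mount path are just placeholders, but the object names match the ones referenced above) are enough to reproduce it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-rbd
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: default
spec:
  containers:
    - name: busybox
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: vol1
          mountPath: /mnt/ceph
  volumes:
    - name: vol1
      persistentVolumeClaim:
        claimName: ceph-pvc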

Anything else we need to know:

The Ceph cluster itself is working fine:

  ceph -s
  cluster:
    id:     88596d9e-b478-47a9-8208-3a6cea33d1d4
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum kubernetes
    mgr: kubernetes(active)
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-5696f9df5d-jbsgz=up:active}
    osd: 1 osds: 1 up, 1 in
    rgw: 1 daemon active
 
  data:
    pools:   7 pools, 176 pgs
    objects: 213 objects, 3391 bytes
    usage:   108 MB used, 27134 MB / 27243 MB avail
    pgs:     176 active+clean

Everything in the ceph namespace works fine.
In the mon pod I can see an RBD image created for the PVC:

rbd ls
kubernetes-dynamic-pvc-0077fdf9-6578-11e8-b1f8-b63c3e9e1eaa
kubectl get pvc
NAME                  STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ceph-pvc              Bound     pvc-c9d07cf9-6578-11e8-87c4-e83935e84dc8   1Gi        RWO            ceph-rbd       29m

I have changed resolv.conf and added kube-dns as a nameserver; I can now resolve
ceph-mon.ceph and ceph-mon.ceph.svc.local from the host node.
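For reference, the change on the host node amounts to something like the following in /etc/resolv.conf (the nameserver IP here is kubeadm's default kube-dns ClusterIP and may differ in your cluster):

nameserver 10.96.0.10
search ceph.svc.cluster.local svc.cluster.local cluster.local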

Some kubelet logs that look related:
juin 01 11:24:19 kubernetes kubelet[32612]: E0601 11:24:19.587800 32612 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/rbd/[ceph-mon.ceph.svc.cluster.local:6789]:kubernetes-dynamic-pvc-0077fdf9-6578-11e8-b1f8-b63c3e9e1eaa\"" failed. No retries permitted until 2018-06-01 11:24:51.582365588 +0200 CEST m=+162261.330642194 (durationBeforeRetry 32s). Error: "MountVolume.WaitForAttach failed for volume \"pvc-004d66b7-6578-11e8-87c4-e83935e84dc8\" (UniqueName: \"kubernetes.io/rbd/[ceph-mon.ceph.svc.cluster.local:6789]:kubernetes-dynamic-pvc-0077fdf9-6578-11e8-b1f8-b63c3e9e1eaa\") pod \"ldap-ss-0\" (UID: \"f63432e0-6579-11e8-87c4-e83935e84dc8\") : error: exit status 1, rbd output: 2018-06-01 11:19:19.513914 7f1cf1f227c0 -1 did not load config file, using default settings.\n2018-06-01 11:19:19.579955 7f1cf1f20700 0 -- IP@:0/1002573 >> IP@:6789/0 pipe(0x3a2a3f0 sd=3 :53578 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000\n2018-06-01 11:19:19.580065 7f1cf1f20700 0 -- IP@:0/1002573 >> IP@:6789/0 pipe(0x3a2a3f0 sd=3 :53578 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).fault\n2018-06-01 11:19:19.580437 7f1cf1f20700 0 -- IP@:0/1002573 >> 10.1.0.146:6789/0 pipe(0x3a2a3f0 sd=3 :53580 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000\n2018-06-01 11:19:19.781427 7f1cf1f20700 0 -- 10.1.0.146:0/1002573 >> 10.1.0.146:6789/0 pipe(0x3a2a3f0 sd=3 :53584 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).**connect protocol feature mismatch**, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000\n2018-06-01 11:19:20.182401 7f1cf1f20700 0 -- 10.1.0.146:0/1002573 >> 10.1.0.146:6789/0 pipe(0x3a2a3f0 sd=3 :53588 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).**connect protocol feature mismatch**, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000\n2018-06-01 11:19:20.983428 7f1cf1f20700 0 -- IP@:0/1002573 >> ip@:6789/0 pipe(0x3a2a3f0 sd=3 :53610 s=1 pgs=0 cs=0 l=1 c=0x3a2e6e0).conne

I don't know why it tries to connect to my Kubernetes node's external IP on port 6789; that port is only exposed through the ceph-mon headless service, which is:
kubectl get svc -n ceph
NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
ceph-mon   ClusterIP   None            <none>        6789/TCP   1h
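For context, the ceph-mon headless Service looks roughly like this (sketched from the kubectl output above; the selector labels are illustrative and may differ from the actual chart template):

apiVersion: v1
kind: Service
metadata:
  name: ceph-mon
  namespace: ceph
spec:
  clusterIP: None          # headless: DNS returns the mon pod IPs directly
  ports:
    - port: 6789
      protocol: TCP
  selector:
    app: ceph-mon          # illustrative label; the real chart selector may differ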

From the Kubernetes node I can telnet to port 6789:

telnet ceph-mon.ceph 6789
Trying IP@ ... 
Connected to ceph-mon.ceph. 

The connect protocol feature mismatch in the kubelet logs could have something to do with this note in the ceph-helm doc:

Important: Kubernetes uses the RBD kernel module to map RBDs to hosts. Luminous requires CRUSH_TUNABLES 5 (Jewel). The minimal kernel version for these tunables is 4.5. If your kernel does not support these tunables, run ceph osd crush tunables hammer.
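One quick way to check whether that note applies is to compare the node's kernel version with the cluster's current CRUSH tunables profile (the first command runs on the Kubernetes node, the second inside the ceph-mon pod):

uname -r                        # RBD kernel client version on the node
ceph osd crush show-tunables    # current CRUSH tunables profile and required features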


feresberbeche commented Jun 1, 2018

And yes, that was it: you only need to run
ceph osd crush tunables hammer
on the ceph-mon pod.
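For anyone else hitting this, a command along these lines does it (the mon pod name is a placeholder; use whatever kubectl get pods -n ceph shows for your deployment):

kubectl -n ceph get pods                                                    # find the ceph-mon pod name
kubectl -n ceph exec -it <ceph-mon-pod> -- ceph osd crush tunables hammer   # set the older tunables profile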
I will leave this here in case anybody else has the same issue 😃
/close


lud97x commented Aug 10, 2018

Hello @feresberbeche, thank you for this, very helpful. I was stuck because of my kernel version 4.4.0...
The kernel upgrade solved everything.

Version details:
Ceph:
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)

Kubernetes:
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Linux Kernel:
4.15.0-30-generic
