[BUG][2.0.0] Rook/Ceph - Pod remains in ContainerCreating state when using StorageClass from Rook/Ceph cluster #3184

Closed
5 of 18 tasks
cicharka opened this issue Jun 3, 2022 · 3 comments

Comments

@cicharka
Contributor

cicharka commented Jun 3, 2022

Describe the bug
Scenario: a simple Epiphany cluster with 1 master and 3 Kubernetes worker nodes, configured to use Rook. The input_manifest.yml used to deploy the cluster is attached: input_manifest.txt

After deploying the cluster, we tried to deploy the example app (wordpress) from the Block Storage section of the Rook documentation.
After running kubectl apply -f wordpress.yml, the wordpress pod remains in the ContainerCreating state. That pod uses the default Rook StorageClass.

Note that Object Storage (the one that imitates the S3 API) works fine with the default configuration.
Note that Block Storage, which is the subject of this bug, is able to provision the volume to the pod if, in the cluster:

  • all of the worker nodes are cordoned
  • the NoSchedule taint on the control plane is removed

The following commands put the cluster in that state:
kubectl cordon cicharka-rook-v4-ubu-kubernetes-node-vm-0
kubectl cordon cicharka-rook-v4-ubu-kubernetes-node-vm-1
kubectl cordon cicharka-rook-v4-ubu-kubernetes-node-vm-2
kubectl taint node cicharka-rook-v4-ubu-kubernetes-master-vm-0 node-role.kubernetes.io/master:NoSchedule-
kubectl rollout restart deployment wordpress
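
For reference, a minimal sketch of how the workaround above could be undone afterwards. The function name `revert_workaround_cmds` is hypothetical and not part of Epiphany or Rook. It only prints the commands so they can be reviewed before running, and it assumes the taint to restore uses the same key as the one removed above.

```shell
#!/bin/sh
# Print (do not run) the commands that undo the workaround.
# $1 is the control-plane node; the remaining arguments are worker nodes.
# Hypothetical helper, shown only as an illustration.
revert_workaround_cmds() {
    master="$1"
    shift
    for node in "$@"; do
        echo "kubectl uncordon $node"
    done
    # Same taint key as in the workaround, without the trailing "-"
    # (the trailing "-" is what removes a taint).
    echo "kubectl taint node $master node-role.kubernetes.io/master:NoSchedule"
}
```

Once the printed commands look right, they can be piped to a shell, e.g. `revert_workaround_cmds <master-node> <worker-node>... | sh`.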

How to reproduce

  1. Prepare an Epiphany cluster with input_manifest.txt
  2. Wait until all rook-ceph pods are Completed or Running
  3. Deploy the example wordpress app (or any other); you can use the Wordpress sample from "Provision Storage" and "Consume the Storage" in the Rook documentation.
  4. Observe that the wordpress pod from that deployment remains in the ContainerCreating state
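
To make step 4 easier to observe, here is a rough sketch that lists pods stuck in ContainerCreating and prints their events. Both function names are hypothetical. The filter assumes the default column layout of `kubectl get pods --no-headers` (NAME READY STATUS RESTARTS AGE), and the second function needs kubectl access to the cluster, so it is only defined here, not called.

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS column (third field) is ContainerCreating.
stuck_pods() {
    awk '$3 == "ContainerCreating" { print $1 }'
}

# Print the Events section of every stuck pod in a namespace
# (defaults to "default"). Requires kubectl and cluster access.
describe_stuck_pods() {
    ns="${1:-default}"
    kubectl get pods -n "$ns" --no-headers | stuck_pods | while read -r pod; do
        echo "== $pod =="
        # Keep only the part of the description from "Events:" to the end.
        kubectl describe pod -n "$ns" "$pod" | sed -n '/^Events:/,$p'
    done
}
```

The events usually show whether the pod is waiting on volume attach or mount, which is the symptom described in this issue.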

Expected Behavior
When any app that consumes Block Storage is deployed, its pod should be created, reach the Ready state, and be able to use that persistent block storage.

Config files
input_manifest.txt

Environment

  • Cloud provider: All
  • OS: All

epicli version: [epicli --version]
2.0.0

Additional context


DoD checklist

  • Changelog
    • updated
    • not needed
  • COMPONENTS.md
    • updated
    • not needed
  • Schema
    • updated
    • not needed
  • Backport tasks
    • created
    • not needed
  • Documentation
    • added
    • updated
    • not needed
  • Feature has automated tests
  • Automated tests passed (QA pipelines)
    • apply
    • upgrade
    • backup/restore
  • Idempotency tested
  • All conversations in PR resolved
@norbix

norbix commented Jun 8, 2022

K8s cluster provisioned in AWS and access to the control plane obtained.

root@prefix-7-norbix-kubernetes-master-vm-0:~# k get ns
NAME                   STATUS   AGE
default                Active   18h
kube-node-lease        Active   18h
kube-public            Active   18h
kube-system            Active   18h
kubernetes-dashboard   Active   18h
root@prefix-7-norbix-kubernetes-master-vm-0:~#

@norbix

norbix commented Jun 9, 2022

Rook operator deployed

`root@prefix-7-norbix-kubernetes-master-vm-0:~/persistent_storage# k get all -n rook-ceph
NAME                                               READY   STATUS    RESTARTS   AGE
pod/csi-cephfsplugin-7xl2t                         3/3     Running   0          21h
pod/csi-cephfsplugin-kfndq                         3/3     Running   0          21h
pod/csi-cephfsplugin-provisioner-8758b6bdf-6n58l   6/6     Running   0          21h
pod/csi-cephfsplugin-provisioner-8758b6bdf-ltwj7   6/6     Running   0          21h
pod/csi-rbdplugin-lldrs                            3/3     Running   0          21h
pod/csi-rbdplugin-mztrd                            3/3     Running   0          21h
pod/csi-rbdplugin-provisioner-6b5b4468d9-9qmc6     6/6     Running   0          21h
pod/csi-rbdplugin-provisioner-6b5b4468d9-gfbvt     6/6     Running   0          21h
pod/rook-ceph-operator-84866c778f-jjjhm            1/1     Running   0          21h

NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/csi-cephfsplugin-metrics   ClusterIP   10.107.224.193   <none>        8080/TCP,8081/TCP   21h
service/csi-rbdplugin-metrics      ClusterIP   10.111.173.203   <none>        8080/TCP,8081/TCP   21h

NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-cephfsplugin   2         2         2       2            2           <none>          21h
daemonset.apps/csi-rbdplugin      2         2         2       2            2           <none>          21h

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/csi-cephfsplugin-provisioner   2/2     2            2           21h
deployment.apps/csi-rbdplugin-provisioner      2/2     2            2           21h
deployment.apps/rook-ceph-operator             1/1     1            1           21h

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/csi-cephfsplugin-provisioner-8758b6bdf   2         2         2       21h
replicaset.apps/csi-rbdplugin-provisioner-6b5b4468d9     2         2         2       21h
replicaset.apps/rook-ceph-operator-84866c778f            1         1         1       21h
root@prefix-7-norbix-kubernetes-master-vm-0:~/persistent_storage#`

@cicharka cicharka self-assigned this Jun 21, 2022
@cicharka
Contributor Author

cicharka commented Jun 22, 2022

Closing the ticket, since the cause is a mismatch in the default Epiphany configuration for the Kubernetes cluster.

Clarification:

  • The default Epiphany configuration includes an extra argument when initializing worker nodes: enable-controller-attach-detach="false"
  • Setting this flag to false disables attach/detach operations by the AD (attach-detach) controller. As a result, the Ceph csi-rbdplugin pod from the rook-ceph cluster, which is responsible for attaching, mounting, unmounting, and detaching volumes, cannot perform these operations on the worker node
  • A Rook-Ceph cluster requires worker nodes whose kubelet runs with the enable-controller-attach-detach flag set to "True"

Users who run Rook on an Epiphany cluster and want to use BlockStorage or FilesystemStorage must update the kubelet configuration and set the kubelet flag enable-controller-attach-detach to "True".
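
To check which value a worker node is actually running with, something like the sketch below may help. The function name `attach_detach_setting` is hypothetical. It only inspects a kubelet command line passed in as a string and assumes the flag, if present, is given in the `--enable-controller-attach-detach=<value>` form. When the flag is absent it reports "true", which is the kubelet default. It does not cover the case where the setting comes from a kubelet config file instead of a command-line flag, and how the flag is changed depends on how the kubelet is started on the node.

```shell
#!/bin/sh
# Print the value of --enable-controller-attach-detach found in the kubelet
# command line given as "$1". Prints "true" (the kubelet default) when the
# flag is not present. Hypothetical helper for illustration only.
attach_detach_setting() {
    # Split the command line into one argument per line and keep the value
    # of the last occurrence of the flag.
    value="$(printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^--enable-controller-attach-detach=//p' | tail -n 1)"
    # Strip optional quotes around the value, e.g. ="false".
    value="$(printf '%s' "$value" | tr -d "\"'")"
    echo "${value:-true}"
}

# Possible usage on a worker node (assumes a running kubelet process and
# a procps-style ps):
#   attach_detach_setting "$(ps -o args= -C kubelet)"
```

A result of "false" on a worker node matches the misconfiguration described above.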

A code fix that lets users set this flag when initializing the cluster will be added in Epiphany 2.0.1; see ticket #3190 for more information.
