
Running into no topology key found on CSINode with 0.10.2 #848

Closed
tirumerla opened this issue Apr 23, 2021 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@tirumerla
Contributor

/kind bug

What happened?

Hi @wongma7, @ayberk

  • The controller is failing to provision a volume with the storage class, reporting no topology key found on CSINode when using 0.10.2. I tested with 0.9.1 and it works fine without any issue.

What you expected to happen?

  • The volume should have been provisioned without any issue; it works fine with 0.9.1.

Anything else we need to know?:

Running two EKS node groups: the first node group is just for the kube-system namespace, and the second is specifically for JupyterHub (see node details below). Cluster Autoscaler v9.9.2, csi-driver v0.10.2, and a template for a storage class are packaged under a single customized Helm chart.

  • describing pvc
Type     Reason                Age                   From                                                                                      Message
----     ------                ----                  ----                                                                                      -------
Normal   WaitForFirstConsumer  51m                   persistentvolume-controller                                                               waiting for first consumer to be created before binding
Warning  ProvisioningFailed    18m (x17 over 51m)    ebs.csi.aws.com_ebs-csi-controller-5f85996d68-4pt7c_8e28e5ca-eacb-41d8-85cf-ecf668d82a81  failed to provision volume with StorageClass "test": error generating accessibility requirements: no topology key found on CSINode <node_host_name>
Normal   Provisioning          3m27s (x21 over 51m)  ebs.csi.aws.com_ebs-csi-controller-5f85996d68-4pt7c_8e28e5ca-eacb-41d8-85cf-ecf668d82a81  External provisioner is provisioning volume for claim "test/claim-test"
Normal   ExternalProvisioning  115s (x202 over 51m)  persistentvolume-controller                                                               waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator

Snippet from kubectl describe node/node3 (the node where I want the attachment to happen):

Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=c5.large
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=workers2
                    eks.amazonaws.com/nodegroup-image=ami-124567890
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-000000000
                    eks.amazonaws.com/sourceLaunchTemplateVersion=1
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1b
                    hub.jupyter.org/node-purpose=user
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=<host_name>
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=c5.large
                    topology.ebs.csi.aws.com/zone=us-east-1b
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1b
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
  • Storage class manifest
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test
  namespace: kube-system
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
parameters:
  encrypted: "true"
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
allowedTopologies:
  - matchLabelExpressions:
    - key: topology.ebs.csi.aws.com/zone
      {{- with .Values.storageclass.zone }}
      values:
      {{- toYaml . | nindent 6 }}
      {{- end -}}
  • Here is my values file
storageclass:
  zone: 
    - us-east-1b
    
cluster-autoscaler:
  autoDiscovery:
    clusterName: "my_eks_cluster"
  awsRegion: us-east-1
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "300m"

aws-ebs-csi-driver:
  enableVolumeScheduling: true
  enableVolumeResizing: true
  enableVolumeSnapshot: true
  extraVolumeTags:
     env: dev
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 50m
      memory: 64Mi
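
For anyone hitting the same error: the external provisioner reads topology keys from the CSINode object, not from the node labels shown above, so it is worth checking CSINode directly. A quick way to do that (a sketch; `<node_host_name>` is a placeholder and this needs kubectl access to the affected cluster):

```shell
# Print each CSI driver registered on the node and its topology keys.
# If ebs.csi.aws.com is missing or its topologyKeys list is empty, the
# ebs-csi-node pod never registered on that node (for example because
# the DaemonSet does not tolerate the node's taints).
kubectl get csinode <node_host_name> \
  -o jsonpath='{range .spec.drivers[*]}{.name}{": "}{.topologyKeys}{"\n"}{end}'
```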
 

Environment

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.6-eks-49a6c0", GitCommit:"49a6c0bf091506e7bafcdb1b142351b69363355a", GitTreeState:"clean", BuildDate:"2020-12-23T22:10:21Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version: 0.10.2
  • Helm version: 3.5.3

This is similar to the issue mentioned in #729, but I wasn't sure how to fix it; perhaps I'm missing a setting to make Cluster Autoscaler ignore these labels.

Any help would be appreciated.

Thanks

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 23, 2021
@aianus

aianus commented Apr 24, 2021

I ran into this issue today and solved it by ensuring the ebs-csi-node DaemonSet was running on all nodes, including those with taints (which was not the default; I needed to set node.tolerateAllTaints to true in the Helm chart).
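
In the umbrella-chart setup from the issue description, that flag would go under the aws-ebs-csi-driver key in the values file, something like this (a sketch against the v0.10.x chart; the exact key layout may differ between chart versions):

```yaml
aws-ebs-csi-driver:
  node:
    # Let ebs-csi-node schedule onto tainted nodes so the driver
    # registers a CSINode entry (and topology keys) on every node.
    tolerateAllTaints: true
```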

@tirumerla
Contributor Author

> I ran into this issue today and solved it by ensuring the ebs-csi-node DaemonSet was running on all nodes, even those with taints (which was not the default, needed to set node.tolerateAllTaints to true in the helm chart)

@aianus that was it. Somehow I missed it. Appreciate your help!

@jaggerwang

jaggerwang commented Dec 16, 2021

How can I configure tolerateAllTaints from the AWS EKS console? There is no way to configure the aws-ebs-csi-driver add-on's parameters in the update UI. I have two node groups: one created by EKS CloudFormation and the other created by myself, and ebs-csi-node only runs on the nodes of the first node group.
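
Since the EKS-managed add-on exposes no values file here, one stopgap (a sketch only: the add-on controller manages this DaemonSet and may reconcile a manual edit away on the next update) is to patch the DaemonSet's tolerations directly:

```shell
# Append a blanket toleration so ebs-csi-node schedules onto tainted nodes.
# Assumes the DaemonSet already has a tolerations list; the "add .../-"
# JSON Patch op fails if the list does not exist yet.
kubectl -n kube-system patch daemonset ebs-csi-node --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/tolerations/-","value":{"operator":"Exists"}}]'
```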

[screenshot: EKS console add-on update UI]

4 participants