[EKS] [EBS CSI addon]: Adding custom Toleration on EBS CSI Addon #1706

singhnix · 2022-04-10T11:41:42Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Add a custom toleration setting such as "node.tolerateAllTaints = true" to the EBS CSI add-on so that its pods can tolerate taints placed on nodes.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

It seems like there is potentially a missing setting on the DaemonSet created by this Add-on that will prevent it from being put on certain nodes with taints on them. People who manage this manually have fixed this by adding:

node.tolerateAllTaints = true

to the daemonset. Currently it does not look like the AWS add-on allows for something like this. Because of this, our move to this add-on has prevented new persistent volumes from being created on certain nodes.

I am not fully sure if there is a workaround for this (I assumed modifying the daemonset after install would not be permanent or should really be a step in setting up).

kubernetes-sigs/aws-ebs-csi-driver#848

Specifically the error we saw that led to this is:

Warning ProvisioningFailed 18s (x7 over 81s) ebs.csi.aws.com_ebs-csi-controller-848fb4bd69-lnjp8_454f66d4-704c-4164-a5a0-283cb99d5688 failed to provision volume with StorageClass "gp3-encrypted": error generating accessibility requirements: no topology key found on CSINode ip-XX-XXX-XX-XX.ec2.internal
Normal ExternalProvisioning 8s (x6 over 81s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator

The label of:

topology.ebs.csi.aws.com/zone=

was present only on nodes where the daemonset could run, rather than on every node, because the daemonset's pods were not scheduled on the tainted nodes.
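To see which nodes are affected, one option is to scan the node list for the missing label. The sketch below is only an illustration: it assumes input in the shape of `kubectl get nodes -o json`, and the node names and the helper `nodes_missing_label` are hypothetical.

```python
TOPOLOGY_KEY = "topology.ebs.csi.aws.com/zone"


def nodes_missing_label(node_list, key=TOPOLOGY_KEY):
    """Return the names of nodes whose metadata.labels does not contain `key`."""
    missing = []
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        labels = meta.get("labels") or {}
        if key not in labels:
            missing.append(meta.get("name", "<unnamed>"))
    return missing


# Hypothetical sample shaped like the output of `kubectl get nodes -o json`.
sample = {
    "items": [
        {"metadata": {"name": "node-a", "labels": {TOPOLOGY_KEY: "us-east-1a"}}},
        {"metadata": {"name": "node-b", "labels": {"kubernetes.io/os": "linux"}}},
    ]
}

print(nodes_missing_label(sample))
```

Nodes reported by this check are the ones where the ebs-csi-node pod has presumably never run, which matches the "no topology key found on CSINode" error above.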

Are you currently working around this issue?
How are you currently solving this problem?

Additional context
Anything else we should know?

Attachments
Case ID:9858452761

harrisjoseph · 2022-04-14T08:38:30Z

Thank you for raising this ticket. I'm curious why the current tolerations for the ebs-csi-node DaemonSet in the EBS CSI driver add-on were chosen.

We currently have this:

      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
        tolerationSeconds: 300

This seems to assume that the user/operator won't want ebs-csi-node, or any EBS functionality, on a node that has any other taint set. Node isolation via taints is a pretty standard feature of k8s, so it seems odd that we wouldn't want to use EBS volumes on all tainted nodes, when the daemonset could be extended to all nodes very easily:

      tolerations:
      - operator: Exists

I'm sure my question is just a sign that I have some deeper misunderstanding of how the EBS CSI driver is intended to be used in conjunction with EKS, I'm more than happy to be corrected, or to find that the CSI driver add-on can be easily configured to use EKS' standard topology keys, for instance.
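The difference between the two toleration lists above can be checked with a small model of the Kubernetes matching rules. This is a simplified sketch, not the scheduler's code: the function names are hypothetical, and the taint "dedicated=gpuGroup:NoSchedule" is just an example of a custom node taint.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint (simplified rules)."""
    effect = toleration.get("effect", "")
    # An empty effect on the toleration matches every taint effect.
    if effect and effect != taint.get("effect"):
        return False
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")
    if not key:
        # An empty key is only meaningful with Exists and matches every key.
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def pod_tolerates_all(tolerations, taints):
    """Return True if every scheduling-blocking taint is tolerated."""
    blocking = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


# Tolerations shipped by the add-on at the time, as quoted above.
addon_default = [
    {"key": "CriticalAddonsOnly", "operator": "Exists"},
    {"effect": "NoExecute", "operator": "Exists", "tolerationSeconds": 300},
]
# The catch-all toleration proposed above.
tolerate_everything = [{"operator": "Exists"}]

example_taint = {"key": "dedicated", "value": "gpuGroup", "effect": "NoSchedule"}

print(pod_tolerates_all(addon_default, [example_taint]))
print(pod_tolerates_all(tolerate_everything, [example_taint]))
```

Under these rules the add-on defaults do not cover a custom NoSchedule taint, while the single "operator: Exists" entry covers any key and any effect.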

stephen-gardner · 2022-04-29T19:20:52Z

I've been able to extend the daemonset to tainted nodes by modifying the tolerations, which is enough to deploy pending persistent volume claims...

kubectl -n kube-system get daemonset ebs-csi-node -o yaml --show-managed-fields | grep f:tolerations -C 2

            f:priorityClassName: {}
            f:serviceAccountName: {}
            f:tolerations: {}
            f:volumes:
              k:{"name":"device-dir"}:

Unfortunately, tolerations are a fully managed field, so my changes are quickly overwritten. The only workaround I could think of is to remove Amazon EBS CSI add-on, but preserve its resources, effectively turning it into a self-managed add-on.

eksctl delete addon --cluster my-cluster --name aws-ebs-csi-driver --preserve
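The managed-fields check above can also be done programmatically. The sketch below walks the fieldsV1 tree of each managedFields entry and lists the managers that claim the tolerations field. The input is assumed to look like the JSON form of the DaemonSet with managed fields included; the manager names in the sample are hypothetical.

```python
# Path of the pod template's tolerations inside a fieldsV1 tree.
TOLERATIONS_PATH = ["f:spec", "f:template", "f:spec", "f:tolerations"]


def managers_owning(obj, path):
    """Return the field managers whose fieldsV1 tree contains `path`."""
    owners = []
    for entry in obj.get("metadata", {}).get("managedFields", []):
        node = entry.get("fieldsV1", {})
        for segment in path:
            if not isinstance(node, dict) or segment not in node:
                node = None
                break
            node = node[segment]
        if node is not None:
            owners.append(entry.get("manager", "<unknown>"))
    return owners


# Hypothetical DaemonSet excerpt; "eks" and "kubectl-edit" are assumed names.
sample_ds = {
    "metadata": {
        "managedFields": [
            {
                "manager": "eks",
                "operation": "Apply",
                "fieldsV1": {
                    "f:spec": {"f:template": {"f:spec": {"f:tolerations": {}}}}
                },
            },
            {
                "manager": "kubectl-edit",
                "operation": "Update",
                "fieldsV1": {"f:metadata": {"f:annotations": {}}},
            },
        ]
    }
}

print(managers_owning(sample_ds, TOLERATIONS_PATH))
```

If the add-on's manager shows up as an owner, manual edits to that field are the ones most likely to be reverted or to conflict later.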

MartinEmrich · 2022-05-06T09:58:57Z

Have the same issue. In the "standalone" ebs driver, this apparently can be configured with a helm variable "tolerateAllTaints", too: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/templates/node.yaml#L46

I have opened a ticket with AWS support on why they chose to set this to the (IMHO) less intuitive/sensible option.

MartinEmrich · 2022-05-16T07:58:24Z

Just to report back: AWS support recommended using a self-managed installation of the CSI driver for the time being, and asked me to watch this issue on GitHub.

ksquaredkey · 2022-06-06T19:36:40Z

I'm in the same boat. An AddOn DaemonSet to provide a universal service that has by default an exclusion of any nodes with a Taint is like running Up the Down escalator. StatefulSets with PVs in a Production multi-AZ architecture - you're gonna have Taints on Worker Nodes so the SS Pods run in the right AZ. I see the option was added to the EBS CSI Driver 2 years ago. Haven't found any discussion around keeping the old default yet, so maybe I'm missing something important. But sure seems like an illogical choice given what EBS CSI Driver does.

Chili-Man · 2022-07-11T18:34:00Z

We just upgraded from an older helm chart version of the EBS CSI driver to the EKS managed add-on version and did not notice this problem until now. I don't understand, and am disappointed, that the add-on does not allow the toleration field to be customized for the daemonset. It makes absolutely no sense to not allow some configuration of it.

idanl21 · 2022-07-21T08:33:35Z

Hey, I'm facing the same issue. I want to add this toleration:

      - effect: NoSchedule
        operator: Exists

to the ebs-csi-node daemonset, but a few minutes after the change it is deleted. I assume the ebs-csi-controller is causing it, but I cannot find the tolerations definition in the controller.

MartinEmrich · 2022-07-21T09:14:40Z

@idanl21 Indeed, currently the only solution is to remove the AWS-managed add-on and install the ebs-csi-controller yourself.

GrigorievNick · 2022-07-29T18:22:17Z

As I understand it, nodeSelector is also a fully managed field?

nthienan · 2022-08-04T17:09:12Z

This issue is preventing me from using AWS-managed add-ons. Are there any workarounds?

dfquaresma · 2022-08-04T18:11:50Z

I think I have solved this locally by editing ebs-csi-node daemonset by appending the following in the tolerations at line 365

- operator: Exists
  effect: NoSchedule

The toleration after the append will be like below:

      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - operator: Exists
          effect: NoExecute
          tolerationSeconds: 300
        - operator: Exists
          effect: NoSchedule

It's working, but I am almost sure that it will break again after the driver is updated.

johnjeffers · 2022-08-17T19:14:14Z

> The solution above used the driver on version 1.8. The latest version of the driver, 1.10, has already fixed the issue.

It's definitely not fixed in 1.10. I have 1.10 installed in my cluster and it won't deploy the daemonset to any nodes that have taints.

dfquaresma · 2022-08-17T19:21:35Z

> It's definitely not fixed in 1.10. I have 1.10 installed in my cluster and it won't deploy the daemonset to any nodes that have taints.

@johnjeffers you are right, I checked only one of my clusters. The other one is still having the same issue. Thank you!

The workaround I shared before is still working.

johnjeffers · 2022-08-21T17:05:29Z

Could we please get an update on why this is taking so long to fix? All that has to be done is update the toleration on the daemonset to

      tolerations:
        - operator: Exists

The other EKS managed add-ons like kube-proxy and vpc-cni already do this. In fact, I copied and pasted that code block directly from the kube-proxy managed add-on's daemonset.

This is a blocker for upgrading to EKS 1.23, and the fix seems to be so simple. I can't understand why this issue has been open for so long. Is there more to this problem than there seems to be?

sriramranganathan · 2022-08-25T18:30:38Z

We recently released a behavior change that will NOT overwrite configuration changes made to EKS managed add-ons through the Kubernetes API. Previously, a reconciliation process ran every 15 minutes that overwrote configuration changes made to EKS managed add-ons through the Kubernetes API. Example – changes you make to the CoreDNS Config Map through the Kubernetes API will no longer be overwritten during steady state. However, if a managed add-on is upgraded, then any configuration changes made will not be retained at this time.

This change is a first step in ensuring configurations made to EKS add-ons are preserved. We are also working on additional changes to support advanced configuration of EKS add-ons directly through the EKS API, and the ability to preserve the configuration changes during add-on upgrades.

Toleration for the EBS-CSI driver is in our product backlog and is being evaluated by the team.

johnjeffers · 2022-08-25T19:32:53Z

OK just to make sure I understand, the permanent fix for this is in the backlog so we won't see it for a while, but if I manually update the daemonset with the tolerations I need, nothing will overwrite my changes until the next upgrade of the add-on?

youwalther65 · 2022-08-25T19:44:45Z

> OK just to make sure I understand, the permanent fix for this is in the backlog so we won't see it for a while, but if I manually update the daemonset with the tolerations I need, nothing will overwrite my changes until the next upgrade of the add-on?

Yes, I already tested this by applying the requested toleration, and it wasn't reconciled.
Upgrading will lead to a merge conflict, though, so you have to use OVERWRITE and re-apply your previous change(s).

Joseph-Irving · 2022-09-06T14:56:37Z

As more and more people move to K8s 1.23 where the ebs-csi-driver is mandatory, I imagine a fair amount will try out this addon and realise it doesn't work nicely with taints. It's a shame as we would like to use the eks-addons where possible and it seems like AWS is recommending people do that. Other Daemonsets that are eks-addons don't have this issue, so I'm surprised this one acts differently, I would've thought most people would consider EBS mounting something that they would want on all nodes by default as that's how it works pre 1.23.

ksliu58 · 2022-09-06T19:57:01Z

Thanks for the feedback everyone.

Currently, you can modify your tolerations without them being overwritten by EKS. In the next release we will update the EBS CSI driver EKS managed add-on so that the daemonset tolerates all taints by default.

cilindrox · 2022-09-06T20:13:53Z

Just got bit by the same issue. Not having at least a warning on the docs or troubleshooting docs made things unnecessarily complicated. Having tolerations for every taint should be the default behavior for the DS.

youwalther65 · 2022-09-08T12:07:49Z

Amazon has just released EKS managed add-on EBS CSI "v1.11.2-eksbuild.1".
The new default for tolerations of DaemonSet "ebs-csi-node" was changed according to the requests from customers and community to:

$ k get ds -n kube-system ebs-csi-node -o yaml
...
      tolerations:
      - operator: Exists
...

This is now similar to other add-ons like kube-proxy or AWS VPC CNI.

GNSunny · 2022-09-08T14:35:02Z

I don't see it in the AWS console, nor from the AWS CLI.

youwalther65 · 2022-09-08T15:02:37Z

> I don't see it in the AWS console, nor from the AWS CLI.

I tested it in eu-west-1 (Dublin/Ireland) in the morning. Usually it is a phased rollout per region. You should be able to see it in a couple of hours. Please retry later!

GNSunny · 2022-09-08T15:18:02Z

I have tried at 16:00 BST from eu-west-1 region too. I didn't see any updated version, but I'll retry later

Chili-Man · 2022-09-08T18:24:46Z

@aleclerc-sonrai I'm on 1.23 and I don't see the new version yet either (us-west-2)

ksquaredkey · 2022-09-09T21:39:19Z

Still not available in us-west-2. Latest from "aws eks describe-addon-versions" is still: "v1.10.0-eksbuild.1". If I hadn't been so busy with other higher priority tickets, I'd have punted on waiting for the add-on and installed and managed the drivers myself. The add-on is worthless with the Taint issue.

ksliu58 · 2022-09-09T22:28:24Z

Thanks everyone for your patience.
The upcoming release with the new default for tolerations of DaemonSet is in progress, now with 15 out of 27 regions completed. We will complete deployment in the remaining regions shortly.

duclm2609 · 2022-09-12T08:23:26Z

I hit the same problem and am waiting for the new version rollout to finish so I can continue my work. The new version is not available in the ap-southeast-1 region yet.

parraletz · 2022-09-14T16:35:34Z

@duclm2609 upgrade the EKS add-on to v1.11.2-eksbuild.1; the issue has been resolved in this version.

My env:

eks cluster : kubernetes 1.23 version
region: oregon

yann-soubeyrand · 2022-11-16T17:07:44Z

It seems there’s a regression: after upgrading addon from v1.11.4-eksbuild.1 to v1.13.0-eksbuild.1, tolerations are gone on my ebs-csi-node DaemonSet.

rajaie-sg · 2022-11-16T17:44:53Z

^^ Running into the same issue. There are no tolerations set after upgrading to v1.13.0-eksbuild.1

yann-soubeyrand · 2022-11-16T18:27:58Z

v1.12.1-eksbuild.1 does not work either.

carlosrmendes · 2022-11-16T18:38:32Z

same here, after addon update to v1.13.0-eksbuild.1 I was unable to provision volumes on node with taints

youwalther65 · 2022-11-17T08:22:05Z

How did you manage the upgrade of the managed EBS CSI add-on? Did you use the new "preserve" option to preserve your changes to toleration? (in case you have customized them)
Here are the differences between ebs-csi-node DaemonSet between Helm and managed add-on:

EBS CSI installed via Helm (Fluxv2)

$ k get ds -n kube-system ebs-csi-node -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
    meta.helm.sh/release-name: aws-ebs-csi-driver
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2022-11-16T10:36:34Z"
  generation: 1
  labels:
    app.kubernetes.io/component: csi-driver
    app.kubernetes.io/instance: aws-ebs-csi-driver
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-ebs-csi-driver
    app.kubernetes.io/version: 1.13.0
    helm.sh/chart: aws-ebs-csi-driver-2.13.0
    helm.toolkit.fluxcd.io/name: aws-ebs-csi-driver
    helm.toolkit.fluxcd.io/namespace: kube-system
  name: ebs-csi-node
  namespace: kube-system
...
      tolerations:
      - operator: Exists
...

empty output/no tolerations for managed add-on for a new installation of v1.13.0-eksbuild.1

$ k get ds -n kube-system ebs-csi-node -o yaml | grep tolerations

So the managed add-on has no tolerations at all, but you can apply them yourself. I will contact the AWS service team to check whether this is expected!

yann-soubeyrand · 2022-11-17T09:27:55Z

> How did you manage the upgrade of the managed EBS CSI add-on? Did you use the new "preserve" option to preserve your changes to toleration?

I didn’t use the preserve option since I didn’t modify the tolerations because v1.11.5-eksbuild.1 deploys the right ones.

andrei-korviakov · 2022-11-17T11:45:16Z

Can confirm that the addon with a version newer than v1.11.5 doesn’t have tolerations in the manifest and therefore cannot be scheduled on nodes with taints.

youwalther65 · 2022-11-17T13:17:38Z

We are working on a v1.12 and v1.13 managed add-on release which fixes the toleration issue.

youwalther65 · 2022-11-21T14:00:02Z

We have rolled out add-on versions EBS CSI v1.13.0-eksbuild.2 (default version now for EKS v1.22, v1.23 and v1.24) and v1.12.1-eksbuild.2 in all regions with toleration for ebs-csi-node DaemonSet as expected. Please check!

      tolerations:
      - operator: Exists

yann-soubeyrand · 2022-11-21T17:34:33Z

@youwalther65 I confirm it’s working, thanks!

ConnorJC3 · 2022-12-23T21:56:28Z

Hi everyone, we have recently rolled out support for custom tolerations for the EBS CSI Driver addon using the new Advanced Configuration feature for EKS Addons!

This has come in two steps:

Previous releases (v1.10.0-eksbuild.1 and up) have support for configuring whether the node tolerates all taints via node.tolerateAllTaints. The default behavior will be left the same as it was for each release prior to this change.
For today's release (v1.14.0-eksbuild.1) we also support full customization of the tolerations of both the controller and node pods via controller.tolerations and node.tolerations. For now, node.tolerateAllTaints will continue to default to true, it is recommended you explicitly set it to false if customizing the node tolerations.
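As a rough illustration of the options named above, the following sketch builds a JSON document that could be supplied as the add-on's configuration values. The nested layout is an assumption inferred from the dotted names (node.tolerateAllTaints, node.tolerations, controller.tolerations); the helper name and the example toleration are hypothetical, and the configuration schema published for your add-on version should be checked before relying on it.

```python
import json


def ebs_csi_addon_config(node_tolerations=None, controller_tolerations=None,
                         tolerate_all_taints=None):
    """Build configuration values as a JSON string (assumed nested layout)."""
    config = {}
    if tolerate_all_taints is not None:
        config.setdefault("node", {})["tolerateAllTaints"] = tolerate_all_taints
    if node_tolerations:
        config.setdefault("node", {})["tolerations"] = node_tolerations
    if controller_tolerations:
        config.setdefault("controller", {})["tolerations"] = controller_tolerations
    return json.dumps(config, sort_keys=True)


# Example: tolerate only a hypothetical "dedicated" taint on the node pods and
# switch off the catch-all toleration, as recommended above when customizing.
values = ebs_csi_addon_config(
    node_tolerations=[
        {"key": "dedicated", "operator": "Exists", "effect": "NoSchedule"}
    ],
    tolerate_all_taints=False,
)
print(values)
```

Generating the document from code rather than writing it by hand makes it easier to keep node.tolerateAllTaints and node.tolerations consistent with each other.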

jaimehrubiks · 2023-01-03T15:27:27Z

@ConnorJC3 Thanks for the update, that was very useful. Is the team considering adding tolerations support to CoreDNS?

Currently, this is the only thing preventing people from creating an EKS cluster with tainted-only nodes.

#1389

sriramranganathan · 2023-01-03T15:57:05Z

@jaimehrubiks - the configuration for CoreDNS addon to support tolerations is being evaluated.

sriramranganathan · 2023-01-03T16:08:52Z

Amazon EKS team recently announced the general availability of advanced configuration feature for managed add-ons. You can now pass in advanced configuration for cluster add-ons, enabling you to customize add-on properties not handled by default settings. Configuration can be applied to add-ons either during cluster creation or at any time after the cluster is created.

Using advanced configuration feature, you can now configure custom tolerations for Amazon EBS CSI driver addon starting from v1.14.0-eksbuild.1. Custom tolerations can be configured through controller.tolerations and node.tolerations. Note, node.tolerateAllTaints will continue to default to true.

To learn more about this feature, check out this blogpost - https://aws.amazon.com/blogs/containers/amazon-eks-add-ons-advanced-configuration/

Check out the Amazon EKS documentation - https://docs.aws.amazon.com/eks/latest/userguide/managing-add-ons.html

pcebul · 2023-01-23T15:15:39Z

@youwalther65 @ConnorJC3
Is this also valid for other add-ons like CoreDNS? I mean default tolerations such as

      tolerations:
      - operator: Exists

and also ability to customize tolerations?

Thanks

ConnorJC3 · 2023-01-23T16:24:56Z

cc @sriramranganathan should know the answer to that

youwalther65 · 2023-01-23T18:43:53Z

@pcebul: You can check the JSON configuration schema of the CoreDNS managed add-on to see whether it supports customizing the tolerations, as described in the corresponding launch blog post here.
Besides that, any customization you apply to the managed add-on will be preserved; this was introduced in October 2020, see here.

singhnix · 2023-03-08T09:06:47Z

I have now tried add-on versions 1.16 and 1.13, but it seems the driver is still not able to run on nodes with taints.
Below is the describe output for a pod of the ebs-csi-controller deployment:

kubectl describe pods ebs-csi-controller-765f496485-6xrl4 -n kube-system
Name: ebs-csi-controller-765f496485-6xrl4
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node:
Labels: app=ebs-csi-controller
app.kubernetes.io/component=csi-driver
app.kubernetes.io/managed-by=EKS
app.kubernetes.io/name=aws-ebs-csi-driver
app.kubernetes.io/version=1.13.0
pod-template-hash=765f496485
Annotations: kubernetes.io/psp: eks.privileged
Status: Pending
IP:
IPs:
Controlled By: ReplicaSet/ebs-csi-controller-765f496485
Containers:
ebs-plugin:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-ebs-csi-driver:v1.13.0
Port: 9808/TCP
Host Port: 0/TCP
Args:
controller
--endpoint=$(CSI_ENDPOINT)
--k8s-tag-cluster-id=eksdemo1
--logtostderr
--v=2
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
Readiness: http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
Environment:
CSI_ENDPOINT: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
CSI_NODE_NAME: (v1:spec.nodeName)
AWS_ACCESS_KEY_ID: <set to the key 'key_id' in secret 'aws-secret'> Optional: true
AWS_SECRET_ACCESS_KEY: <set to the key 'access_key' in secret 'aws-secret'> Optional: true
AWS_EC2_ENDPOINT: <set to the key 'endpoint' of config map 'aws-meta'> Optional: true
AWS_REGION: us-east-1
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
csi-provisioner:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-provisioner:v3.3.0-eks-1-23-6
Port:
Host Port:
Args:
--csi-address=$(ADDRESS)
--v=2
--feature-gates=Topology=true
--extra-create-metadata
--leader-election=true
--default-fstype=ext4
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
csi-attacher:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-attacher:v4.0.0-eks-1-23-6
Port:
Host Port:
Args:
--csi-address=$(ADDRESS)
--v=2
--leader-election=true
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
csi-snapshotter:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-snapshotter:v6.1.0-eks-1-23-6
Port:
Host Port:
Args:
--csi-address=$(ADDRESS)
--leader-election=true
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
csi-resizer:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-resizer:v1.6.0-eks-1-23-6
Port:
Host Port:
Args:
--csi-address=$(ADDRESS)
--v=2
--handle-volume-inuse-error=false
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
liveness-probe:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/livenessprobe:v2.8.0-eks-1-24-4
Port:
Host Port:
Args:
--csi-address=/csi/csi.sock
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
kube-api-access-crqqw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoExecute op=Exists for 300s
CriticalAddonsOnly op=Exists
Events:
Type Reason Age From Message

Warning FailedScheduling 80s default-scheduler 0/2 nodes are available: 2 node(s) had untolerated taint {dedicated: gpuGroup}. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.

And the describe output of the node with the taint:

kubectl describe node ip-192-168-22-69.ec2.internal
Name: ip-192-168-22-69.ec2.internal
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.medium
beta.kubernetes.io/os=linux
eks.amazonaws.com/capacityType=ON_DEMAND
eks.amazonaws.com/nodegroup=test
eks.amazonaws.com/nodegroup-image=ami-01ced323515f177b0
failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1a
k8s.io/cloud-provider-aws=ecd4858b2f1606704d42ac1e8fb5beb2
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-192-168-22-69.ec2.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=t3.medium
topology.ebs.csi.aws.com/zone=us-east-1a
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/zone=us-east-1a
Annotations: alpha.kubernetes.io/provided-node-ip: 192.168.22.69
csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-028d4dbf3ef17a637"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 08 Mar 2023 14:30:08 +0530
Taints: dedicated=gpuGroup:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: ip-192-168-22-69.ec2.internal
AcquireTime:
RenewTime: Wed, 08 Mar 2023 14:35:15 +0530
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message

MemoryPressure False Wed, 08 Mar 2023 14:31:10 +0530 Wed, 08 Mar 2023 14:30:07 +0530 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 08 Mar 2023 14:31:10 +0530 Wed, 08 Mar 2023 14:30:07 +0530 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 08 Mar 2023 14:31:10 +0530 Wed, 08 Mar 2023 14:30:07 +0530 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 08 Mar 2023 14:31:10 +0530 Wed, 08 Mar 2023 14:30:28 +0530 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.22.69
ExternalIP: 54.145.127.175
Hostname: ip-192-168-22-69.ec2.internal
InternalDNS: ip-192-168-22-69.ec2.internal
ExternalDNS: ec2-54-145-127-175.compute-1.amazonaws.com
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3955688Ki
pods: 17
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 1930m
ephemeral-storage: 18242267924
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3400680Ki
pods: 17
System Info:
Machine ID: ec25d974ebfe3816b53e289539ea44b6
System UUID: ec25d974-ebfe-3816-b53e-289539ea44b6
Boot ID: 8fbe5d0d-c66f-4289-b3f9-7eff8cb4cadb
Kernel Version: 5.10.165-143.735.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.6
Kubelet Version: v1.24.10-eks-48e63af
Kube-Proxy Version: v1.24.10-eks-48e63af
ProviderID: aws:///us-east-1a/i-028d4dbf3ef17a637
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age

amazon-cloudwatch fluent-bit-k7sfv 500m (25%) 0 (0%) 100Mi (3%) 200Mi (6%) 5m7s
kube-system aws-node-kmcvw 25m (1%) 0 (0%) 0 (0%) 0 (0%) 5m7s
kube-system ebs-csi-node-clhhl 30m (1%) 300m (15%) 120Mi (3%) 768Mi (23%) 4m24s
kube-system kube-proxy-64lxv 100m (5%) 0 (0%) 0 (0%) 0 (0%) 5m7s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits

cpu 655m (33%) 300m (15%)
memory 220Mi (6%) 968Mi (29%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message

Normal Starting 5m kube-proxy
Warning InvalidDiskCapacity 5m8s kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 5m8s (x2 over 5m8s) kubelet Node ip-192-168-22-69.ec2.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 5m8s (x2 over 5m8s) kubelet Node ip-192-168-22-69.ec2.internal status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 5m8s (x2 over 5m8s) kubelet Node ip-192-168-22-69.ec2.internal status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 5m8s kubelet Updated Node Allocatable limit across pods
Normal Starting 5m8s kubelet Starting kubelet.
Normal NodeReady 4m47s kubelet Node ip-192-168-22-69.ec2.internal status is now: NodeReady

So it seems that the toleration "operator: Exists" does not cover every type of taint on the node. Please confirm this.

ConnorJC3 · 2023-03-08T17:14:35Z

Hi @singhnix - the EBS CSI controller pods intentionally do not tolerate all taints by default (only the node pods do so by default as of right now). If you wish to set custom tolerations for the controller you can do so via the advanced configuration feature as mentioned here and here.
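For the taint shown in the node description above (dedicated=gpuGroup:NoSchedule), a matching controller toleration could be derived as in this sketch. The helper `toleration_for_taint` is hypothetical, and the nested "controller" / "tolerations" layout is an assumption based on the controller.tolerations option name; verify it against the add-on's configuration schema.

```python
import json


def toleration_for_taint(taint):
    """Turn 'key=value:Effect' or 'key:Effect' into an exact-match toleration."""
    spec, sep, effect = taint.rpartition(":")
    if not sep or not spec or not effect:
        raise ValueError("expected 'key[=value]:Effect', got %r" % taint)
    key, has_value, value = spec.partition("=")
    toleration = {"key": key, "effect": effect}
    if has_value:
        # A taint with a value is matched with the Equal operator.
        toleration["operator"] = "Equal"
        toleration["value"] = value
    else:
        # A taint without a value is matched on key presence only.
        toleration["operator"] = "Exists"
    return toleration


controller_config = {
    "controller": {
        "tolerations": [toleration_for_taint("dedicated=gpuGroup:NoSchedule")]
    }
}
print(json.dumps(controller_config, indent=2))
```

An exact-match toleration like this keeps the controller off other tainted nodes, which fits the intent of not tolerating all taints on the controller by default.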

singhnix added the Proposed Community submitted issue label Apr 10, 2022

mikestef9 added EKS Amazon Elastic Kubernetes Service EKS Add-Ons labels Apr 11, 2022

archoversight mentioned this issue Apr 27, 2022

[EKS] [EBS CSI Addon]: Allow updates on nodeSelector/affinity #1724

Closed

sde-melo mentioned this issue Jul 19, 2022

Pod with taints (tolerations) and CSI EBS volume (gp3) cannot be started aws/karpenter-provider-aws#2150

Closed

torredil mentioned this issue Sep 3, 2022

error generating accessibility requirements: no topology key found on CSINode kubernetes-sigs/aws-ebs-csi-driver#1372

Closed

mattburgess mentioned this issue Sep 26, 2022

aws_eks_addon tolerations hashicorp/terraform-provider-aws#26811

Closed

sriramranganathan closed this as completed Jan 3, 2023

ldming mentioned this issue Apr 1, 2023

[BUG]EKS DEGRADED cause playground init failed apecloud/kubeblocks#2371

Closed

fulcrum29 mentioned this issue Nov 9, 2023

[EKS] [request]: Enable live reconciliation of managed fiels with EKS Managed Addons #2188

Open
