
[EKS] [EBS CSI addon]: Adding custom Toleration on EBS CSI Addon #1706

Closed
singhnix opened this issue Apr 10, 2022 · 49 comments
Labels
EKS Add-Ons EKS Amazon Elastic Kubernetes Service Proposed Community submitted issue

Comments

@singhnix

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Add support for custom tolerations, like "node.tolerateAllTaints = true", on the EBS CSI add-on so that it can tolerate taints placed on nodes.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

It seems there is a missing setting on the DaemonSet created by this add-on that prevents it from being scheduled on certain nodes with taints on them. People who manage this manually have fixed it by adding:

node.tolerateAllTaints = true

to the DaemonSet's configuration. Currently it does not look like the AWS add-on allows for something like this. Because of this, our move to this add-on has prevented new persistent volumes from being created on certain nodes.

I am not fully sure whether there is a workaround for this (I assumed modifying the DaemonSet after install would not be permanent, and it should not really be a required setup step anyway).

kubernetes-sigs/aws-ebs-csi-driver#848

Specifically the error we saw that led to this is:

Warning ProvisioningFailed 18s (x7 over 81s) ebs.csi.aws.com_ebs-csi-controller-848fb4bd69-lnjp8_454f66d4-704c-4164-a5a0-283cb99d5688 failed to provision volume with StorageClass "gp3-encrypted": error generating accessibility requirements: no topology key found on CSINode ip-XX-XXX-XX-XX.ec2.internal
Normal ExternalProvisioning 8s (x6 over 81s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator

The label:

topology.ebs.csi.aws.com/zone=

was only present on nodes that the DaemonSet could run on, rather than on every node.
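For context, a quick way to see which nodes carry that label (and therefore where the ebs-csi-node DaemonSet actually landed) is to list it as a column; this is just a sketch assuming kubectl access to the cluster:

kubectl get nodes -L topology.ebs.csi.aws.com/zone

Nodes showing an empty value in that column are the ones affected by the provisioning error above.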

Are you currently working around this issue?
How are you currently solving this problem?

Additional context
Anything else we should know?

Attachments
Case ID:9858452761

@singhnix singhnix added the Proposed Community submitted issue label Apr 10, 2022
@mikestef9 mikestef9 added EKS Amazon Elastic Kubernetes Service EKS Add-Ons labels Apr 11, 2022
@harrisjoseph

Thank you for raising this ticket. I'm curious why the current tolerations for the ebs-csi-node DaemonSet in the EBS CSI driver add-on were chosen.

We currently have this:

      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
        tolerationSeconds: 300

This seems to assume that the user/operator won't want the ebs-csi-node pod, or any EBS functionality, on any node that has any other taint set. Node isolation via taints is a pretty standard Kubernetes feature, and it seems odd not to want EBS volumes on tainted nodes when the daemonset could be extended to all nodes very easily:

      tolerations:
      - operator: Exists

I'm sure my question is just a sign that I have some deeper misunderstanding of how the EBS CSI driver is intended to be used in conjunction with EKS, I'm more than happy to be corrected, or to find that the CSI driver add-on can be easily configured to use EKS' standard topology keys, for instance.

@stephen-gardner

I've been able to extend the daemonset to tainted nodes by modifying the tolerations, which is enough to deploy pending persistent volume claims...

kubectl -n kube-system get daemonset ebs-csi-node -o yaml --show-managed-fields | grep f:tolerations -C 2

            f:priorityClassName: {}
            f:serviceAccountName: {}
            f:tolerations: {}
            f:volumes:
              k:{"name":"device-dir"}:

Unfortunately, tolerations are a fully managed field, so my changes are quickly overwritten. The only workaround I could think of is to remove Amazon EBS CSI add-on, but preserve its resources, effectively turning it into a self-managed add-on.

eksctl delete addon --cluster my-cluster --name aws-ebs-csi-driver --preserve

@MartinEmrich

I have the same issue. In the "standalone" EBS driver, this can apparently be configured with the Helm value "tolerateAllTaints", too: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/templates/node.yaml#L46

I have opened a ticket with AWS support on why they chose to set this to the (IMHO) less intuitive/sensible option.

@MartinEmrich

Just to report back: AWS support recommended using a self-managed installation of the CSI driver for the time being, and asked me to watch this issue on GitHub.
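For anyone going the self-managed route in the meantime, a minimal sketch of installing the upstream chart with that Helm value set (repo URL and release name taken from the upstream project; adjust the namespace and other values to your setup):

helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm repo update
helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set node.tolerateAllTaints=true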

@ksquaredkey

I'm in the same boat. An add-on DaemonSet that provides a universal service yet by default excludes any node with a taint is like running up the down escalator. With StatefulSets and PVs in a production multi-AZ architecture, you're going to have taints on worker nodes so the StatefulSet pods run in the right AZ. I see the option was added to the EBS CSI driver two years ago. I haven't found any discussion around keeping the old default yet, so maybe I'm missing something important, but it sure seems like an illogical choice given what the EBS CSI driver does.

@Chili-Man

We just upgraded from an older Helm chart version of the EBS CSI driver to the EKS managed add-on version and did not notice this problem until now. I don't understand, and am disappointed, that the add-on does not allow the tolerations field of the DaemonSet to be customized; it makes absolutely no sense not to allow some configuration of it.

@idanl21

idanl21 commented Jul 21, 2022

Hey, I'm facing the same issue. I want to add the toleration:

- effect: NoSchedule
  operator: Exists

to the ebs-csi-node DaemonSet, but a few minutes after the change it gets deleted. I assume the ebs-csi-controller is causing this, but I cannot find the tolerations definition in the controller.

@MartinEmrich

@idanl21 Indeed, currently the only solution is to remove the AWS-managed add-on and install the ebs-csi-controller yourself.

@GrigorievNick

GrigorievNick commented Jul 29, 2022

As I understand it, nodeSelector is also a fully managed field?

@nthienan

nthienan commented Aug 4, 2022

This issue is preventing me from using AWS-managed add-ons. Are there any workarounds?

@dfquaresma

I think I have solved this locally by editing the ebs-csi-node DaemonSet and appending the following to the tolerations at line 365:

- operator: Exists
  effect: NoSchedule

After the append, the tolerations will look like this:

      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - operator: Exists
          effect: NoExecute
          tolerationSeconds: 300
        - operator: Exists
          effect: NoSchedule

It's working, but I am almost sure that it will break again after the driver is updated.
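If it helps, the same manual edit can be applied as a one-shot JSON patch (just a sketch; as discussed above, the managed add-on may still revert it on upgrade):

kubectl -n kube-system patch daemonset ebs-csi-node --type json \
  -p '[{"op":"add","path":"/spec/template/spec/tolerations/-","value":{"operator":"Exists","effect":"NoSchedule"}}]'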

@johnjeffers

The solution above used driver version 1.8. The latest version of the driver, 1.10, has already fixed the issue.

It's definitely not fixed in 1.10. I have 1.10 installed in my cluster and it won't deploy the daemonset to any nodes that have taints.

@dfquaresma

It's definitely not fixed in 1.10. I have 1.10 installed in my cluster and it won't deploy the daemonset to any nodes that have taints.

@johnjeffers you are right, I checked only one of my clusters. The other one is still having the same issue. Thank you!

The workaround I shared before is still working.

@johnjeffers

Could we please get an update on why this is taking so long to fix? All that needs to be done is to update the tolerations on the daemonset to:

      tolerations:
        - operator: Exists

The other EKS managed add-ons like kube-proxy and vpc-cni already do this. In fact, I copied and pasted that code block directly from the kube-proxy managed add-on's daemonset.
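For comparison, a quick way to dump what the kube-proxy add-on ships with (a sketch, assuming access to the kube-system namespace):

kubectl -n kube-system get daemonset kube-proxy \
  -o jsonpath='{.spec.template.spec.tolerations}'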

This is a blocker for upgrading to EKS 1.23, and the fix seems to be so simple. I can't understand why this issue has been open for so long. Is there more to this problem than there seems to be?

@sriramranganathan

We recently released a behavior change that will NOT overwrite configuration changes made to EKS managed add-ons through the Kubernetes API. Previously, a reconciliation process ran every 15 minutes that overwrote configuration changes made to EKS managed add-ons through the Kubernetes API. Example – changes you make to the CoreDNS Config Map through the Kubernetes API will no longer be overwritten during steady state. However, if a managed add-on is upgraded, then any configuration changes made will not be retained at this time.

This change is a first step in ensuring configurations made to EKS add-ons are preserved. We are also working on additional changes to support advanced configuration of EKS add-ons directly through the EKS API, and the ability to preserve the configuration changes during add-on upgrades.

Toleration for the EBS-CSI driver is in our product backlog and is being evaluated by the team.

@johnjeffers

johnjeffers commented Aug 25, 2022

OK just to make sure I understand, the permanent fix for this is in the backlog so we won't see it for a while, but if I manually update the daemonset with the tolerations I need, nothing will overwrite my changes until the next upgrade of the add-on?

@youwalther65

youwalther65 commented Aug 25, 2022

OK just to make sure I understand, the permanent fix for this is in the backlog so we won't see it for a while, but if I manually update the daemonset with the tolerations I need, nothing will overwrite my changes until the next upgrade of the add-on?

Yes, I already tested it by applying the requested toleration, and it wasn't reconciled away.
Upgrading will lead to a merge conflict though, and you have to use OVERWRITE and re-apply your previous change(s).
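For reference, the upgrade with conflict resolution would look roughly like this (a sketch; the cluster name is a placeholder and the add-on version is only an example):

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name aws-ebs-csi-driver \
  --addon-version v1.11.2-eksbuild.1 \
  --resolve-conflicts OVERWRITE

After the upgrade completes, re-apply your custom tolerations to the ebs-csi-node DaemonSet.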

@Joseph-Irving

As more and more people move to Kubernetes 1.23, where the ebs-csi-driver is mandatory, I imagine a fair number will try out this add-on and realise it doesn't work nicely with taints. It's a shame, as we would like to use the EKS add-ons where possible, and AWS seems to recommend that. Other DaemonSets that are EKS add-ons don't have this issue, so I'm surprised this one acts differently; I would have thought most people would consider EBS mounting something they want on all nodes by default, as that's how it worked pre-1.23.

@ksliu58

ksliu58 commented Sep 6, 2022

Thanks for the feedback everyone.

Currently, you can modify your tolerations without them being overwritten by EKS. We will be updating the EKS managed add-on for the CSI driver in the next release, updating the daemonset to tolerate all taints by default.

@cilindrox

Just got bitten by the same issue. Not having at least a warning in the docs or troubleshooting guide made things unnecessarily complicated. Having tolerations for every taint should be the default behavior for the DaemonSet.

@youwalther65

youwalther65 commented Sep 8, 2022

Amazon has just released the EKS managed EBS CSI add-on "v1.11.2-eksbuild.1".
The default tolerations of the "ebs-csi-node" DaemonSet were changed, per requests from customers and the community, to:

$ k get ds -n kube-system ebs-csi-node -o yaml
...
      tolerations:
      - operator: Exists
...

This is now similar to other add-ons like kube-proxy or AWS VPC CNI.

@GNSunny

GNSunny commented Sep 8, 2022

I don't see it in the AWS console, nor from the AWS CLI.


@youwalther65

I don't see it in the AWS console, nor from the AWS CLI.


I tested it in eu-west-1 (Dublin/Ireland) this morning. Usually it is a phased rollout per region, so you should be able to see it in a couple of hours. Please retry later!

@GNSunny

GNSunny commented Sep 8, 2022

I tried at 16:00 BST from the eu-west-1 region too. I didn't see any updated version, but I'll retry later.

@Chili-Man

@aleclerc-sonrai I'm on 1.23 and I don't see the new version yet either (us-west-2)

@ksquaredkey

Still not available in us-west-2. The latest from "aws eks describe-addon-versions" is still "v1.10.0-eksbuild.1". If I hadn't been so busy with other higher-priority tickets, I'd have given up waiting for the add-on and installed and managed the driver myself. The add-on is worthless with the taint issue.

@ksliu58

ksliu58 commented Sep 9, 2022

Thanks everyone for your patience.
The upcoming release with the new default for tolerations of DaemonSet is in progress, now with 15 out of 27 regions completed. We will complete deployment in the remaining regions shortly.

@duclm2609

I hit the same problem and am waiting for the new version rollout to finish so I can continue my work. It is not yet available in the ap-southeast-1 region.

@parraletz

@duclm2609 Upgrade the EKS add-on to v1.11.2-eksbuild.1; the issue has been resolved in this version.

My environment:

EKS cluster: Kubernetes 1.23
region: Oregon

@yann-soubeyrand

It seems there's a regression: after upgrading the add-on from v1.11.4-eksbuild.1 to v1.13.0-eksbuild.1, the tolerations are gone on my ebs-csi-node DaemonSet.

@rajaie-sg

^^ Running into the same issue. There are no tolerations set after upgrading to v1.13.0-eksbuild.1

@yann-soubeyrand

v1.12.1-eksbuild.1 does not work either.

@carlosrmendes

Same here; after updating the add-on to v1.13.0-eksbuild.1 I was unable to provision volumes on nodes with taints.

@youwalther65

youwalther65 commented Nov 17, 2022

How did you manage the upgrade of the managed EBS CSI add-on? Did you use the new "preserve" option to keep your changes to the tolerations (in case you had customized them)?
Here are the differences in the ebs-csi-node DaemonSet between the Helm install and the managed add-on:

  1. EBS CSI installed via Helm (Fluxv2)
$ k get ds -n kube-system ebs-csi-node -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
    meta.helm.sh/release-name: aws-ebs-csi-driver
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2022-11-16T10:36:34Z"
  generation: 1
  labels:
    app.kubernetes.io/component: csi-driver
    app.kubernetes.io/instance: aws-ebs-csi-driver
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-ebs-csi-driver
    app.kubernetes.io/version: 1.13.0
    helm.sh/chart: aws-ebs-csi-driver-2.13.0
    helm.toolkit.fluxcd.io/name: aws-ebs-csi-driver
    helm.toolkit.fluxcd.io/namespace: kube-system
  name: ebs-csi-node
  namespace: kube-system
...
      tolerations:
      - operator: Exists
...
  2. Empty output/no tolerations for the managed add-on with a new installation of v1.13.0-eksbuild.1
$ k get ds -n kube-system ebs-csi-node -o yaml | grep tolerations

So the managed add-on is missing tolerations entirely, but you can apply them yourself. I will contact the AWS service team to check whether this is expected!

@yann-soubeyrand

How did you manage the upgrade of the managed EBS CSI add-on? Did you use the new "preserve" option to preserve your changes to toleration?

I didn't use the preserve option since I hadn't modified the tolerations, because v1.11.5-eksbuild.1 deploys the right ones.

@andrei-korviakov

I can confirm that add-on versions newer than v1.11.5 don't have tolerations in the manifest and therefore cannot be scheduled on nodes with taints.

@youwalther65

We are working on a v1.12 and v1.13 managed add-on release which fixes the toleration issue.

@youwalther65

We have rolled out add-on versions EBS CSI v1.13.0-eksbuild.2 (now the default version for EKS v1.22, v1.23 and v1.24) and v1.12.1-eksbuild.2 in all regions, with the expected toleration on the ebs-csi-node DaemonSet. Please check!

      tolerations:
      - operator: Exists

@yann-soubeyrand

yann-soubeyrand commented Nov 21, 2022

@youwalther65 I confirm it’s working, thanks!

@ConnorJC3

Hi everyone, we have recently rolled out support for custom tolerations for the EBS CSI Driver addon using the new Advanced Configuration feature for EKS Addons!

This has come in two steps:

  1. Previous releases (v1.10.0-eksbuild.1 and up) have support for configuring whether the node tolerates all taints via node.tolerateAllTaints. The default behavior will be left the same as it was for each release prior to this change.
  2. For today's release (v1.14.0-eksbuild.1) we also support full customization of the tolerations of both the controller and node pods via controller.tolerations and node.tolerations. For now, node.tolerateAllTaints will continue to default to true; it is recommended you explicitly set it to false if customizing the node tolerations (see the example sketch below).
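As an illustration, passing these keys through the add-on configuration could look like the following (a sketch; the cluster name and the example taint key/value are placeholders):

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name aws-ebs-csi-driver \
  --addon-version v1.14.0-eksbuild.1 \
  --configuration-values '{"node":{"tolerateAllTaints":false,"tolerations":[{"key":"dedicated","operator":"Equal","value":"gpuGroup","effect":"NoSchedule"}]}}'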

@jaimehrubiks

@ConnorJC3 Thanks for the update, that was very useful. Is the team considering adding tolerations support to CoreDNS?

Currently, this is the only thing preventing people from creating an EKS cluster with tainted-only nodes.

#1389

@sriramranganathan

@jaimehrubiks - the configuration for CoreDNS addon to support tolerations is being evaluated.

@sriramranganathan

Amazon EKS team recently announced the general availability of advanced configuration feature for managed add-ons. You can now pass in advanced configuration for cluster add-ons, enabling you to customize add-on properties not handled by default settings. Configuration can be applied to add-ons either during cluster creation or at any time after the cluster is created.

Using the advanced configuration feature, you can now configure custom tolerations for the Amazon EBS CSI driver add-on starting from v1.14.0-eksbuild.1. Custom tolerations can be configured through controller.tolerations and node.tolerations. Note: node.tolerateAllTaints will continue to default to true.

To learn more about this feature, check out this blogpost - https://aws.amazon.com/blogs/containers/amazon-eks-add-ons-advanced-configuration/

Check out the Amazon EKS documentation - https://docs.aws.amazon.com/eks/latest/userguide/managing-add-ons.html

@pcebul

pcebul commented Jan 23, 2023

@youwalther65 @ConnorJC3
Is this also valid for other add-ons like CoreDNS? I mean the default tolerations of:

tolerations:
- operator: Exists

and also the ability to customize tolerations?

Thanks

@ConnorJC3

cc @sriramranganathan should know the answer to that

@youwalther65

@pcebul: You can check the JSON configuration schema of the CoreDNS managed add-on to see whether it supports customizing the tolerations, as described in the corresponding launch blog post here.
Besides that, any customization you apply to the managed add-on will be preserved, which was introduced in October 2020; see here.
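A sketch of pulling that schema with the CLI (the CoreDNS add-on version shown here is only an example; pick a real one from aws eks describe-addon-versions):

aws eks describe-addon-configuration \
  --addon-name coredns \
  --addon-version v1.9.3-eksbuild.2 \
  --query configurationSchema --output text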

@singhnix
Author

singhnix commented Mar 8, 2023

Now I tried using add-on versions 1.16 and 1.13, but it seems it is not able to run on nodes with taints.
Below is the describe output of the ebs-csi-controller deployment:

kubectl describe pods ebs-csi-controller-765f496485-6xrl4 -n kube-system
Name: ebs-csi-controller-765f496485-6xrl4
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node:
Labels: app=ebs-csi-controller
app.kubernetes.io/component=csi-driver
app.kubernetes.io/managed-by=EKS
app.kubernetes.io/name=aws-ebs-csi-driver
app.kubernetes.io/version=1.13.0
pod-template-hash=765f496485
Annotations: kubernetes.io/psp: eks.privileged
Status: Pending
IP:
IPs:
Controlled By: ReplicaSet/ebs-csi-controller-765f496485
Containers:
ebs-plugin:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-ebs-csi-driver:v1.13.0
Port: 9808/TCP
Host Port: 0/TCP
Args:
controller
--endpoint=$(CSI_ENDPOINT)
--k8s-tag-cluster-id=eksdemo1
--logtostderr
--v=2
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
Readiness: http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
Environment:
CSI_ENDPOINT: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
CSI_NODE_NAME: (v1:spec.nodeName)
AWS_ACCESS_KEY_ID: <set to the key 'key_id' in secret 'aws-secret'> Optional: true
AWS_SECRET_ACCESS_KEY: <set to the key 'access_key' in secret 'aws-secret'> Optional: true
AWS_EC2_ENDPOINT: <set to the key 'endpoint' of config map 'aws-meta'> Optional: true
AWS_REGION: us-east-1
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
csi-provisioner:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-provisioner:v3.3.0-eks-1-23-6
Port:
Host Port:
Args:
--csi-address=$(ADDRESS)
--v=2
--feature-gates=Topology=true
--extra-create-metadata
--leader-election=true
--default-fstype=ext4
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
csi-attacher:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-attacher:v4.0.0-eks-1-23-6
Port:
Host Port:
Args:
--csi-address=$(ADDRESS)
--v=2
--leader-election=true
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
csi-snapshotter:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-snapshotter:v6.1.0-eks-1-23-6
Port:
Host Port:
Args:
--csi-address=$(ADDRESS)
--leader-election=true
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
csi-resizer:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-resizer:v1.6.0-eks-1-23-6
Port:
Host Port:
Args:
--csi-address=$(ADDRESS)
--v=2
--handle-volume-inuse-error=false
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
liveness-probe:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/livenessprobe:v2.8.0-eks-1-24-4
Port:
Host Port:
Args:
--csi-address=/csi/csi.sock
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 40Mi
Environment:
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-crqqw (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
kube-api-access-crqqw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoExecute op=Exists for 300s
CriticalAddonsOnly op=Exists
Events:
Type Reason Age From Message


Warning FailedScheduling 80s default-scheduler 0/2 nodes are available: 2 node(s) had untolerated taint {dedicated: gpuGroup}. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.

And the describe of node with taint

kubectl describe node ip-192-168-22-69.ec2.internal
Name: ip-192-168-22-69.ec2.internal
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.medium
beta.kubernetes.io/os=linux
eks.amazonaws.com/capacityType=ON_DEMAND
eks.amazonaws.com/nodegroup=test
eks.amazonaws.com/nodegroup-image=ami-01ced323515f177b0
failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1a
k8s.io/cloud-provider-aws=ecd4858b2f1606704d42ac1e8fb5beb2
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-192-168-22-69.ec2.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=t3.medium
topology.ebs.csi.aws.com/zone=us-east-1a
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/zone=us-east-1a
Annotations: alpha.kubernetes.io/provided-node-ip: 192.168.22.69
csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-028d4dbf3ef17a637"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 08 Mar 2023 14:30:08 +0530
Taints: dedicated=gpuGroup:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: ip-192-168-22-69.ec2.internal
AcquireTime:
RenewTime: Wed, 08 Mar 2023 14:35:15 +0530
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message


MemoryPressure False Wed, 08 Mar 2023 14:31:10 +0530 Wed, 08 Mar 2023 14:30:07 +0530 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 08 Mar 2023 14:31:10 +0530 Wed, 08 Mar 2023 14:30:07 +0530 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 08 Mar 2023 14:31:10 +0530 Wed, 08 Mar 2023 14:30:07 +0530 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 08 Mar 2023 14:31:10 +0530 Wed, 08 Mar 2023 14:30:28 +0530 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.22.69
ExternalIP: 54.145.127.175
Hostname: ip-192-168-22-69.ec2.internal
InternalDNS: ip-192-168-22-69.ec2.internal
ExternalDNS: ec2-54-145-127-175.compute-1.amazonaws.com
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3955688Ki
pods: 17
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 1930m
ephemeral-storage: 18242267924
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3400680Ki
pods: 17
System Info:
Machine ID: ec25d974ebfe3816b53e289539ea44b6
System UUID: ec25d974-ebfe-3816-b53e-289539ea44b6
Boot ID: 8fbe5d0d-c66f-4289-b3f9-7eff8cb4cadb
Kernel Version: 5.10.165-143.735.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.6
Kubelet Version: v1.24.10-eks-48e63af
Kube-Proxy Version: v1.24.10-eks-48e63af
ProviderID: aws:///us-east-1a/i-028d4dbf3ef17a637
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age


amazon-cloudwatch fluent-bit-k7sfv 500m (25%) 0 (0%) 100Mi (3%) 200Mi (6%) 5m7s
kube-system aws-node-kmcvw 25m (1%) 0 (0%) 0 (0%) 0 (0%) 5m7s
kube-system ebs-csi-node-clhhl 30m (1%) 300m (15%) 120Mi (3%) 768Mi (23%) 4m24s
kube-system kube-proxy-64lxv 100m (5%) 0 (0%) 0 (0%) 0 (0%) 5m7s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits


cpu 655m (33%) 300m (15%)
memory 220Mi (6%) 968Mi (29%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message


Normal Starting 5m kube-proxy
Warning InvalidDiskCapacity 5m8s kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 5m8s (x2 over 5m8s) kubelet Node ip-192-168-22-69.ec2.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 5m8s (x2 over 5m8s) kubelet Node ip-192-168-22-69.ec2.internal status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 5m8s (x2 over 5m8s) kubelet Node ip-192-168-22-69.ec2.internal status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 5m8s kubelet Updated Node Allocatable limit across pods
Normal Starting 5m8s kubelet Starting kubelet.
Normal NodeReady 4m47s kubelet Node ip-192-168-22-69.ec2.internal status is now: NodeReady

So, it seems that the operator: Exists toleration doesn't work for all types of taints, or for any taint, on the node. Please confirm this.

@ConnorJC3

ConnorJC3 commented Mar 8, 2023

Hi @singhnix - the EBS CSI controller pods intentionally do not tolerate all taints by default (only the node pods do so by default as of right now). If you wish to set custom tolerations for the controller you can do so via the advanced configuration feature as mentioned here and here.
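For the taint shown in your node describe output (dedicated=gpuGroup:NoSchedule), a sketch of adding a matching controller toleration through the advanced configuration (the cluster name is a placeholder):

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name aws-ebs-csi-driver \
  --addon-version v1.14.0-eksbuild.1 \
  --configuration-values '{"controller":{"tolerations":[{"key":"dedicated","operator":"Equal","value":"gpuGroup","effect":"NoSchedule"}]}}'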
