Allow setting a volume attach limit per node #710

Closed
ffilippopoulos opened this issue Mar 1, 2022 · 4 comments

Comments

@ffilippopoulos

It is a common concept for CSI drivers to advertise the maximum allowed number of volumes per node, so that kube-scheduler can honor that limit and cap the number of PVCs attached per node/host.
This is briefly documented by Kubernetes here: https://kubernetes.io/docs/concepts/storage/storage-limits/.
The CSI spec caters for this via the max_volumes_per_node field in the NodeGetInfo response: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetinfo.
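
To make the mechanism concrete, here is a minimal sketch, not Trident's actual code, of how a CSI node plugin written against the Go bindings from the spec repository could advertise such a limit; the nodeServer type and its fields are illustrative assumptions.

```go
package main

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// nodeServer is a hypothetical node-plugin type; only the pieces needed to
// illustrate NodeGetInfo are shown.
type nodeServer struct {
	nodeID            string
	volumeAttachLimit int64 // 0 means "advertise no limit"
}

// NodeGetInfo reports the node ID and, when configured, the maximum number of
// volumes the scheduler may attach to this node. The value surfaces in the
// CSINode object's .spec.drivers[].allocatable.count.
func (ns *nodeServer) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
	return &csi.NodeGetInfoResponse{
		NodeId:            ns.nodeID,
		MaxVolumesPerNode: ns.volumeAttachLimit,
	}, nil
}

func main() {}
```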

Trident should provide a flag that allows cluster admins to set this limit according to their environment, and ideally propose a sensible default value for the flag.

A related approach has been followed in other CSI drivers such as aws-ebs-csi-driver: kubernetes-sigs/aws-ebs-csi-driver#522

This is currently affecting us: after profiling our nodes and workloads during events such as a link loss, we are trying to limit PVCs to ~20 per node in a busy cluster, and a driver-level limit could really help with cluster stability in some cases.

@ffilippopoulos
Author

Hi, I patched and tried the following change in one of our lower environments: 2d9888f, and it seems to do what we are requesting. Setting the flag introduced by our patch, e.g. -volume_attach_limit=10, will modify the driver's allocatable count, while not setting it will allow unlimited volumes per node. For example, kubectl describe csinodes.storage.k8s.io gives:

Spec:
  Drivers:
    csi.trident.netapp.io:
      Node ID:  worker-0.exp-1.merit.uw.systems
      Allocatables:
        Count:        10
      Topology Keys:  [topology.kubernetes.io/zone]

and trying to schedule more Trident volumes onto a node will result in something like:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  32s   default-scheduler  0/8 nodes are available: 1 node(s) exceed max volume count, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 5 node(s) didn't match Pod's node affinity/selector.

Let me know if you find this useful and want me to raise a PR against trident.
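
For context, an illustrative sketch (not the actual 2d9888f patch) of how such a -volume_attach_limit flag could be parsed and handed to the node plugin, where it would feed max_volumes_per_node in NodeGetInfo; the nodeConfig type and its fields are hypothetical, and only the node ID from the output above is reused.

```go
package main

import (
	"flag"
	"log"
)

// nodeConfig is a hypothetical container for the node-plugin settings that
// would back the MaxVolumesPerNode value returned by NodeGetInfo.
type nodeConfig struct {
	NodeID            string
	VolumeAttachLimit int64 // 0 = advertise no limit (unlimited volumes per node)
}

var volumeAttachLimit = flag.Int64("volume_attach_limit", 0,
	"maximum number of volumes attachable to this node; 0 means no limit is advertised")

func main() {
	flag.Parse()
	if *volumeAttachLimit < 0 {
		log.Fatalf("-volume_attach_limit must be >= 0, got %d", *volumeAttachLimit)
	}
	cfg := nodeConfig{
		NodeID:            "worker-0.exp-1.merit.uw.systems", // node from the CSINode output above
		VolumeAttachLimit: *volumeAttachLimit,
	}
	log.Printf("advertising max_volumes_per_node=%d for node %s", cfg.VolumeAttachLimit, cfg.NodeID)
}
```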

@ffilippopoulos
Author

Closing. NetApp support suggested that we use Kubernetes taints and tolerations to spread load in the cluster, and they do not intend to support a limit in the driver.

@bswartz
Contributor

bswartz commented Mar 18, 2022

The spread constraints feature is probably the best way to prevent too much I/O load from ending up on any one node. Alternatively, other forms of pod anti-affinity can be used.

The reason I'm not in favor of using volume-attach limits is that they're designed for hard limits, where the difference between N volumes and N+1 volumes is make or break, regardless of I/O load. It would be nice if, some day, Kubernetes supported IOPS quotas and I/O load could be managed like other resources, but for now we can use spread constraints to get a mostly good outcome.
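
To illustrate the suggested alternative, here is a rough sketch of a topology spread constraint that caps how unevenly replicas of a volume-heavy workload can land across nodes, written with the k8s.io/api Go types; the app: volume-heavy label and the container image are placeholders, not anything from Trident.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// volumeHeavyPodSpec spreads replicas of a volume-heavy workload across nodes
// so that no single node accumulates most of the volume attachments.
func volumeHeavyPodSpec() corev1.PodSpec {
	return corev1.PodSpec{
		TopologySpreadConstraints: []corev1.TopologySpreadConstraint{{
			MaxSkew:           1,                        // replica counts may differ by at most 1 between nodes
			TopologyKey:       "kubernetes.io/hostname", // spread across individual nodes
			WhenUnsatisfiable: corev1.DoNotSchedule,     // hard constraint, comparable in spirit to a volume cap
			LabelSelector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "volume-heavy"}, // placeholder selector
			},
		}},
		Containers: []corev1.Container{{
			Name:  "app",
			Image: "example.com/volume-heavy:latest", // placeholder image
		}},
	}
}

func main() {
	fmt.Printf("%+v\n", volumeHeavyPodSpec().TopologySpreadConstraints[0])
}
```

The same constraint can equally be written as topologySpreadConstraints directly in a pod manifest.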

@ffilippopoulos
Author

Thanks for the reply @bswartz. One of the cases we are trying to protect against is an attachment and path-reinstatement storm after a link loss is resolved, so we think that a hard upper volume limit per node could be valuable for our setup. It would give us an extra layer of protection and allow us to perform operations with less downtime, or even none at all. Regardless, we are exploring the other existing Kubernetes mechanisms you mentioned above to achieve the same.
