Allow setting a volume attach limit per node #710
Hi, I patched and tried the following change in one of our lower environments: 2d9888f, and it seems to do what we are requesting. Setting the specified flag (based on our patch) and then trying to schedule more Trident volumes to a node results in something like:
Let me know if you find this useful and want me to raise a PR against Trident.
Closing. NetApp support suggested that we use Kubernetes taints and tolerations to spread load in the cluster, and they do not intend to support a limit in the driver.
The spread constraints feature is probably the best way to prevent too much I/O load from ending up on any one node. Alternatively, other forms of pod anti-affinity can be used. The reason I'm not in favor of using volume-attach limits is that they are designed for hard limits, where the difference between N volumes and N+1 volumes is make or break, regardless of I/O load. It would be nice if, some day, Kubernetes supported IOPS quotas so that I/O load could be managed like other resources, but for now we can use the spread constraints to get a mostly good outcome.
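For reference, here is a minimal sketch of the spread-constraints approach, expressed with the Kubernetes Go API types; the `app: heavy-io` label, the topology key, and the `maxSkew` value are placeholders, not values from this issue:

```go
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Spread volume-heavy pods evenly across nodes instead of relying on a
	// hard per-node volume attach limit. The label and skew are placeholders.
	spec := corev1.PodSpec{
		TopologySpreadConstraints: []corev1.TopologySpreadConstraint{
			{
				MaxSkew:           1,
				TopologyKey:       "kubernetes.io/hostname",
				WhenUnsatisfiable: corev1.DoNotSchedule,
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"app": "heavy-io"},
				},
			},
		},
	}

	// Print the constraints as JSON; the equivalent YAML lives under the pod
	// template's spec.topologySpreadConstraints field.
	out, _ := json.MarshalIndent(spec.TopologySpreadConstraints, "", "  ")
	fmt.Println(string(out))
}
```

With `whenUnsatisfiable: DoNotSchedule` the scheduler treats the skew limit as a hard constraint, similar in spirit to an attach limit but driven by pod labels rather than volume counts.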
Thanks for the reply @bswartz. One of the cases we are trying to protect against is an attachment and path-reinstating storm after a link loss is resolved, so we think that a hard upper volume limit per node could be valuable for our setup. It would give us an extra layer of protection and allow us to carry out operations with less downtime, or even none at all. Regardless, we are exploring the other existing Kubernetes mechanisms you mentioned above to achieve the same.
It is a common concept for CSI drivers to advertise the maximum allowed number of volumes per node, so that kube-scheduler can honor that limit and cap PVCs per node/host.
This is briefly documented by Kubernetes here: https://kubernetes.io/docs/concepts/storage/storage-limits/.
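For anyone who wants to check what limit (if any) a driver currently advertises, here is a small sketch using client-go to read the per-driver allocatable volume count from the CSINode objects that kube-scheduler consults; in-cluster config is assumed and error handling is kept minimal:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes the program runs inside the cluster; use clientcmd for a kubeconfig otherwise.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// CSINode objects carry the max_volumes_per_node value reported by each
	// CSI driver via NodeGetInfo, under spec.drivers[].allocatable.count.
	csiNodes, err := clientset.StorageV1().CSINodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range csiNodes.Items {
		for _, d := range n.Spec.Drivers {
			if d.Allocatable != nil && d.Allocatable.Count != nil {
				fmt.Printf("node %s, driver %s: max volumes %d\n", n.Name, d.Name, *d.Allocatable.Count)
			} else {
				fmt.Printf("node %s, driver %s: no volume limit advertised\n", n.Name, d.Name)
			}
		}
	}
}
```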
The CSI spec caters for that via a `max_volumes_per_node` attribute in `NodeGetInfo`: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetinfo. Trident should provide a flag that allows cluster admins to set limits according to their environments, and ideally propose a default value for the flag.
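To make the request concrete, here is a minimal sketch of what honoring such a flag could look like in a CSI node plugin's `NodeGetInfo` handler; the flag name, struct, and wiring below are hypothetical illustrations, not Trident's actual code:

```go
package main

import (
	"context"
	"flag"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// nodeServer is a hypothetical node plugin; only the fields needed for this
// sketch are shown.
type nodeServer struct {
	nodeID            string
	maxVolumesPerNode int64
}

// NodeGetInfo reports the node ID and, when a limit is configured, the
// max_volumes_per_node value that kube-scheduler uses to cap attachments.
func (ns *nodeServer) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
	resp := &csi.NodeGetInfoResponse{NodeId: ns.nodeID}
	if ns.maxVolumesPerNode > 0 {
		resp.MaxVolumesPerNode = ns.maxVolumesPerNode
	}
	return resp, nil
}

func main() {
	// Hypothetical flag name; a real driver would pick its own and document a default.
	limit := flag.Int64("volume-attach-limit", 0, "maximum number of volumes attachable to this node (0 = unlimited)")
	flag.Parse()

	_ = &nodeServer{nodeID: "node-1", maxVolumesPerNode: *limit}
	// ... register the node server with the driver's gRPC server here ...
}
```

kube-scheduler then picks this value up through the CSINode object (as in the earlier snippet) and stops placing pods with additional volumes on nodes that have reached the limit.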
A related approach has been followed in other CSI drivers like `aws-ebs-csi-driver`: kubernetes-sigs/aws-ebs-csi-driver#522. This is currently affecting us, as we are trying to limit PVCs to ~20 per node in a busy cluster (after profiling our nodes and workloads during events such as a link loss), and such a flag could really help with cluster stability in some cases.