Managed Kubernetes: Reliable way to monitor capacity and free bytes of persistent volumes #166
Comments
The last time I checked (several months ago, maybe almost a year ago), the OpenStack Cinder CSI driver did not report metrics yet. Has that changed since? Cf:
A few other issues related to metrics and cinder-csi seem to have been fixed, however, so it is hard to get a clear status on this.
I fear that not much has changed since.
This is an output of cinder-csi-plugin in verbose mode:
The last line clearly shows that the volume metrics were provided by this pod. So in principle that should work for OVH managed Kubernetes clusters too.
I got confirmation from the team that this will be solved with the CSI update planned within a month! @BernhardGruen @nsteinmetz
Perfect - thank you for keeping us updated.
Hello to everyone following this issue! The CSI update that will make these metrics available is due to go to production within the next 10 days. Note that this will enable the feature for all OpenStack regions running Stein. This is the case for most regions, and will be the case for all regions over the summer.
We just upgraded the Cinder CSI.
Hot snapshots (the ability to snapshot a volume while it is in use): this is key, as it enables the use of all K8s-compliant backup and DRP tools such as Trilio and Kasten. It means that Kubernetes can call the Cinder snapshot feature while a block volume is in use. NOTE THAT THIS REQUIRES A STEIN REGION. In other regions, only cold snapshots are supported. See #77.
Documentation will be updated in the upcoming weeks to reflect those changes.
Regions (where the Managed Kubernetes product is present) on OpenStack Stein: GRA5, GRA9, SBG5
Regions still on OpenStack Newton (the upgrade will be finalized this summer; more information will be published at https://public-cloud.status-ovhcloud.com/ and sent by email to customers active in these regions): GRA7 (Stein upgrade planned on 2022-05-31)
@mhurtrel Just confirmed that the volume metrics are now being collected, thanks!
Currently there is no monitoring support for persistent volume claims (inside a managed Kubernetes cluster).
On most clusters this is done using an extension to the CSI driver that exports those metrics to the kubelet.
In Prometheus these metrics are then available as:
kubelet_volume_stats_available_bytes
kubelet_volume_stats_capacity_bytes
kubelet_volume_stats_used_bytes
kubelet_volume_stats_inodes
kubelet_volume_stats_inodes_free
kubelet_volume_stats_inodes_used
Without those metrics it is not possible to detect, and alert on, a persistent volume that is nearly full before it fills up, which could lead to severe outages of the hosted services.
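To illustrate what such an alert would compute once the metrics are exported, here is a minimal sketch in Python. It assumes the kubelet metrics are available as Prometheus text-format lines and that each sample carries the `namespace` and `persistentvolumeclaim` labels; the function names `pvc_usage` and `nearly_full` and the 90% threshold are made up for this example and are not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches samples such as:
# kubelet_volume_stats_used_bytes{namespace="db",persistentvolumeclaim="data-0"} 1.5e+09
SAMPLE_RE = re.compile(r'^(kubelet_volume_stats_\w+)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def pvc_usage(exposition_text):
    """Return {(namespace, pvc): used / capacity} parsed from metrics text."""
    stats = defaultdict(dict)
    for line in exposition_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines and unrelated metrics
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        key = (labels.get("namespace", ""), labels.get("persistentvolumeclaim", ""))
        stats[key][name] = float(value)

    usage = {}
    for key, values in stats.items():
        capacity = values.get("kubelet_volume_stats_capacity_bytes")
        used = values.get("kubelet_volume_stats_used_bytes")
        if capacity and used is not None:  # ignore zero or missing capacity
            usage[key] = used / capacity
    return usage


def nearly_full(exposition_text, threshold=0.9):
    """Return the sorted (namespace, pvc) pairs at or above the threshold."""
    ratios = pvc_usage(exposition_text)
    return sorted(key for key, ratio in ratios.items() if ratio >= threshold)
```

In practice the same ratio would more likely be written as a Prometheus alerting rule; the sketch only shows why both `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes` are needed per claim.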
Unfortunately I did not find a reliable workaround either. The one workaround that halfway works, but often needs manual intervention, is to monitor using the node-exporter (node_filesystem_free_bytes). Unfortunately, with this variant one has to restart every node-exporter each time a new StatefulSet is created or moves from one node to another. This is simply not feasible, and therefore it is currently not safe to host services with persistent volumes on OVH managed Kubernetes clusters.