STOR-2040: oc adm top persistentvolumeclaim #1704

tsmetana · 2024-10-23T11:31:20Z

This is an enhancement for the new oc CLI option to display the PVC percentual usage.

The feature is being implemented in openshift/oc#1854

openshift-ci-robot · 2024-10-23T11:31:24Z

@tsmetana: This pull request references STOR-2040 which is a valid jira issue.

In response to this:

This is an enhancement for the new oc CLI option to display the PVC percentual usage.

The feature is being implemented in openshift/oc#1854

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

tsmetana · 2024-10-23T11:32:05Z

@gmeghnag, @dobsonj, @ardaguclu PTAL

ardaguclu

This command sounds very useful to me and agree that it is a natural extension. However, getting data directly from metrics server by discarding API server has natural implications. So I dropped some questions.

ardaguclu · 2024-10-24T06:51:18Z

enhancements/oc/top-pvc-usage.md

+
+### Implementation Details/Notes/Constraints
+
+There are the `kubelet_volume_stats_used_bytes` and


inspect-alerts command followed the same way by fetching the stats from metrics endpoint (in my opinion discarding the API server is not a good idea but I understand the motivation). However, inspect-alerts appeals to less people comparing to this command. Users (without checking metrics is enabled or not) will trigger this command and if metrics server is not enabled, what is the expected behavior?. I think we need to catch this earlier and should print an explanatory warning about how this command can be used.

In a case where Prometheus pods are both down (which means that the ClusterOperator/monitoring is Degraded) the command would fail with the following error message:

$ oc adm top persistentvolumeclaims error: failed to get persistentvolumeclaims from Prometheus: unable to get /api/v1/query from URI in the openshift-monitoring/prometheus-k8s Route: prometheus-k8s-openshift-monitoring.apps.sharedocp415.lab.local->GET status code=503

In the same case the oc adm top pod and oc adm top node would fail as well:

$ oc adm top pod error: Metrics not available for pod openshift-monitoring/alertmanager-main-0, age: 81h52m43.855409s $ oc adm top nodes error: metrics not available yet

I think, we should raise a better error message like metrics not available yet for this case.

FWIW;
If I was the driver of this KEP, I'd first get the feedback from upstream about adding this first in there by extending https://github.com/kubernetes/kubernetes/blob/18f3941c24b9ed40eecd384f78778d2ff185c31d/staging/src/k8s.io/metrics/pkg/apis/metrics/v1beta1/types.go#L50 . In my opinion, this is a useful feature and I believe upstream would consider this. Also if this is accepted by upstream, we'll have a natural client instead of sending a direct request to metrics server.

"I think, we should raise a better error message like metrics not available yet for this case."
Ok, I can work on this.

ardaguclu · 2024-10-24T06:52:27Z

enhancements/oc/top-pvc-usage.md

+### Implementation Details/Notes/Constraints
+
+There are the `kubelet_volume_stats_used_bytes` and
+`kubelet_volume_stats_capacity_bytes` Prometheus metrics which can be used to


I assume that kubelet_volume_stats_capacity_bytes is a standard even in vanilla Kubernetes and we are certain that it will be there if Prometheus is enabled?.

Yes, kubelet_volume_stats_capacity_bytes is a metric that Kubernetes exports by default, even in a vanilla cluster.

ardaguclu · 2024-10-24T06:56:29Z

enhancements/oc/top-pvc-usage.md

+
+#### Story 1
+
+As an OCP project user, I want to see a list of all PVC's in my namespace and


The metrics fetching request will be sent to Prometheus with a bearerToken. What is the expected permission set for this bearerToken that can successfully fetch the required metrics in here?. Can a standard user use this command smoothly? or limited only to cluster admins?. If it is limited to cluster admins, wouldn't be less useful?, since we are providing a command to the users but only few of them can use it actually?.

"What is the expected permission set for this bearerToken that can successfully fetch the required metrics in here?"
At least the ClusterRole/cluster-monitoring-view and the get/list verbs on Route objects in openshift-monitoring namespace are needed to execute the query.

"Can a standard user use this command smoothly?"
A user without the above permissions can't use this command.

"or limited only to cluster admins?"
Being created under oc adm where most of the commands used for administrative tasks on the cluster reside, this command is firstly intended for cluster admins.

"If it is limited to cluster admins, wouldn't be less useful?, since we are providing a command to the users but only few of them can use it actually?"
I don't think so:

Being the cluster admins the people in charge of expanding volumes this command benefits them.

If needed, in a few steps, the following, anyone can execute such a command:
$ oc create clusterrole routes-view --verb=get,list --resource=routes -n openshift-monitoring $ oc adm policy add-cluster-role-to-user routes-view <USER> $ oc adm policy add-cluster-role-to-user cluster-monitoring-view <USER>

ardaguclu · 2024-10-24T07:00:32Z

enhancements/oc/top-pvc-usage.md

+
+1. Provide a simple CLI option to display filesystem usage of
+   PersistentVolumeClaims.
+2. Display only the percentual usage for a given PersistentVolumeClaim or all


If user invokes this command with --all-namespaces, can we get all the required data in 1 prometheus query? or are we going to send multiple queries per each namespace?. If user has 100 namespaces, sending 100 queries to metrics server would be impractical.

"If user invokes this command with --all-namespaces, can we get all the required data in 1 prometheus query?"
Yes, and this is how it works when you execute the command with --all-namespaces, a single query will be executed:

100*kubelet_volume_stats_used_bytes{persistentvolumeclaim=~".*"}/kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~".*"}

"or are we going to send multiple queries per each namespace?"
No, this way would overload the metric server, and as you said would be impractical.

ardaguclu · 2024-10-24T07:04:46Z

enhancements/oc/top-pvc-usage.md

+value. This could be implemented by a single call to Prometheus API using PromQL
+and would not need additional API calls.
+
+The additional columns (e.g. volume capacity, absolute value of the free space)


I saw in the PR in oc about the queries by getting Route, Ingress objects to construct metrics server endpoint. Are we certain that these objects are always created if metrics server is enabled?. Is there any possibility that metrics server is enabled but route and ingress are not defined by cluster admin. So user can easily think that metrics server is enabled but command still does not work?. If we are not certain, users can only use iff metrics server is enabled, route and ingress are created and required role permissions are correctly given.

Additionally are we certain that this implementation method would also work on disconnected?.

"Are we certain that these objects are always created if metrics server is enabled?"
Yes, the Route/prometheus-k8s is created by default during cluster installation.

"Is there any possibility that metrics server is enabled but route and ingress are not defined by cluster admin."
If the admin for any reason deletes the Route/prometheus-k8s object the command will fail with the following error, which is self-explanatory:

$ oc adm top persistentvolumeclaims -A error: failed to get persistentvolumeclaims from Prometheus: routes.route.openshift.io "prometheus-k8s" not found

"are we certain that this implementation method would also work on disconnected?"
Yes, I've tested it.

ardaguclu · 2024-10-25T12:40:21Z

Thank you for answering my questions. I still strongly recommend to ask feedback in upstream adding this feature in there first. But this is not a blocker for me.
LGTM
but markdownlint failure seems to be real.

tsmetana · 2024-10-29T10:21:45Z

Thanks for the review. I'll fix the linter issues.

tsmetana · 2024-10-29T14:41:31Z

The linter looks happy now. I have also tried to incorporate the comments from the review itself into the document.

dobsonj · 2024-10-29T16:35:35Z

/lgtm
Thanks for putting this together and adding some details from the questions in this PR.

tsmetana · 2024-10-30T09:24:10Z

/assign trozet

dobsonj · 2024-11-18T16:25:41Z

/lgtm

ardaguclu · 2024-11-18T18:13:59Z

/lgtm

ardaguclu · 2024-11-19T06:05:22Z

I don't think I have privileges to approve this PR but since I'm in the approver list;
/approve

jsafrane · 2024-11-19T09:39:28Z

/approve

openshift-ci · 2024-11-19T09:40:24Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ardaguclu, gmeghnag, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jsafrane]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2024-11-19T10:15:14Z

@tsmetana: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 23, 2024

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2024

openshift-ci bot requested review from cybertron and tjungblu October 23, 2024 11:31

ardaguclu reviewed Oct 24, 2024

View reviewed changes

tsmetana force-pushed the oc-adm-top-pvc branch 2 times, most recently from 24ee6e1 to 176bef1 Compare October 29, 2024 13:57

tsmetana changed the title ~~[WIP] STOR-2040: oc adm top persistentvolumeclaim~~ STOR-2040: oc adm top persistentvolumeclaim Oct 29, 2024

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 29, 2024

gmeghnag approved these changes Oct 29, 2024

View reviewed changes

openshift-ci bot assigned dobsonj Oct 29, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2024

openshift-ci bot assigned trozet Oct 30, 2024

gmeghnag mentioned this pull request Nov 9, 2024

STOR-2040: CLI command to display bound pvc filesystem usage percentage openshift/oc#1854

Open

Add oc adm top pvc enhancement

4b405d3

tsmetana force-pushed the oc-adm-top-pvc branch from 176bef1 to 4b405d3 Compare November 18, 2024 14:02

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 18, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 18, 2024

openshift-ci bot assigned ardaguclu Nov 18, 2024

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 19, 2024

openshift-merge-bot bot merged commit 8d48fb5 into openshift:master Nov 19, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

STOR-2040: oc adm top persistentvolumeclaim #1704

STOR-2040: oc adm top persistentvolumeclaim #1704

tsmetana commented Oct 23, 2024

openshift-ci-robot commented Oct 23, 2024 •

edited by openshift-ci bot

Loading

tsmetana commented Oct 23, 2024

ardaguclu left a comment

ardaguclu Oct 24, 2024

gmeghnag Oct 24, 2024

ardaguclu Oct 24, 2024

gmeghnag Oct 24, 2024 •

edited

Loading

ardaguclu Oct 24, 2024

gmeghnag Oct 24, 2024

ardaguclu Oct 24, 2024

gmeghnag Oct 24, 2024

ardaguclu Oct 24, 2024

gmeghnag Oct 24, 2024

ardaguclu Oct 24, 2024

ardaguclu Oct 24, 2024

gmeghnag Oct 24, 2024 •

edited

Loading

ardaguclu commented Oct 25, 2024

tsmetana commented Oct 29, 2024

tsmetana commented Oct 29, 2024

dobsonj commented Oct 29, 2024

tsmetana commented Oct 30, 2024

dobsonj commented Nov 18, 2024

ardaguclu commented Nov 18, 2024

ardaguclu commented Nov 19, 2024

jsafrane commented Nov 19, 2024

openshift-ci bot commented Nov 19, 2024

openshift-ci bot commented Nov 19, 2024


		### Implementation Details/Notes/Constraints

		There are the `kubelet_volume_stats_used_bytes` and


		#### Story 1

		As an OCP project user, I want to see a list of all PVC's in my namespace and

STOR-2040: oc adm top persistentvolumeclaim #1704

STOR-2040: oc adm top persistentvolumeclaim #1704

Conversation

tsmetana commented Oct 23, 2024

openshift-ci-robot commented Oct 23, 2024 • edited by openshift-ci bot Loading

tsmetana commented Oct 23, 2024

ardaguclu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gmeghnag Oct 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gmeghnag Oct 24, 2024 • edited Loading

Choose a reason for hiding this comment

ardaguclu commented Oct 25, 2024

tsmetana commented Oct 29, 2024

tsmetana commented Oct 29, 2024

dobsonj commented Oct 29, 2024

tsmetana commented Oct 30, 2024

dobsonj commented Nov 18, 2024

ardaguclu commented Nov 18, 2024

ardaguclu commented Nov 19, 2024

jsafrane commented Nov 19, 2024

openshift-ci bot commented Nov 19, 2024

openshift-ci bot commented Nov 19, 2024

openshift-ci-robot commented Oct 23, 2024 •

edited by openshift-ci bot

Loading

gmeghnag Oct 24, 2024 •

edited

Loading

gmeghnag Oct 24, 2024 •

edited

Loading