STOR-2040: oc adm top persistentvolumeclaim #1704
@tsmetana: This pull request references STOR-2040, which is a valid Jira issue.
@gmeghnag, @dobsonj, @ardaguclu PTAL
This command sounds very useful to me, and I agree that it is a natural extension. However, fetching data directly from the metrics server and bypassing the API server has some implications, so I left a few questions.
enhancements/oc/top-pvc-usage.md
### Implementation Details/Notes/Constraints
There are the `kubelet_volume_stats_used_bytes` and
The `inspect-alerts` command took the same approach, fetching the stats from the metrics endpoint (in my opinion bypassing the API server is not a good idea, but I understand the motivation). However, `inspect-alerts` appeals to fewer people than this command will. Users will run this command without checking whether metrics are enabled, so if the metrics server is not enabled, what is the expected behavior? I think we need to catch this early and print an explanatory warning about how this command can be used.
If both Prometheus pods are down (which means that `ClusterOperator/monitoring` is `Degraded`), the command fails with the following error message:
$ oc adm top persistentvolumeclaims
error: failed to get persistentvolumeclaims from Prometheus: unable to get /api/v1/query from URI in the openshift-monitoring/prometheus-k8s Route: prometheus-k8s-openshift-monitoring.apps.sharedocp415.lab.local->GET status code=503
In the same case, `oc adm top pod` and `oc adm top node` fail as well:
$ oc adm top pod
error: Metrics not available for pod openshift-monitoring/alertmanager-main-0, age: 81h52m43.855409s
$ oc adm top nodes
error: metrics not available yet
I think we should raise a better error message, such as `metrics not available yet`, for this case.
FWIW:
If I were the driver of this KEP, I'd first ask upstream for feedback on adding this there, by extending https://github.com/kubernetes/kubernetes/blob/18f3941c24b9ed40eecd384f78778d2ff185c31d/staging/src/k8s.io/metrics/pkg/apis/metrics/v1beta1/types.go#L50 . In my opinion this is a useful feature, and I believe upstream would consider it. Also, if upstream accepts it, we'll have a native client instead of sending direct requests to the metrics server.
- "I think, we should raise a better error message like `metrics not available yet` for this case."
OK, I can work on this.
enhancements/oc/top-pvc-usage.md
### Implementation Details/Notes/Constraints
There are the `kubelet_volume_stats_used_bytes` and
`kubelet_volume_stats_capacity_bytes` Prometheus metrics which can be used to
I assume that `kubelet_volume_stats_capacity_bytes` is standard even in vanilla Kubernetes, and that we can be certain it will be there if Prometheus is enabled?
Yes, `kubelet_volume_stats_capacity_bytes` is a metric that Kubernetes exports by default, even in a vanilla cluster.
enhancements/oc/top-pvc-usage.md
#### Story 1
As an OCP project user, I want to see a list of all PVC's in my namespace and
The metrics request will be sent to Prometheus with a `bearerToken`. What permission set does this `bearerToken` need to successfully fetch the required metrics? Can a standard user use this command smoothly, or is it limited to cluster admins? If it is limited to cluster admins, wouldn't it be less useful, since we are providing a command that only a few users can actually use?
- "What is the expected permission set for this bearerToken that can successfully fetch the required metrics in here?"
  At least the `ClusterRole/cluster-monitoring-view` and the `get`/`list` verbs on `Route` objects in the `openshift-monitoring` namespace are needed to execute the query.
- "Can a standard user use this command smoothly?"
  A user without the above permissions can't use this command.
- "or limited only to cluster admins?"
  Since it lives under `oc adm`, where most of the commands for administrative tasks on the cluster reside, this command is primarily intended for cluster admins.
- "If it is limited to cluster admins, wouldn't be less useful?, since we are providing a command to the users but only few of them can use it actually?"
  I don't think so:
  - Cluster admins are the people in charge of expanding volumes, so this command benefits them.
  - If needed, any user can be allowed to run the command in a few steps:
    $ oc create clusterrole routes-view --verb=get,list --resource=routes -n openshift-monitoring
    $ oc adm policy add-cluster-role-to-user routes-view <USER>
    $ oc adm policy add-cluster-role-to-user cluster-monitoring-view <USER>
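To make the token's role concrete, here is a minimal Go sketch of the request such a token would authorize. It assumes the route host has already been read from `Route/prometheus-k8s` in `openshift-monitoring` and targets the standard Prometheus HTTP API instant-query endpoint `/api/v1/query` (the path seen in the 503 error earlier). `newPrometheusQueryRequest` and the host/token values are hypothetical; the actual implementation in openshift/oc#1854 may build the request differently (for example, TLS configuration is omitted here).

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newPrometheusQueryRequest (hypothetical) builds a GET request against the
// Prometheus instant-query endpoint exposed through an OpenShift route.
func newPrometheusQueryRequest(routeHost, bearerToken, promQL string) (*http.Request, error) {
	u := url.URL{
		Scheme: "https",
		Host:   routeHost,
		Path:   "/api/v1/query",
		// url.Values.Encode percent-escapes the PromQL expression.
		RawQuery: url.Values{"query": {promQL}}.Encode(),
	}
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// The user behind this token needs cluster-monitoring-view for Prometheus
	// to accept the query; reading the route host needs get on routes.
	req.Header.Set("Authorization", "Bearer "+bearerToken)
	return req, nil
}

func main() {
	req, err := newPrometheusQueryRequest(
		"prometheus-k8s-openshift-monitoring.apps.example.com", // hypothetical host
		"example-token", // hypothetical token
		"kubelet_volume_stats_used_bytes",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```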
enhancements/oc/top-pvc-usage.md
1. Provide a simple CLI option to display filesystem usage of
PersistentVolumeClaims.
2. Display only the percentual usage for a given PersistentVolumeClaim or all
If a user invokes this command with `--all-namespaces`, can we get all the required data in one Prometheus query, or are we going to send a separate query per namespace? If the user has 100 namespaces, sending 100 queries to the metrics server would be impractical.
- "If user invokes this command with --all-namespaces, can we get all the required data in 1 prometheus query?"
  Yes, and this is how it works: when you run the command with `--all-namespaces`, a single query is executed:
  `100*kubelet_volume_stats_used_bytes{persistentvolumeclaim=~".*"}/kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~".*"}`
- "or are we going to send multiple queries per each namespace?"
  No, that approach would overload the metrics server and, as you said, would be impractical.
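A Go sketch of how that single expression could be assembled, offered as an assumption rather than the code from openshift/oc#1854. `pvcUsageQuery` is a hypothetical helper, and the `namespace` matcher for the single-namespace case assumes the usual `namespace` label on the kubelet volume-stats series. With `allNamespaces` set, no namespace matcher is added, so one request covers the whole cluster.

```go
package main

import (
	"fmt"
	"strings"
)

// pvcUsageQuery (hypothetical) returns a single PromQL expression computing
// used/capacity*100 for every matching PVC. Both sides share the same label
// matchers, so PromQL's default one-to-one vector matching pairs each PVC's
// used-bytes series with its capacity series.
func pvcUsageQuery(namespace string, allNamespaces bool) string {
	matchers := []string{`persistentvolumeclaim=~".*"`}
	if !allNamespaces && namespace != "" {
		// Assumed label name; restricts the result to one namespace.
		matchers = append(matchers, fmt.Sprintf("namespace=%q", namespace))
	}
	sel := "{" + strings.Join(matchers, ",") + "}"
	return "100*kubelet_volume_stats_used_bytes" + sel +
		"/kubelet_volume_stats_capacity_bytes" + sel
}

func main() {
	// Prints the same expression quoted in the reply for --all-namespaces:
	// 100*kubelet_volume_stats_used_bytes{persistentvolumeclaim=~".*"}/kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~".*"}
	fmt.Println(pvcUsageQuery("", true))
}
```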
enhancements/oc/top-pvc-usage.md
value. This could be implemented by a single call to Prometheus API using PromQL
and would not need additional API calls.
The additional columns (e.g. volume capacity, absolute value of the free space)
I saw in the oc PR that the queries look up Route and Ingress objects to construct the metrics server endpoint. Are we certain these objects are always created when the metrics server is enabled? Is there any possibility that the metrics server is enabled but the route and ingress are not defined by the cluster admin, so that a user could easily think the metrics server is enabled while the command still does not work? If we are not certain, users can use this command only if the metrics server is enabled, the route and ingress exist, and the required role permissions are correctly granted.
Additionally, are we certain that this implementation would also work on disconnected clusters?
- "Are we certain that these objects are always created if metrics server is enabled?"
  Yes, `Route/prometheus-k8s` is created by default during cluster installation.
- "Is there any possibility that metrics server is enabled but route and ingress are not defined by cluster admin."
  If the admin for any reason deletes the `Route/prometheus-k8s` object, the command fails with the following self-explanatory error:
  $ oc adm top persistentvolumeclaims -A
  error: failed to get persistentvolumeclaims from Prometheus: routes.route.openshift.io "prometheus-k8s" not found
- "are we certain that this implementation method would also work on disconnected?"
  Yes, I've tested it.
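Once the single Prometheus call succeeds, its response still has to be turned into per-PVC rows. A minimal Go sketch, assuming the standard Prometheus HTTP API JSON format for an instant query (`resultType` `vector`, each sample a `[timestamp, "value"]` pair with the value as a string) and the `namespace`/`persistentvolumeclaim` labels carried by the kubelet volume-stats series. `parsePVCUsage` and `pvcUsage` are hypothetical names, and the sample payload is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
)

// promResponse models only the fields of an instant-query response used here.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp, "sample value"]
		} `json:"result"`
	} `json:"data"`
}

// pvcUsage is one output row (hypothetical type).
type pvcUsage struct {
	Namespace string
	Name      string
	Percent   float64
}

// parsePVCUsage decodes the response and returns rows sorted by namespace, then name.
func parsePVCUsage(body []byte) ([]pvcUsage, error) {
	var resp promResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" || resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected response: status=%q resultType=%q",
			resp.Status, resp.Data.ResultType)
	}
	rows := make([]pvcUsage, 0, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		// Prometheus encodes sample values as strings to preserve precision.
		s, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample value %v", r.Value[1])
		}
		pct, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		rows = append(rows, pvcUsage{
			Namespace: r.Metric["namespace"],
			Name:      r.Metric["persistentvolumeclaim"],
			Percent:   pct,
		})
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Namespace != rows[j].Namespace {
			return rows[i].Namespace < rows[j].Namespace
		}
		return rows[i].Name < rows[j].Name
	})
	return rows, nil
}

func main() {
	sample := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"namespace":"demo","persistentvolumeclaim":"data"},"value":[1700000000.5,"42.5"]}]}}`)
	rows, err := parsePVCUsage(sample)
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Printf("%s\t%s\t%.1f%%\n", r.Namespace, r.Name, r.Percent)
	}
}
```

Sorting client-side keeps the output stable, since Prometheus does not guarantee the order of series in a vector result.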
Thank you for answering my questions. I still strongly recommend asking upstream for feedback on adding this feature there first, but this is not a blocker for me.
Thanks for the review. I'll fix the linter issues.
24ee6e1 to 176bef1
The linter looks happy now. I have also tried to incorporate the comments from the review itself into the document.
/lgtm
/assign trozet
176bef1 to 4b405d3
/lgtm
/lgtm
I don't think I have privileges to approve this PR, but since I'm in the approver list:
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ardaguclu, gmeghnag, jsafrane
@tsmetana: all tests passed!
This is an enhancement proposal for a new `oc` CLI option that displays PVC usage as a percentage.
The feature is being implemented in openshift/oc#1854.