Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STOR-2040: CLI command to display bound pvc filesystem usage percentage #1854

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

gmeghnag
Copy link

The code makes use of a Prometheus query to implement the oc adm top persistentvolumeclaims and show usage statistics for bound persistentvolumeclaims, as follows:

oc adm top pvc --insecure-skip-tls-verify=true -n reproducer-pvc
NAMESPACE      NAME               USAGE(%) 
reproducer-pvc pvc-reproducer-pvc 98.28    
reproducer-pvc pvc-test-pvc       14.56    

It supports the following flags:

    -A, --all-namespaces=false:
	If present, list the pvc usage across all namespaces. Namespace in current context is ignored even if
	specified with --namespace

    --insecure-skip-tls-verify=false:
	If true, the server's certificate will not be checked for validity. This will make your HTTPS connections
	insecure

@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 22, 2024

@gmeghnag: This pull request references STOR-2040 which is a valid jira issue.

In response to this:

The code makes use of a Prometheus query to implement the oc adm top persistentvolumeclaims and show usage statistics for bound persistentvolumeclaims, as follows:

oc adm top pvc --insecure-skip-tls-verify=true -n reproducer-pvc
NAMESPACE      NAME               USAGE(%) 
reproducer-pvc pvc-reproducer-pvc 98.28    
reproducer-pvc pvc-test-pvc       14.56    

It supports the following flags:

   -A, --all-namespaces=false:
  If present, list the pvc usage across all namespaces. Namespace in current context is ignored even if
  specified with --namespace

   --insecure-skip-tls-verify=false:
  If true, the server's certificate will not be checked for validity. This will make your HTTPS connections
  insecure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 22, 2024
@openshift-ci openshift-ci bot requested review from ardaguclu and soltysh August 22, 2024 21:45
Copy link
Contributor

openshift-ci bot commented Aug 22, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gmeghnag
Once this PR has been reviewed and has the lgtm label, please assign ardaguclu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ardaguclu
Copy link
Member

Thank you for spending time on this nice and useful feature. But that requires a KEP first to sync on the feature and implementation.

@gmeghnag
Copy link
Author

Hey @ardaguclu, what is a KEP? And how to create one? Thanks :)

@ardaguclu
Copy link
Member

Hey @ardaguclu, what is a KEP? And how to create one? Thanks :)

I think, you'd write an enhancement proposal under https://github.com/openshift/enhancements/tree/master/enhancements/oc and we'll discuss the design and align on it. Once it merges, we can return to this PR.

Copy link
Member

@dobsonj dobsonj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @gmeghnag :) This is nice work, I just have a few suggestions.

Comment on lines +203 to +207
urlParams := url.Values{}
urlParams.Set("query", query)
uri.RawQuery = urlParams.Encode()

persistentVolumeClaimsBytes, err := getWithBearer(ctx, getRoute, "openshift-monitoring", "prometheus-k8s", uri, bearerToken, insecureTLS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this error when using system:admin KUBECONFIG, but not for the other oc adm top subcommands:

$ ./oc adm top pvc
error: failed to get persistentvolumeclaims from Prometheus: no token is currently in use for this session

Is it possible to avoid this error by using the metrics API interface instead of constructing a raw request w/ bearer token?

func getMetricsFromMetricsAPI(metricsClient metricsclientset.Interface, namespace, resourceName string, allNamespaces bool, labelSelector labels.Selector, fieldSelector fields.Selector) (*metricsapi.PodMetricsList, error) {
var err error
ns := metav1.NamespaceAll
if !allNamespaces {
ns = namespace
}
versionedMetrics := &metricsv1beta1api.PodMetricsList{}
if resourceName != "" {
m, err := metricsClient.MetricsV1beta1().PodMetricses(ns).Get(context.TODO(), resourceName, metav1.GetOptions{})
if err != nil {
return nil, err
}
versionedMetrics.Items = []metricsv1beta1api.PodMetrics{*m}
} else {
versionedMetrics, err = metricsClient.MetricsV1beta1().PodMetricses(ns).List(context.TODO(), metav1.ListOptions{LabelSelector: labelSelector.String(), FieldSelector: fieldSelector.String()})
if err != nil {
return nil, err
}
}
metrics := &metricsapi.PodMetricsList{}
err = metricsv1beta1api.Convert_v1beta1_PodMetricsList_To_metrics_PodMetricsList(versionedMetrics, metrics, nil)
if err != nil {
return nil, err
}
return metrics, nil
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU the metricsclientset API allows consumers to access only resource metrics (CPU and memory) for pods and nodes.

So if we strictly want system:admin users authenticated using the KUBECONFIG file to use that command we should evaluate:

  • getting a token from a different place, maybe a secret containing it?! (I will try to check if any exists)
  • using a different workflow to obtain this information rather than Prometheus API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found out that the token from the secret localhost-recovery-client-token in openshift-kube-apiserver namespace is a valid one to use:

oc get secret -n openshift-kube-apiserver localhost-recovery-client-token -o json | jq '.data.token' -r | base64 -d

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make this modification before next week so we can try to merge this PR, as @tsmetana agreed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

o.BearerToken = o.RESTConfig.BearerToken
if len(o.RESTConfig.BearerToken) == 0 {
klog.V(4).Info("no token is currently in use for this session")
klog.V(5).Info("attempting to retrieve token from secret \"localhost-recovery-client-token\" in namespace \"openshift-kube-apiserver\"")
secret, err := o.ClientSet.CoreV1().Secrets("openshift-kube-apiserver").Get(context.TODO(), "localhost-recovery-client-token", metav1.GetOptions{})
if err != nil {
klog.V(4).Info(fmt.Errorf("error retrieving secret: %s", err.Error()))
return fmt.Errorf("no token is currently in use for this session")
}
localhostRecoveryToken, exist := secret.Data["token"]
if !exist {
klog.V(4).Info(fmt.Errorf("\"token\" key not found in secret \"localhost-recovery-client-token\" in namespace \"openshift-kube-apiserver\""))
return fmt.Errorf("no token is currently in use for this session")
}
o.BearerToken = string(localhostRecoveryToken)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

}

}
headers := []string{"NAMESPACE", "NAME", "USAGE(%)"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some additional fields? My suggestion would be NAMESPACE, NAME, VOLUME, CAPACITY, USED, USAGE(%).

VOLUME: name of the PV attached to the PVC so the user can easily see which volume it corresponds to.
CAPACITY: total capacity of the volume in multiple of bytes (same as oc get pv, for example 10Gi or 2Ti).
USED: total used space of the volume in multiple of bytes (same format as capacity bytes).

This would be useful to mention in the enhancement that @ardaguclu requested, in case anyone has other suggestions (or objections).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For VOLUME, CAPACITY, USED it would require code modification using a different approach, maybe we can do that in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this one. The default outuput format should not change once this gets GA'd. It would efficiently become an API (a script parsing the output should not get broken by the update), so adding new columns at least to the default output might not be as simple.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, so the option we have in the future to implement these other column fields without changing the default output is to use -o wide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plan sounds fine to me, thanks for considering it.

Comment on lines 158 to 172
// if more pvc are requested as args but one of them does not not exist
if len(args) != 0 && len(promOutput.Data.Result) != len(args) {
resultingPvc := make(map[string]bool)
for _, _promOutputDataResult := range promOutput.Data.Result {
pvcName, _ := _promOutputDataResult.Metric["persistentvolumeclaim"]
resultingPvc[pvcName] = true
}
for _, arg := range args {
_, pvcPresentInResult := resultingPvc[arg]
if !pvcPresentInResult {
return fmt.Errorf("persistentvolumeclaim %q not found in %s namespace.", arg, o.Namespace)
}
}

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will immediately return an error if one PVC is not found. Could it instead return the valid ones + errors for the ones that do not exist?

For example:

$ oc get pvc -n testns claim1 foo bar
NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
claim1   Bound    pvc-ecf81046-51c4-4fa2-ada3-7797faa670b4   1Gi        RWO            gp3-csi        <unset>                 5h14m
Error from server (NotFound): persistentvolumeclaims "foo" not found
Error from server (NotFound): persistentvolumeclaims "bar" not found

You could add resultingPvc[pvcName] = true to the promOutput.Data.Result loop below so you only have to loop through it once, then check for missing PVC's to print error messages at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I still think this would be a useful change to make, but I don't consider it a blocking issue for this PR.

@dobsonj
Copy link
Member

dobsonj commented Oct 29, 2024

/retest

@ropatil010
Copy link

/test e2e-agnostic-ovn-cmd

@gmeghnag
Copy link
Author

/retest

@gmeghnag
Copy link
Author

DO NOT MERGE, I have to fix first https://issues.redhat.com/browse/OCPBUGS-44011

@gmeghnag
Copy link
Author

gmeghnag commented Nov 6, 2024

https://issues.redhat.com/browse/OCPBUGS-44011 fixed.

@gmeghnag
Copy link
Author

gmeghnag commented Nov 7, 2024

/retest

persistentVolumeClaimsBytes, err := getWithBearer(ctx, getRoute, "openshift-monitoring", "prometheus-k8s", uri, bearerToken, insecureTLS)
if err != nil {
klog.V(4).Info(fmt.Errorf("failed to get persistentvolumeclaims from Prometheus: %w", err))
return persistentVolumeClaimsBytes, fmt.Errorf("metrics not available yet")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came from the pre-merge testing by @ropatil010: The "metrics not available yet" might be quite misleading message, i.e. in the case the user lacks permissions to get to the metrics. Wouldn't it be easier to just return the the original error here as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • "Wouldn't it be easier to just return the the original error here as well?"
    Yes, I agree and it was like this before, then I changed the behaviour to accommodate the request made here.
    My mistake for not insisting on keeping the old behaviour, let me know whether to proceed with the change @tsmetana .

Copy link
Member

@ardaguclu ardaguclu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still believe that this is a cool feature and thanks for spending time on this. I think it requires major code refactorings though.

/label tide/merge-method-squash

@@ -0,0 +1,382 @@
package top
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you'll be the maintainer of this command, there should be a separate OWNERS file like this https://github.com/openshift/oc/blob/master/pkg/cli/admin/inspectalerts/OWNERS

pkg/cli/admin/top/persistentvolumeclaims.go Outdated Show resolved Hide resolved
pkg/cli/admin/top/persistentvolumeclaims.go Show resolved Hide resolved
pkg/cli/admin/top/persistentvolumeclaims.go Outdated Show resolved Hide resolved
pkg/cli/admin/top/persistentvolumeclaims.go Outdated Show resolved Hide resolved
pkg/cli/admin/top/persistentvolumeclaims.go Show resolved Hide resolved
pkg/cli/admin/top/persistentvolumeclaims.go Outdated Show resolved Hide resolved
pkg/cli/admin/top/persistentvolumeclaims.go Show resolved Hide resolved
pkg/cli/admin/top/persistentvolumeclaims.go Outdated Show resolved Hide resolved
pkg/cli/admin/top/persistentvolumeclaims.go Show resolved Hide resolved
@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 19, 2024
@gmeghnag
Copy link
Author

@ardaguclu Thank you a lot for the code review!

Before spending time answering your questions and proceeding with the code modifications I would like to ask @tsmetana if doing so will make this PR merged, because, from our last chat I understood that you (with the storage leads) chose to go upstream first with this functionality, and doing so will probably need a different approach since Prometheus is not provided by default in vanilla Kubernetes.

In short, I don't want to spend time modifying the code if this PR will not be merged anyway.
Instead, if you think we can merge this PR in a future oc release, I'll work on that tomorrow.

Thanks

@tsmetana
Copy link
Member

@gmeghnag labelling the feature as experimental is exactly what would make it merge even if we want to implement this also in upstream kubectl. The experimental / TP nature gives us the possibility to have the feature early in oc, but be able to change it later in case the kubectl implementation would differ in an incompatible way.

Thanks for the awesome work!

@gmeghnag
Copy link
Author

@ardaguclu @tsmetana I've updated the PR, please let me know if anything else is needed! (:

@gmeghnag
Copy link
Author

/retest-required

Copy link
Contributor

openshift-ci bot commented Nov 21, 2024

@gmeghnag: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn f3eebfa link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ardaguclu
Copy link
Member

Thank you. I quickly skimmed over the current code but we need to iterate a couple of times to be ready for merge. I think, it would be safer to defer this to 4.19 as unfortunately I don't have time for 4.18.

},
}

cmd.Flags().BoolVarP(&o.allNamespaces, "all-namespaces", "A", o.allNamespaces, "If present, list the pvc usage across all namespaces. Namespace in current context is ignored even if specified with --namespace")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't -A (aka --all-namespaces) coming as built-in just like --namespace and we don't have to define in here additionally?.

type persistentVolumeClaimInfo struct {
Namespace string
Name string
UsagePercentage string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this field is exported, although persistentVolumeClaimInfo is not exported?

for _, promOutputDataResult := range promOutput.Data.Result {
namespaceName := promOutputDataResult.Metric["namespace"]
pvcName := promOutputDataResult.Metric["persistentvolumeclaim"]
usagePercentage := promOutputDataResult.Value[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know promOutputDataResult.Value's length is greater than 1 and we don't panic?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants