add volume kubelet_volume_stats_health_abnormal to kubelet #105585

Merged
k8s-ci-robot merged 13 commits into kubernetes:master from fengzixu:improvement-volume-health on Mar 15, 2022

Contributor

@fengzixu fengzixu commented Oct 9, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds the kubelet_volume_stats_health_abnormal metric to the kubelet, based on this KEP.

Which issue(s) this PR fixes:

Fixes kubernetes/enhancements#2900

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add a volume health metric (`kubelet_volume_stats_health_abnormal`) to the kubelet.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/pull/2900

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 9, 2021
@k8s-ci-robot
Contributor

Hi @fengzixu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Oct 9, 2021
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]


@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. area/kubelet labels Oct 9, 2021
@k8s-ci-robot k8s-ci-robot added sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 9, 2021
@fengzixu fengzixu force-pushed the improvement-volume-health branch from b05c338 to 3bff32b Compare October 9, 2021 06:37
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Oct 9, 2021
@fengzixu
Contributor Author

fengzixu commented Oct 9, 2021

/ok-to-test

@k8s-ci-robot
Contributor

@fengzixu: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test


@fengzixu
Contributor Author

fengzixu commented Oct 9, 2021

/assign @xing-yang

@fengzixu
Contributor Author

fengzixu commented Oct 9, 2021

/assign @fengzixu

@pacoxu
Member

pacoxu commented Oct 9, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 9, 2021
@pacoxu
Member

pacoxu commented Oct 9, 2021

/sig storage
/priority important-soon
It should be in scope for 1.23, per kubernetes/enhancements#2900.
/cc msau42 gnufied

@k8s-ci-robot k8s-ci-robot requested a review from gnufied October 9, 2021 10:29
@fengzixu
Contributor Author

/retest

@dgrisonnet
Member

Looks good from my side in terms of instrumentation. Even though the metric that is introduced has two unbounded dimensions (namespace and PVC), the cardinality is limited to the number of PVCs created in the cluster, so it is fine by me.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 26, 2022
@xing-yang
Contributor

/assign @dashpole
@dashpole Can you review this?


// VolumeHealthStats contains data about volume health
// +optional
VolumeHealthStats *VolumeHealthStats `json:"volumeHealthStats,omitempty"`
Contributor

Does this need to be included in the summary API? In general, the summary API contains metrics about resource usage, whereas the kubelet's /metrics endpoint contains the kubelet's operational metrics (e.g. health, latency, etc.). Based on the usage I'm aware of, people tend to scrape either the summary API or /metrics/cadvisor, and then scrape /metrics as well. Users who do this would end up with duplicate metrics.

Contributor

Hi @dashpole, let me know if I missed anything.
I see that Summary API contains PodStats:
https://github.com/kubernetes/kubernetes/blob/v1.24.0-alpha.3/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go#L28
PodStats contains VolumeStats:
https://github.com/kubernetes/kubernetes/blob/v1.24.0-alpha.3/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go#L127
VolumeHealthStats is added in VolumeStats here. Should this be sufficient?
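As a rough illustration of the nesting described in this comment, the following Go sketch uses simplified stand-ins for the `stats/v1alpha1` types. Only the fields needed here are kept, so these are not the full upstream definitions. It shows how the optional pointer field with `omitempty` appears in the JSON when set and disappears when nil.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the summary API types (assumption: trimmed to the
// fields relevant to this discussion).
type VolumeHealthStats struct {
	Abnormal bool `json:"abnormal"`
}

type VolumeStats struct {
	Name string `json:"name"`
	// Optional: a nil pointer is dropped from the JSON because of omitempty.
	VolumeHealthStats *VolumeHealthStats `json:"volumeHealthStats,omitempty"`
}

type PodStats struct {
	VolumeStats []VolumeStats `json:"volume,omitempty"`
}

// encode marshals a PodStats value and returns the JSON as a string.
func encode(p PodStats) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	pod := PodStats{VolumeStats: []VolumeStats{
		{Name: "healthy", VolumeHealthStats: &VolumeHealthStats{Abnormal: false}},
		{Name: "unreported"}, // health stats not reported: field is omitted
	}}
	fmt.Println(encode(pod))
}
```

Consumers of such a structure have to treat a missing `volumeHealthStats` as "not reported" rather than "healthy".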

Contributor

I'm saying you should probably add these metrics to either /metrics or the summary API, but probably not both

Contributor

@dashpole dashpole Mar 9, 2022

I see... looks like we've already done that for a bunch of volume metrics. You can resolve this comment in that case.

@dashpole
Contributor

dashpole commented Mar 9, 2022

/approve

@gnufied
Member

gnufied commented Mar 9, 2022

/assign @mrunalp

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, fengzixu, jonyhy96, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 14, 2022
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 1a5abe5 into kubernetes:master Mar 15, 2022
@@ -120,7 +126,16 @@ func (collector *volumeStatsCollector) CollectWithStability(ch chan<- metrics.Me
addGauge(volumeStatsInodesDesc, pvcRef, float64(*volumeStat.Inodes))
addGauge(volumeStatsInodesFreeDesc, pvcRef, float64(*volumeStat.InodesFree))
addGauge(volumeStatsInodesUsedDesc, pvcRef, float64(*volumeStat.InodesUsed))
addGauge(volumeStatsHealthAbnormalDesc, pvcRef, convertBoolToFloat64(volumeStat.VolumeHealthStats.Abnormal))
Member

This generates

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1d5d77b]
goroutine 2564 [running]:
k8s.io/kubernetes/pkg/kubelet/metrics/collectors.(*volumeStatsCollector).CollectWithStability(0x0?, 0xc0016df860)
        pkg/kubelet/metrics/collectors/volume_stats.go:129 +0x89b

#108715 (comment)

Member

Well that makes sense since volumeStat.VolumeHealthStats looks like an optional field:

    // +optional
	VolumeHealthStats *VolumeHealthStats `json:"volumeHealthStats,omitempty"`
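A minimal sketch of how the dereference could be guarded, assuming the collector should simply skip the health gauge when no health stats are reported. The types are trimmed stand-ins, `healthAbnormalValue` is a hypothetical helper, and the body of `convertBoolToFloat64` is an assumption (only its name appears in the diff). This is not necessarily how the follow-up fix was written.

```go
package main

import "fmt"

// Trimmed stand-ins for the summary API types (assumption).
type VolumeHealthStats struct {
	Abnormal bool
}

type VolumeStats struct {
	VolumeHealthStats *VolumeHealthStats // optional, may be nil
}

// convertBoolToFloat64 maps true to 1 and false to 0 (assumed behavior).
func convertBoolToFloat64(b bool) float64 {
	if b {
		return 1
	}
	return 0
}

// healthAbnormalValue is a hypothetical helper: it returns the gauge value
// and whether a sample should be emitted at all.
func healthAbnormalValue(v VolumeStats) (float64, bool) {
	if v.VolumeHealthStats == nil {
		// Nothing reported for this volume: emit no sample instead of panicking.
		return 0, false
	}
	return convertBoolToFloat64(v.VolumeHealthStats.Abnormal), true
}

func main() {
	volumes := []VolumeStats{
		{}, // no health stats reported
		{VolumeHealthStats: &VolumeHealthStats{Abnormal: true}},
	}
	for _, v := range volumes {
		if value, ok := healthAbnormalValue(v); ok {
			fmt.Println("emit", value)
		} else {
			fmt.Println("skip: no health stats reported")
		}
	}
}
```

Skipping the sample, rather than reporting 0, avoids claiming a volume is healthy when its state is simply unknown.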

Member

I'd suggest reverting this.

Contributor

@fengzixu Can you take a look at this?

Contributor Author

Sure. Let me check

Contributor Author

Filed this PR to fix it: #108758

fengzixu pushed a commit to fengzixu/kubernetes that referenced this pull request Mar 17, 2022
…-health

add volume kubelet_volume_stats_health_abnormal to kubelet
k8s-ci-robot added a commit that referenced this pull request Mar 30, 2022
re-push "add volume kubelet_volume_stats_health_abnormal to kubelet #105585"
muyangren2 pushed a commit to muyangren2/kubernetes that referenced this pull request Jul 14, 2022
…-health

add volume kubelet_volume_stats_health_abnormal to kubelet
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.