-
Notifications
You must be signed in to change notification settings - Fork 39.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add volume kubelet_volume_stats_health_abnormal to kubelet #105585
add volume kubelet_volume_stats_health_abnormal to kubelet #105585
Conversation
Hi @fengzixu. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
b05c338
to
3bff32b
Compare
/ok-to-test |
@fengzixu: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @xing-yang |
/assign @fengzixu |
/ok-to-test |
/sig storage |
/retest |
Looks good from my side in terms of instrumentation. Even though the metric that is introduced has two unbounded dimensions (namespace and PVC), the cardinality is limited to the number of PVC created in the cluster, so it is fine by me. /lgtm |
|
||
// VolumeHealthStats contains data about volume health | ||
// +optional | ||
VolumeHealthStats *VolumeHealthStats `json:"volumeHealthStats,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be included in the summary API? In general, the summary API contains metrics about resource usage, whereas the kubelet's /metrics endpoint contains the kubelet's operational metrics (e.g. health, latency, etc). Based on usage i'm aware of, people tend to scrape either the summary API or /metrics/cadvisor, and then also scrape /metrics as well. For users that do this, they would end up with duplicate metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dashpole, let me know if I missed anything.
I see that Summary API contains PodStats:
https://github.com/kubernetes/kubernetes/blob/v1.24.0-alpha.3/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go#L28
PodStats contains VolumeStats:
https://github.com/kubernetes/kubernetes/blob/v1.24.0-alpha.3/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go#L127
VolumeHealthStats is added in VolumeStats here. Should this be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying you should probably add these metrics to either /metrics
or the summary API, but probably not both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see... looks we've already done that for a bunch of volume metrics. You can resolve this comment in that case.
/approve |
/assign @mrunalp |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dashpole, fengzixu, jonyhy96, mrunalp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
2 similar comments
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
@@ -120,7 +126,16 @@ func (collector *volumeStatsCollector) CollectWithStability(ch chan<- metrics.Me | |||
addGauge(volumeStatsInodesDesc, pvcRef, float64(*volumeStat.Inodes)) | |||
addGauge(volumeStatsInodesFreeDesc, pvcRef, float64(*volumeStat.InodesFree)) | |||
addGauge(volumeStatsInodesUsedDesc, pvcRef, float64(*volumeStat.InodesUsed)) | |||
addGauge(volumeStatsHealthAbnormalDesc, pvcRef, convertBoolToFloat64(volumeStat.VolumeHealthStats.Abnormal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generates
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1d5d77b]
goroutine 2564 [running]:
k8s.io/kubernetes/pkg/kubelet/metrics/collectors.(*volumeStatsCollector).CollectWithStability(0x0?, 0xc0016df860)
pkg/kubelet/metrics/collectors/volume_stats.go:129 +0x89b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that makes sense since volumeStat.VolumeHealthStats
looks like an optional field:
// +optional
VolumeHealthStats *VolumeHealthStats `json:"volumeHealthStats,omitempty"`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest reverting this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fengzixu Can you take a look at this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Let me check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File this PR to fix: #108758
…-health add volume kubelet_volume_stats_health_abnormal to kubelet
re-push "add volume kubelet_volume_stats_health_abnormal to kubelet #105585"
…-health add volume kubelet_volume_stats_health_abnormal to kubelet
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR add
kubelet_volume_stats_health_abnormal
metrics to kubelet based on this kepWhich issue(s) this PR fixes:
Fixes kubernetes/enhancements#2900
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: