add metric for skipped scaling events #5059
This change adds a new metric, skipped_scale_events_count, which records the number of times the CA has chosen to skip a scaling event. The metric carries two labels: the scaling direction (up or down) and the reason for the skip. This patch also adds the first uses of the metric, recording a skip when a CPU or memory limit is reached during either a scale up or a scale down.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: elmiko, mwielgus
this change updates the resource limit alerts to use the new metric introduced in kubernetes/autoscaler#5059.
add metric for skipped scaling events
Which component this PR applies to?
cluster-autoscaler
What type of PR is this?
/kind feature
What this PR does / why we need it:
This change adds a new metric, skipped_scale_events_count, which
records the number of times the CA has chosen to skip a scaling
event. The metric carries two labels: the scaling direction (up or
down) and the reason for the skip.
This patch also adds the first uses of the metric, recording a skip
when a CPU or memory limit is reached during either a scale up or a
scale down.
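To illustrate the shape of the metric described above, here is a minimal, standard-library-only sketch of a counter keyed by direction and reason. It is not the code from this patch: the real metric would be registered through the autoscaler's Prometheus metrics package, and the type name, method names, label names, and reason values below (`SkippedScaleEvents`, `Inc`, `Get`, `Expose`, `cpu_limit`, `memory_limit`) are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Hypothetical label values; the constants used in the actual patch may differ.
const (
	DirectionScaleUp   = "up"
	DirectionScaleDown = "down"
	ReasonCPULimit     = "cpu_limit"
	ReasonMemoryLimit  = "memory_limit"
)

// labelKey identifies one time series: a (direction, reason) pair.
type labelKey struct {
	direction string
	reason    string
}

// SkippedScaleEvents is a stand-in for a labeled Prometheus counter.
// It only ever increases, and keeps one count per label combination.
type SkippedScaleEvents struct {
	mu     sync.Mutex
	counts map[labelKey]uint64
}

// NewSkippedScaleEvents returns an empty counter.
func NewSkippedScaleEvents() *SkippedScaleEvents {
	return &SkippedScaleEvents{counts: make(map[labelKey]uint64)}
}

// Inc records one skipped scaling event for the given direction and reason.
func (s *SkippedScaleEvents) Inc(direction, reason string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[labelKey{direction, reason}]++
}

// Get returns the current count for one label combination (0 if never seen).
func (s *SkippedScaleEvents) Get(direction, reason string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.counts[labelKey{direction, reason}]
}

// Expose renders the counts as sorted lines resembling the Prometheus
// text exposition format. The label names here are assumptions.
func (s *SkippedScaleEvents) Expose() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	lines := make([]string, 0, len(s.counts))
	for k, v := range s.counts {
		lines = append(lines, fmt.Sprintf(
			"skipped_scale_events_count{direction=%q,reason=%q} %d",
			k.direction, k.reason, v))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	m := NewSkippedScaleEvents()
	// A scale up skipped twice because the CPU limit was reached.
	m.Inc(DirectionScaleUp, ReasonCPULimit)
	m.Inc(DirectionScaleUp, ReasonCPULimit)
	// A scale down skipped once because of the memory limit.
	m.Inc(DirectionScaleDown, ReasonMemoryLimit)
	for _, line := range m.Expose() {
		fmt.Println(line)
	}
}
```

Keeping direction and reason as separate labels means an alert can match on either one alone, for example on any skipped scale up regardless of which limit caused it.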
User Story
As a cluster autoscaler user, I would like to create alerts to inform me when my cluster is exceeding its resource limits. Having a Prometheus metric to measure this would allow me to create automated alerting.
Which issue(s) this PR fixes:
n/a
Special notes for your reviewer:
none
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: