add metric for skipped scaling events #5059

elmiko · 2022-07-28T15:37:42Z

Which component this PR applies to?

cluster-autoscaler

What type of PR is this?

/kind feature

What this PR does / why we need it:

This change adds a new metric, skipped_scale_events_count, which will
record the number of times that the CA has chosen to skip a scaling
event. The metric contains a label for the scaling direction (up or down)
and the reason.

This patch includes uses of the new metric for the cases where CPU or memory
limits are reached during either a scale up or a scale down.
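
To illustrate the shape of the metric described above, here is a minimal sketch of a counter keyed by direction and reason. It is a standard-library stand-in for a labeled Prometheus counter, not the code in this patch: the type `SkippedScaleEvents`, its methods, and the label values (`up`, `down`, `CpuResourceLimit`, `MemoryResourceLimit`) are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Label values below are assumptions for illustration; the values used
// by the actual patch may differ.
const (
	DirectionScaleUp   = "up"
	DirectionScaleDown = "down"
	ReasonCPULimit     = "CpuResourceLimit"
	ReasonMemoryLimit  = "MemoryResourceLimit"
)

// labelSet identifies one time series: a (direction, reason) pair.
type labelSet struct {
	direction string
	reason    string
}

// SkippedScaleEvents is a hypothetical stand-in for a labeled counter.
// It only ever increases, one count per skipped scaling event.
type SkippedScaleEvents struct {
	mu     sync.Mutex
	counts map[labelSet]uint64
}

// NewSkippedScaleEvents returns an empty counter set.
func NewSkippedScaleEvents() *SkippedScaleEvents {
	return &SkippedScaleEvents{counts: make(map[labelSet]uint64)}
}

// Inc records one skipped scaling event for the given direction and reason.
func (s *SkippedScaleEvents) Inc(direction, reason string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[labelSet{direction, reason}]++
}

// Get returns the current count for the given direction and reason.
func (s *SkippedScaleEvents) Get(direction, reason string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.counts[labelSet{direction, reason}]
}

func main() {
	m := NewSkippedScaleEvents()

	// Two scale ups skipped because the CPU limit was reached,
	// one scale down skipped because of the memory limit.
	m.Inc(DirectionScaleUp, ReasonCPULimit)
	m.Inc(DirectionScaleUp, ReasonCPULimit)
	m.Inc(DirectionScaleDown, ReasonMemoryLimit)

	fmt.Println(m.Get(DirectionScaleUp, ReasonCPULimit))      // 2
	fmt.Println(m.Get(DirectionScaleDown, ReasonMemoryLimit)) // 1
	fmt.Println(m.Get(DirectionScaleDown, ReasonCPULimit))    // 0
}
```

Keeping direction and reason as separate labels means an alert can match on either one independently, for example on every skipped scale up regardless of which limit caused it.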

User Story

As a cluster autoscaler user, I would like to create alerts to inform me when my cluster is exceeding its resource limits. Having a Prometheus metric to measure this would allow me to create automated alerting.

Which issue(s) this PR fixes:

n/a

Special notes for your reviewer:

none

Does this PR introduce a user-facing change?

A new metric ("cluster_autoscaler_skipped_scale_events_count") has been added to monitor when CPU and memory resource limits have been exceeded.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

This change adds a new metric, skipped_scale_events_count, which will record the number of times that the CA has chosen to skip a scaling event. The metric contains a label for the scaling direction (up or down) and the reason. This patch includes uses of the new metric for the cases where CPU or memory limits are reached during either a scale up or a scale down.

mwielgus

/lgtm
/approve

k8s-ci-robot · 2022-08-08T12:18:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko, mwielgus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/OWNERS~~ [mwielgus]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

add metric for skipped scaling events

k8s-ci-robot added the kind/feature, cncf-cla: yes, and size/M labels Jul 28, 2022

k8s-ci-robot requested review from aleksandra-malinowska and feiskyer July 28, 2022 15:38

jbartosik added the area/cluster-autoscaler label Aug 2, 2022

mwielgus approved these changes Aug 8, 2022

k8s-ci-robot assigned mwielgus Aug 8, 2022

k8s-ci-robot added the lgtm label Aug 8, 2022

k8s-ci-robot added the approved label Aug 8, 2022

k8s-ci-robot merged commit 3e25023 into kubernetes:master Aug 8, 2022

elmiko deleted the skipped-scale-metric branch August 9, 2022 18:45

elmiko added a commit to elmiko/cluster-autoscaler-operator that referenced this pull request Sep 14, 2022

update alerts for resource limits

14a6c32

this change updates the resource limit alerts to use the new metric introduced in kubernetes/autoscaler#5059.

elmiko mentioned this pull request Sep 14, 2022

Bug 1997396: update alerts for resource limits openshift/cluster-autoscaler-operator#250

Merged

elmiko added a commit to elmiko/cluster-autoscaler-operator that referenced this pull request Sep 14, 2022

update alerts for resource limits

7fbde2f

this change updates the resource limit alerts to use the new metric introduced in kubernetes/autoscaler#5059.

navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022

Merge pull request kubernetes#5059 from elmiko/skipped-scale-metric

ae5e774

add metric for skipped scaling events
