Cluster resource consumption metrics #3983
Conversation
i took an approach of calculating the current counts once per interval. I have left the memory count in bytes as that is what we use internally, but I am not 100% sure this is the best practice for metrics, and I noticed the numbers get quite large. Happy for any suggestions about making this better, perhaps use gibibytes or mebibytes here? cc @gjtempleton ptal
@@ -27,6 +27,11 @@ All the metrics are prefixed with `cluster_autoscaler_`.
| nodes_count | Gauge | `state`=<node-state> | Number of nodes in cluster. |
| unschedulable_pods_count | Gauge | | Number of unschedulable ("Pending") pods in the cluster. |
| node_groups_count | Gauge | `node_group_type`=<node-group-type> | Number of node groups managed by CA. |
| max_nodes_count | Gauge | | Maximum number of nodes in all node groups. |
this metric was missing from the doc, but since it's somewhat related to resource counting i decided to add the description.
I'm slightly uncomfortable with the metric naming as is. The metrics as written also don't make clear the unit being represented, which the Prometheus docs suggest is best practice and is consistent with metrics like kube-state-metrics' pod metrics. I wonder if these could both be solved by the name being something like:
|
In terms of how often the metrics are evaluated and using bytes as the unit, I'm all in favour of both as they currently are. Using mebibytes or larger would make us inconsistent with a number of existing metrics across the Kubernetes ecosystem (e.g. kube-state-metrics), and most tooling for querying these metrics is smart enough to handle the large numbers for users and translate them into the most relevant unit.
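To illustrate the point about tooling handling large byte values: most dashboards apply an IEC-unit conversion along these lines when rendering a bytes-valued gauge (a minimal sketch for illustration, not part of this PR):

```python
def humanize_bytes(n: float) -> str:
    # Binary (IEC) units, the convention most Prometheus dashboards use
    # when rendering a raw bytes-valued gauge.
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB"):
        if abs(n) < 1024:
            return f"{n} B" if unit == "B" else f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} EiB"

# A 16 GiB gauge value stays in bytes on the wire but is readable in the UI.
print(humanize_bytes(17179869184))
print(humanize_bytes(512))
```

This is why keeping the gauge itself in bytes costs users nothing in practice.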
thanks for the review @gjtempleton, happy to change the names. i tend to agree with your reasoning.
This change adds 4 metrics that can be used to monitor the minimum and maximum limits for CPU and memory, as well as the current counts in cores and bytes, respectively. The four metrics added are:

* `cluster_autoscaler_cpu_limits_cores`
* `cluster_autoscaler_cluster_cpu_current_cores`
* `cluster_autoscaler_memory_limits_bytes`
* `cluster_autoscaler_cluster_memory_current_bytes`

This change also adds the `max_cores_total` metric to the metrics proposal doc, as it was previously not recorded there.

User story: As a cluster autoscaler user, I would like to monitor my cluster through metrics to determine when the cluster is nearing its limits for cores and memory usage.
cc12a63 to a24ea6c: updated
Sorry for the delay in reviewing this again. Thanks for taking the feedback on board. /lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko, mwielgus. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
This change adds 4 new metrics to the cluster autoscaler that help a user to determine the CPU cores and memory usage in their cluster.
There are 2 metrics devoted to the limits for cores and memory, with labels for the maximum and minimum:

* `cluster_autoscaler_cpu_limits_cores`
* `cluster_autoscaler_memory_limits_bytes`

There are also 2 metrics devoted to the current counts, in cores and in bytes respectively:

* `cluster_autoscaler_cluster_cpu_current_cores`
* `cluster_autoscaler_cluster_memory_current_bytes`
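To sketch how these four gauges might be consumed, here is a minimal parse of Prometheus text-exposition lines into name/value pairs. Note the `direction` label on the limit gauges and all sample values are illustrative assumptions, not taken from this PR:

```python
# Hypothetical scrape lines; the "direction" label and the values shown
# are assumptions for illustration only.
scrape = """\
cluster_autoscaler_cpu_limits_cores{direction="minimum"} 8
cluster_autoscaler_cpu_limits_cores{direction="maximum"} 320
cluster_autoscaler_cluster_cpu_current_cores 96
cluster_autoscaler_memory_limits_bytes{direction="maximum"} 1374389534720
cluster_autoscaler_cluster_memory_current_bytes 412316860416
"""

def parse(text: str) -> dict:
    # Map each "name{labels}" series to its sample value.
    samples = {}
    for line in text.strip().splitlines():
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples

m = parse(scrape)
cpu_used = m["cluster_autoscaler_cluster_cpu_current_cores"]
cpu_max = m['cluster_autoscaler_cpu_limits_cores{direction="maximum"}']
print(f"CPU: {cpu_used:.0f} of {cpu_max:.0f} cores ({cpu_used / cpu_max:.0%})")
```

In practice a PromQL ratio of the current gauge over the maximum-limit gauge gives the same utilisation figure directly in Prometheus.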
Sample scrape of new metrics on a running cluster:
User Story
As a cluster autoscaler user, I would like to monitor my cluster through metrics to determine when the cluster is nearing its limits for cores and memory usage. By having metrics for the current counts as well as maximum and minimum limits, I will be able to effectively monitor my cluster.
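The user story above reduces to a ratio check of a current gauge against its maximum limit. A hedged sketch of such a check (the helper name and the 90% threshold are illustrative choices, not part of the PR):

```python
def near_limit(current: float, maximum: float, threshold: float = 0.9) -> bool:
    # True when the cluster has consumed at least `threshold` of its
    # configured limit; guards against an unset (zero) limit.
    return maximum > 0 and current >= threshold * maximum

# 300 of 320 cores used should trigger; 96 of 320 should not.
print(near_limit(300, 320), near_limit(96, 320))
```

The same condition expressed as a Prometheus alerting rule over the new gauges would fire before scale-up requests start failing against the configured limits.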