CPUThrottlingHigh alert for metrics-server-nanny (addon-resizer:1.8.11-gke.0) #4141
Comments
We are seeing the same issue on GKE: >90% CPU throttling on the metrics-server pod and the same logs as you have posted. Will let you know if I discover anything. This sounds similar to a known issue surrounding CFS quotas: kubernetes/kubernetes#67577
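For anyone who wants to confirm how hard the pod is actually being throttled, the ratio behind kube-prometheus's CPUThrottlingHigh alert can be queried directly from the cAdvisor CFS metrics. A minimal sketch, assuming the kube-prometheus defaults (Prometheus exposed as the prometheus-k8s service in the monitoring namespace); adjust the label selectors to your pod names:

```sh
# Make the in-cluster Prometheus reachable locally (kube-prometheus default service name assumed).
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &

# Fraction of CFS periods in which the metrics-server pod was throttled over the last 5 minutes.
# A value close to 1 matches the ">90% throttling" reported in this thread.
curl -s http://localhost:9090/api/v1/query --data-urlencode \
  'query=sum(rate(container_cpu_cfs_throttled_periods_total{namespace="kube-system", pod=~"metrics-server.*"}[5m]))
         / sum(rate(container_cpu_cfs_periods_total{namespace="kube-system", pod=~"metrics-server.*"}[5m]))'
```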
I just ran into this as well. For now I've disabled the alert, but it would be nice to get a proper fix. Strangely, this suddenly started happening on a cluster that is over a year old: 8 hours ago the metrics-server restarted (same version) and these alerts started popping up; no idea why it didn't happen before.
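If you only need to quiet the noise while waiting for a fix, one option (a sketch, not the only approach) is to add an Alertmanager silence scoped to this one pod instead of disabling the rule entirely. The invocation below assumes the kube-prometheus alertmanager-main service and that your alert carries the usual namespace/pod labels:

```sh
# Make the in-cluster Alertmanager reachable locally (kube-prometheus default service name assumed).
kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093 &

# Silence CPUThrottlingHigh only for the metrics-server pod, for one week.
amtool --alertmanager.url=http://localhost:9093 silence add \
  --author="$USER" \
  --duration=168h \
  --comment="addon-resizer poll-period bug, see kubernetes/autoscaler#4141" \
  alertname=CPUThrottlingHigh namespace=kube-system 'pod=~"metrics-server.*"'
```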
I came across this a couple of months ago and I believe the problem should be resolved by #3833 (in 1.8.12) and/or #4112. The poll period was broken, and as a result it is attempting to scrape the apiserver metrics endpoint (which exports tens of thousands of metrics) every 10 s rather than the intended 5 min. These were the flags in use (a way to check the deployed values is sketched after the list):
- command:
- /pod_nanny
- --config-dir=/etc/config
- --cpu=40m
- --extra-cpu=0.5m
- --memory=35Mi
- --extra-memory=4Mi
- --threshold=5
- --deployment=metrics-server-v0.3.6
- --container=metrics-server
- --poll-period=300000
- --estimator=exponential
- --scale-down-delay=24h
- --minClusterSize=5
- --use-metrics=true
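To confirm which flags (and which poll period) your own nanny is actually running with, the deployed spec can be inspected with kubectl. This is a sketch that assumes the deployment name from the --deployment flag above; note that --poll-period is given in milliseconds, so 300000 corresponds to the intended 5 minutes:

```sh
# Inspect the nanny's flags as they are actually deployed
# (deployment name taken from the --deployment flag above; adjust if yours differs).
kubectl -n kube-system get deployment metrics-server-v0.3.6 -o yaml \
  | grep -E 'pod_nanny|poll-period|cpu=|memory='
```

Keep in mind that on GKE this deployment is managed as a cluster addon, so manual edits to it may be reverted by the addon manager.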
Release 1.8.14, which contains the fix for both #3833 and #4112, has just come out: https://github.com/kubernetes/autoscaler/releases/tag/addon-resizer-1.8.14 Could you verify whether this fixes the CPU throttling issue?
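Once the new addon-resizer has been rolled out (on GKE this happens with the managed addon; elsewhere it may be a manual image bump), the running tag can be checked with kubectl. A small sketch, assuming the same deployment name as above:

```sh
# Print the name and image of every container in the metrics-server deployment;
# the nanny should report addon-resizer 1.8.14 (or later) once the fix is in place.
kubectl -n kube-system get deployment metrics-server-v0.3.6 \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
```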
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Which component are you using?: addon-resizer and metrics-server
What version of the component are you using?:
Component version: 1.8.11-gke.0
What k8s version are you using (kubectl version)?: (kubectl version output omitted)
What environment is this in?:
GKE version 1.19.9-gke.1900
What did you expect to happen?:
Expected no Alert from this component.
What happened instead?:
Received the Alert, and the metrics seem to indicate that throttling is regularly over 90%.
How to reproduce it (as minimally and precisely as possible):
Create a basic GKE cluster and install the kube-prometheus monitoring stack.
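For reference, a minimal reproduction along those lines might look like the following; the cluster name and zone are placeholders, and the install commands follow the kube-prometheus README:

```sh
# Hypothetical cluster name/zone; any small default GKE cluster should do.
gcloud container clusters create throttle-repro --zone europe-west1-b --num-nodes 3
gcloud container clusters get-credentials throttle-repro --zone europe-west1-b

# Install kube-prometheus (CRDs and namespace first, then the stack), per its README.
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl apply --server-side -f manifests/setup
kubectl apply -f manifests/

# After some time, the CPUThrottlingHigh alert for the metrics-server pod
# should appear in the Prometheus / Alertmanager UI.
```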
Anything else we need to know?:
Could be related to issue #3833
The metrics-server-nanny logs look fine; however, the metrics-server container in the same pod logs many errors saying that metrics cannot be collected.
I created a separate issue for the metrics-server: kubernetes-sigs/metrics-server#783