My pod is in the CrashLoopBackOff state after configuring cluster-autoscaler #4220
Comments
Same issue here with version
Hi, I was in the same situation. Do you use requests/limits settings for cluster-autoscaler? About two weeks ago a limit of 300Mi RAM was enough; now 500Mi is required. If you use a lower number, OOM kills appear.
I do use a
@rubroboletus It worked, thanks! :)
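For reference, a minimal sketch of what the suggestion a couple of comments up looks like in the cluster-autoscaler Deployment manifest. The 500Mi figure comes from this thread rather than an upstream default, and the CPU values are only illustrative, so treat the numbers as a starting point for your own cluster:

```yaml
# Sketch only: raise the memory request/limit on the cluster-autoscaler
# container (per this thread, 300Mi now leads to OOM kills; 500Mi+ works).
# This fragment belongs under spec.template.spec.containers in the
# cluster-autoscaler Deployment in the kube-system namespace.
- name: cluster-autoscaler
  resources:
    requests:
      cpu: 100m      # illustrative; not discussed in this thread
      memory: 500Mi
    limits:
      cpu: 100m      # illustrative; not discussed in this thread
      memory: 500Mi
```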
* Fixes: Young-ook#88 * Original issue: kubernetes/autoscaler#4220
Confirmed. I am running a 1.20 cluster with a node group using the 22-07-2021 Linux AMI. Thanks.
* Fixes: #88 * Original issue: kubernetes/autoscaler#4220 Co-authored-by: Abel Garcia Dorta <[email protected]>
Update default values for memory consumption to fix this issue: kubernetes/autoscaler#4220
Still seeing this problem here with no limit set (also tried with a 600Mi limit as suggested by the AWS docs).
Container logs show this repeated goroutine:
Does anyone have any idea what's going on here?
This was caused by an incorrect AWS role trust policy. It would've been a bit easier to debug if there were helpful error messages, but it was my fault for not following the AWS instructions carefully.
@seunggs did you get any solution?
My issue was due to an incorrect IAM role trust policy (I was using Pulumi to generate it and discovered that my Pulumi code was not generating the policy correctly). Your issue also seems related to permissions. See this issue that seems related: #3216
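For anyone hitting the trust-policy variant of this failure, the general shape of an IRSA trust policy for the cluster-autoscaler role is sketched below. The account ID, region, OIDC provider ID, and namespace/service-account name are placeholders, not values from this thread, so check the EKS IRSA documentation for the authoritative form:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:kube-system:cluster-autoscaler"
        }
      }
    }
  ]
}
```

If the Federated principal or the :sub condition does not match the cluster's OIDC provider and the autoscaler's service account, sts:AssumeRoleWithWebIdentity fails and the pod crash-loops, which matches the symptoms reported above.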
Same issue here. Fixing the policy ARN in the cluster-autoscaler service account solved it.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the project's lifecycle rules.
You can mark this issue as fresh with /remove-lifecycle stale.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
For people with the same issue using Terraform and AWS, auto-generating the correct policy for the autoscaler's service account with Terraform can help.
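As an illustration of that approach (not the commenter's original snippet, which is not preserved here), a Terraform sketch of the permissions policy the autoscaler role needs could look like the following. The action list follows the cluster-autoscaler AWS documentation; verify it against the current upstream README, and attach the policy to the IRSA role that the service account assumes:

```hcl
# Sketch only: IAM permissions for the cluster-autoscaler on AWS,
# to be attached to the IRSA role referenced by the service account.
# Action list taken from the cluster-autoscaler AWS docs; confirm upstream.
resource "aws_iam_policy" "cluster_autoscaler" {
  name = "cluster-autoscaler"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ]
      Resource = "*"
    }]
  })
}
```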
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the project's lifecycle rules.
You can mark this issue as fresh with /remove-lifecycle rotten.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the project's lifecycle rules.
You can reopen this issue with /reopen.
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi, I am facing the same issue with EKS Kubernetes version 1.21:
cluster-autoscaler-5dd6459897-mpqf8   0/1   CrashLoopBackOff   7   13m
This is the log I see. I applied the memory change from 300Mi to 500Mi but am still getting the same error, and the OIDC provider is in the trust relationships.
goroutine 285 [sync.Cond.Wait]:
Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
Component version: v1.17.3
What k8s version are you using (kubectl version)?:
kubectl -n kube-system version --short
Client Version: v1.21.1
Server Version: v1.17.17-eks-c5067d
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1
What environment is this in?:
AWS EKS
What did you expect to happen?:
I'm expecting to get my first cluster-autoscaler set up and working, i.e. taking over scaling of my ASG.
What happened instead?:
I'm getting exactly the error described at https://aws.amazon.com/premiumsupport/knowledge-center/eks-pod-status-troubleshooting/
$ kubectl describe po crash-app-6847947bf8-28rq6
Name: crash-app-6847947bf8-28rq6
Namespace: default
Priority: 0
PriorityClassName:
Node: ip-192-168-6-51.us-east-2.compute.internal/192.168.6.51
Start Time: Wed, 22 Jan 2020 08:42:20 +0200
Labels: pod-template-hash=6847947bf8
run=crash-app
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.29.73
Controlled By: ReplicaSet/crash-app-6847947bf8
Containers:
main:
Container ID: docker://6aecdce22adf08de2dbcd48f5d3d8d4f00f8e86bddca03384e482e71b3c20442
Image: alpine
Image ID: docker-pullable://alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
Port: 80/TCP
Host Port: 0/TCP
Command:
/bin/sleep
1
State: Waiting
Reason: CrashLoopBackOff
...
Events:
Type Reason Age From Message
Normal Scheduled 47s default-scheduler Successfully assigned default/crash-app-6847947bf8-28rq6 to ip-192-168-6-51.us-east-2.compute.internal
Normal Pulling 28s (x3 over 46s) kubelet, ip-192-168-6-51.us-east-2.compute.internal Pulling image "alpine"
Normal Pulled 28s (x3 over 46s) kubelet, ip-192-168-6-51.us-east-2.compute.internal Successfully pulled image "alpine"
Normal Created 28s (x3 over 45s) kubelet, ip-192-168-6-51.us-east-2.compute.internal Created container main
Normal Started 28s (x3 over 45s) kubelet, ip-192-168-6-51.us-east-2.compute.internal Started container main
Warning BackOff 12s (x4 over 42s) kubelet, ip-192-168-6-51.us-east-2.compute.internal Back-off restarting failed container
How to reproduce it (as minimally and precisely as possible):
You need to have the same EKS, kubectl, and cluster-autoscaler versions:
kubectl -n kube-system version --short
Client Version: v1.21.1
Server Version: v1.17.17-eks-c5067d
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1
kubectl -n kube-system get node -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
NAME VERSION
ip-10-44-17-206.us-west-2.compute.internal v1.17.12-eks-7684af
ip-10-44-20-171.us-west-2.compute.internal v1.17.12-eks-7684af