cluster autoscaler was broken on AWS #969

Open
scottyhq opened this issue Jul 22, 2021 · 0 comments

I tried to log into https://aws-uswest2.pangeo.io today but could not get a server. It turns out our core node (running on spot) had been shut down, and when the cluster-autoscaler pod was recreated it kept crashing due to kubernetes/autoscaler#3506.

Just wanted to document another case of how things running on k8s will eventually break, whether from version incompatibilities or other causes. The fix was pretty easy; the updated configuration files are in #968.

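We just had to give the autoscaler pod more memory (raising both the limit and the request from 300Mi to 600Mi) and bump the image from v1.16.5 to v1.16.7, so we went from: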

```yaml
- image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.5
  name: cluster-autoscaler
  resources:
    limits:
      cpu: 100m
      memory: 300Mi
    requests:
      cpu: 100m
      memory: 300Mi
```

to:

```yaml
- image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.7
  name: cluster-autoscaler
  resources:
    limits:
      cpu: 100m
      memory: 600Mi
    requests:
      cpu: 100m
      memory: 600Mi
```
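If you only want to bump the running autoscaler rather than re-apply the full manifest, a strategic merge patch along these lines should do the same thing. This is a sketch, not what #968 does: it assumes the autoscaler runs as a Deployment named `cluster-autoscaler` in the `kube-system` namespace, and the patch file name is hypothetical.

```yaml
# Hypothetical file: cluster-autoscaler-patch.yaml
# Assumes a Deployment named cluster-autoscaler in kube-system. Apply with:
#   kubectl -n kube-system patch deployment cluster-autoscaler --patch "$(cat cluster-autoscaler-patch.yaml)"
# Strategic merge matches the container by `name`, so only the keys below change.
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.7
          resources:
            limits:
              memory: 600Mi
            requests:
              memory: 600Mi
```

Because strategic merge only overwrites the keys present in the patch, the existing `cpu: 100m` limit and request are left as they were.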