Tried to log into https://aws-uswest2.pangeo.io today but was not getting a server. It turns out our core node (running on spot) was shut down, and when the cluster-autoscaler pod was recreated it kept crashing due to kubernetes/autoscaler#3506.

Just wanted to document another case of things running on k8s eventually breaking due to version incompatibilities or other issues. The fix was pretty easy, and the configuration files are in #968.
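For anyone hitting something similar later, here's a rough sketch of how to spot this kind of crash loop. The pod name suffix is illustrative, and I'm assuming the autoscaler runs in `kube-system` as it commonly does:

```sh
# Find the autoscaler pod (the name suffix varies per deployment)
kubectl -n kube-system get pods | grep cluster-autoscaler

# Check restart count and last state -- an OOMKilled status points at the memory limit
kubectl -n kube-system describe pod cluster-autoscaler-xxxxx

# Pull logs from the previous (crashed) container instance
kubectl -n kube-system logs cluster-autoscaler-xxxxx --previous
```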
Just had to give the autoscaler pod more memory (and bump the image to v1.16.7), so we went from:
```yaml
- image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.5
  name: cluster-autoscaler
  resources:
    limits:
      cpu: 100m
      memory: 300Mi
    requests:
      cpu: 100m
      memory: 300Mi
```
to:
```yaml
- image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.7
  name: cluster-autoscaler
  resources:
    limits:
      cpu: 100m
      memory: 600Mi
    requests:
      cpu: 100m
      memory: 600Mi
```
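Rolling out the change is then just an apply plus a rollout check. This assumes the autoscaler is managed as a Deployment in `kube-system` (as in the linked config), and the manifest path is illustrative:

```sh
# Apply the updated manifest with the new image tag and memory limits
kubectl apply -f cluster-autoscaler-deployment.yaml

# Watch the rollout to confirm the new pod comes up healthy
kubectl -n kube-system rollout status deployment/cluster-autoscaler
```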