KCP resilience to machine disk space issues #3289
Labels: area/control-plane, kind/feature
What steps did you take and what happened:
I adapted this space quota tutorial to a KCP cluster, using a slightly modified version of this gist to fill up etcd. After a while I saw kube-apiserver crash on one of the machines. I tried to delete the bad machine, but then ran into #2331 because the pods could not be drained.
I'm not entirely sure what went wrong. We could probably develop a faster method of filling up etcd, and perhaps add some kind of simulation test for this scenario that we can run periodically.
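For reference, a faster fill could look something like the sketch below: it emits a batch of ConfigMaps, each carrying roughly 700 KiB of random data, as a JSON List that can be piped to `kubectl apply -f -`. This is only a sketch; the `etcd-filler-*` names and the payload size are assumptions (etcd's default space quota is 2 GiB, and ConfigMap objects are limited to about 1 MiB), not anything taken from the original gist.

```python
import base64
import json
import os

def make_filler_configmap(index: int, payload_bytes: int = 700_000) -> dict:
    # Hypothetical helper: a ConfigMap carrying ~700 KiB of random data.
    # base64 expands the payload by ~4/3, which keeps each object under
    # the ~1 MiB ConfigMap size limit while still filling etcd quickly.
    blob = base64.b64encode(os.urandom(payload_bytes)).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"etcd-filler-{index}"},
        "data": {"blob": blob},
    }

def filler_list(count: int) -> str:
    # A v1 List document, which `kubectl apply -f -` accepts directly.
    items = [make_filler_configmap(i) for i in range(count)]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items})

if __name__ == "__main__":
    # e.g. python fill_etcd.py | kubectl apply -f -
    print(filler_list(20))
```

Repeating batches like this should approach the quota much faster than editing objects one at a time.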
What did you expect to happen:
etcd should stop accepting writes; I'm not sure how Kubernetes is intended to behave when etcd can't accept new writes.
Anything else you would like to add:
This came up as part of #3185, where we were looking at space quotas and alarms. I didn't see any alarms get raised before the apiserver crashed.
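A periodic check for the alarm could look something like the sketch below, which polls `etcdctl alarm list` and looks for the NOSPACE alarm etcd raises when it hits its space quota. The `etcdctl` invocation is an assumption about a typical deployment (a real cluster would also need `--endpoints` and TLS flags), not something from this issue.

```python
import subprocess

def has_nospace_alarm(alarm_list_output: str) -> bool:
    # `etcdctl alarm list` prints lines like
    # "memberID:12345 alarm:NOSPACE" when the space quota is exceeded,
    # and prints nothing when no alarms are active.
    return any("alarm:NOSPACE" in line
               for line in alarm_list_output.splitlines())

def check_etcd_alarms() -> bool:
    # Hypothetical invocation; add --endpoints/--cacert/--cert/--key
    # as appropriate for the cluster being tested.
    out = subprocess.run(
        ["etcdctl", "alarm", "list"],
        capture_output=True, text=True, check=True,
    ).stdout
    return has_nospace_alarm(out)
```

If the alarm fires as documented, a test like this could catch the quota being exceeded before kube-apiserver falls over.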
Environment:
- Kubernetes version (use `kubectl version`): 1.18.2

/kind feature
/area control-plane