improve kubeadm's preflight and cluster health assurance #2096
Labels
area/test
kind/design
kind/feature
priority/important-longterm
Given the recent failures that kubeadm exposed in cluster-api (see kubernetes-sigs/cluster-api#2769), there is something else we can improve besides retries and general robustness.
As suggested by @timothysc, we could extend kubeadm's assurance that a cluster is in a good state, either through preflight checks or through a tool such as the node-problem-detector (NPD). The idea is that a node should fail early instead of retrying everywhere in its phases.
However, when I investigated it some time ago, NPD was not very actively maintained.
We have some options that can be discussed:
Related issues about kubeadm join robustness and retries, recently logged by @fabriziopandini:
#2094
#2093
#2092
#2091
#2095