What would you like to be added:
Add a webhook that prevents eviction of pods on NotReady Kosmos nodes.
Why is this needed:
When a Kosmos node stays NotReady for more than 5 minutes, the node-controller in the controller-manager starts evicting its pods, which is equivalent to deleting them. This is not always appropriate, because the pods are restarted once the cluster reconnects.
The NotReady state of a node is more likely due to a kosmos service outage or cross-cluster network issues rather than a physical node failure. Therefore, there is a need for a mechanism to prevent the node-controller from deleting pods.
Since deletion is irreversible, one proposed solution is a webhook that intercepts pod deletion requests issued by system:serviceaccount:kube-system:node-controller. A request should be intercepted only when certain conditions are met, such as utils.IsKosmosNode(node) && utils.IsNotReady(node) && v.needToPrevent(req.UserInfo.Username).
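The condition above could be sketched roughly as follows. This is only an illustration of the decision logic, not the Kosmos implementation: the Node struct is a simplified stand-in for the Kubernetes node object, the label key kosmos.io/node is an assumed marker for Kosmos nodes, and isKosmosNode, isNotReady, needToPrevent and shouldDenyDelete are hypothetical stand-ins for the helpers named in the proposal.

```go
package main

import "fmt"

// Node is a simplified stand-in for the Kubernetes node object (assumption:
// the real webhook would read these fields from corev1.Node).
type Node struct {
	Name   string
	Labels map[string]string
	Ready  string // status of the Ready condition: "True", "False" or "Unknown"
}

// kosmosNodeLabel is an assumed label key that marks a node as a Kosmos node.
const kosmosNodeLabel = "kosmos.io/node"

// nodeControllerUser is the identity named in the proposal.
const nodeControllerUser = "system:serviceaccount:kube-system:node-controller"

// isKosmosNode reports whether the node carries the assumed Kosmos label.
func isKosmosNode(n *Node) bool {
	return n != nil && n.Labels[kosmosNodeLabel] == "true"
}

// isNotReady treats both "False" and "Unknown" as NotReady.
func isNotReady(n *Node) bool {
	return n != nil && n.Ready != "True"
}

// needToPrevent reports whether the requesting user is the node-controller.
func needToPrevent(username string) bool {
	return username == nodeControllerUser
}

// shouldDenyDelete combines the three conditions from the proposal.
func shouldDenyDelete(n *Node, username string) bool {
	return isKosmosNode(n) && isNotReady(n) && needToPrevent(username)
}

func main() {
	node := &Node{
		Name:   "kosmos-member-1",
		Labels: map[string]string{kosmosNodeLabel: "true"},
		Ready:  "Unknown",
	}
	fmt.Println(shouldDenyDelete(node, nodeControllerUser)) // true
	fmt.Println(shouldDenyDelete(node, "kubernetes-admin")) // false
}
```

In a real deployment this check would presumably run inside a validating admission webhook registered for DELETE operations on pods, so that deletions by other users and deletions on ordinary nodes pass through unchanged.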