use `--disable-eviction` option when draining a node #6929
Comments
On first look it seems strange to use Cluster API to implicitly overwrite user intention expressed in a PodDisruptionBudget. It seems like a better solution to this issue would be to surface the reason for the lack of rollout of Machines at the Cluster API level so users can better configure their workloads. @Bo0km4n have you got a toy example I could test to see the impact of PDBs blocking rollouts?
@killianmuldoon I think you can just deploy something like this:
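The manifest referenced here did not survive extraction; a minimal always-blocking PDB along the lines described in the next comment might look like the following (the Deployment name and label are hypothetical):

```yaml
# Hypothetical example: a PDB that permanently blocks eviction, assuming a
# Deployment with 1 replica whose pods carry the label app: capi-test.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: always-blocking
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: capi-test
```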
The selector has to match some pods that you have (e.g. just let it match the CAPI controller on a self-hosted cluster). If you want an always-blocking PDB, just use `minAvailable: 1` and a Deployment with 1 replica.
@killianmuldoon I didn't know that. If I set the timeout on the Machine I want to drain, is it possible to forcibly delete a Node that is running a Pod that cannot be evicted?
That is my understanding, yes.
Thanks @sbueringer.
Not sure. This might depend on your infrastructure. CAPI will just delete the node object and then the corresponding infra. |
To clarify, CAPI will always ensure the underlying infra is gone before deleting the Node to avoid potential stateful issues, see #2565.
At the moment CAPI will wait indefinitely for volumes to be detached (#4945); your KCM cloud provider should take care of it. There's also an ongoing discussion about enabling an optional timeout while waiting for the volumes (#6285).
Yes. This issue is asking for the behaviour supported via nodeDrainTimeout. @Bo0km4n if this makes sense we can close this, as it's supported by nodeDrainTimeout, and keep any related discussion in the issues linked above.
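For context, `nodeDrainTimeout` is set in the Machine template spec. A hedged fragment of a MachineDeployment (field names per the Cluster API `v1beta1` API; the resource names here are hypothetical):

```yaml
# Fragment of a MachineDeployment; other required fields (version,
# bootstrap, infrastructureRef) are omitted for brevity.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-workers
spec:
  clusterName: my-cluster
  template:
    spec:
      clusterName: my-cluster
      # Give up draining (and proceed with Machine deletion) after
      # 5 minutes. The default of 0 means wait for the drain indefinitely.
      nodeDrainTimeout: 5m
```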
@enxebre Thank you for the information. I will try to discuss the above in the linked issues. Thanks, everyone. I'm closing this issue.
User Story
If a cluster user has created a PDB resource, Machine deletion can get stuck during the node drain. So after a few failed drain attempts, I would like the cluster-api Machine controller to retry the drain with the `--disable-eviction` option. This problem often occurs when a user performs a RollingUpdate of a MachineDeployment.
Detailed Description
I propose the following approach to implement the idea above:
1. Check the node drain timeout using `nodeDrainTimeoutExceeded`.
2. If the elapsed drain time has exceeded the timeout, the Machine controller enables the `--disable-eviction` option in the `drainNode` function.
Anything else you would like to add:
/kind feature