A deleted Machine can block cluster initialization indefinitely (flake) #5814
Comments
/assign @killianmuldoon /milestone v1.1 |
As a hint for reproducing this: create a number of workload clusters simultaneously, more than the controller concurrency limit allows. The issue was reported when CAPV was being used. It would be interesting to see whether it can be reproduced using CAPD. |
I can't seem to reproduce on CAPD as described - I've got the concurrency limits for clusters, machines and kubeadmbootstrap set to 1 (and a couple of combinations of them set and unset) and I'm able to initialize 10 cluster control planes simultaneously (though it's heating the room nicely 😄 ) |
I haven't been able to reproduce the flake, but there is no unlocking mechanism for the ControlPlaneInitMutex if a machine is deleted. I've been able to reproduce that locally, and in #5824 I put a fix into the locking mechanism that checks whether the existing lock is valid. @gab-satchi do you have some way to reproduce this flake? |
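To illustrate the idea of validating an existing lock before honouring it, here is a minimal sketch. It is not the code from #5824: `fakeCluster`, `lockInfo` and `tryLock` are hypothetical names, and an in-memory map stands in for the API server and the ConfigMap lock.

```go
package main

import "fmt"

// lockInfo is a hypothetical stand-in for the data kept in the init-lock ConfigMap.
type lockInfo struct {
	MachineName string // Machine that acquired the lock
}

// fakeCluster is a hypothetical in-memory stand-in for the API server state.
type fakeCluster struct {
	machines map[string]bool // Machine name -> still exists
	lock     *lockInfo       // nil when no lock is held
}

// tryLock reports whether machine may initialize the control plane.
// A lock whose holder no longer exists is treated as stale and replaced.
func (c *fakeCluster) tryLock(machine string) bool {
	if c.lock != nil {
		if c.machines[c.lock.MachineName] {
			return false // holder still exists: caller should requeue
		}
		c.lock = nil // holder was deleted: drop the stale lock
	}
	c.lock = &lockInfo{MachineName: machine}
	return true
}

func main() {
	c := &fakeCluster{machines: map[string]bool{"cp-0": true, "cp-1": true}}
	fmt.Println(c.tryLock("cp-0")) // true: no lock held yet
	fmt.Println(c.tryLock("cp-1")) // false: cp-0 holds the lock and still exists
	delete(c.machines, "cp-0")     // simulate the Machine being deleted
	fmt.Println(c.tryLock("cp-1")) // true: the stale lock is replaced
}
```

Without the existence check, the second caller would be requeued forever once the lock holder is deleted, which matches the behaviour described in this issue.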
#5824 may be the source of this issue. This issue should be reopened if the test continues to be flaky. |
What steps did you take and what happened:
"A control plane is already being initialized, requeing until control plane is ready"
The theory is that the lock failed to release here, and the Machine and its KubeadmConfig were removed before the next reconcile, which left the outdated ConfigMap lock behind.
What did you expect to happen:
Anything else you would like to add:
Environment:
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):
/kind bug