KCP is scaling continuously #3353
Comments
Adding some more data to this scenario for better understanding. Even though KCP replicas is set to 1, KCP provisions a new control plane; once that one is provisioned, it starts deleting the first control plane. While the first control plane is being deleted, KCP starts provisioning yet another new control plane.
@smoshiur1237 is it possible to get the output of …
bmh is not the actual infrastructure object; that would be the Metal3Machine. BareMetalHost is one level below. We do not run any command to trigger a rollout of KCP; it starts directly when we apply the manifests.
I think it would still be helpful to see the output of the KCP resource, the Machine resources, and the infrastructure machine resources, to get a better idea of what could be causing the controller to be confused and repeatedly trigger a rolling upgrade.
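A rough sketch of how those resources could be collected for debugging (assumes `kubectl` access to the management cluster and the usual Cluster API / Metal3 resource kinds; the output file names are illustrative, and this needs a live cluster to run):

```shell
# Dump the KubeadmControlPlane, Machine, and Metal3Machine resources
# across all namespaces so the full spec/status is visible.
kubectl get kubeadmcontrolplane -A -o yaml > kcp.yaml
kubectl get machines -A -o yaml > machines.yaml
kubectl get metal3machines -A -o yaml > metal3machines.yaml

# Optionally watch the Machine objects to observe the rollout loop
# (new replica created, old replica deleted, repeat):
kubectl get machines -A -w
```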
An example of the resources is here: https://kubernetes.slack.com/files/UF98WRP8R/F0177HNULUS/rollout-debug.yaml
Followup from Slack: we think that it's the result of the KCP not having a …. We want to handle ….
Fix in flight; possible workaround: set ….
What steps did you take and what happened:
We have been debugging a KCP problem in Metal3 for the last two days. We deploy KCP with one replica, and after the infrastructure is ready and our bare-metal node is provisioned, it stays up for a while. But then KCP starts to scale up another replica and deletes the first one. This stays in a loop: each time the new replica is ready, KCP starts to scale again.
What did you expect to happen:
KCP to be provisioned with a single replica.
Anything else you would like to add:
KCP controller logs are full of:
Problem is seen in KCP status as well.
Environment:
/kind bug