When running a workload with a single control plane node the load balancers take 15 mins to provision #857
Comments
Guess this is another code path related to not using Availability Sets? We should probably consider that as mitigation. Anything that tries to look up IDs will fail with our current setup. It's hard to track down all the places individually.
It is related, but it's not the root cause. The root cause is the cache used in the controller-manager; I provided more details in kubernetes-sigs/cloud-provider-azure#363. I believe this could cause delays in a customer scenario where a node is added after the cluster is provisioned, delaying LB provisioning because the cache doesn't know about the node.
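To illustrate the failure mode described above, here is a minimal sketch of a TTL-based lookup cache. This is an illustration with hypothetical names, not the actual cloud-provider-azure cache: a node added to the cluster after the cached entry was populated stays invisible to lookups until the entry expires.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry holds one cached listing of the nodes/VMs known to the cloud API.
type entry struct {
	nodes   map[string]bool
	fetched time.Time
}

// nodeCache is a minimal TTL cache (illustrative only).
type nodeCache struct {
	mu   sync.Mutex
	ttl  time.Duration
	data *entry
	list func() map[string]bool // queries the cloud API for the current nodes/VMs
}

// Has reports whether the node is known, refreshing the listing only when the
// cached entry is older than ttl. A node that joined after the last refresh is
// reported as missing until the TTL expires.
func (c *nodeCache) Has(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.data == nil || time.Since(c.data.fetched) > c.ttl {
		c.data = &entry{nodes: c.list(), fetched: time.Now()}
	}
	return c.data.nodes[name]
}

func main() {
	backend := map[string]bool{"control-plane-0": true}
	cache := &nodeCache{
		ttl: 15 * time.Minute, // roughly the delay observed in this issue
		list: func() map[string]bool {
			out := map[string]bool{}
			for k := range backend {
				out[k] = true
			}
			return out
		},
	}

	fmt.Println(cache.Has("worker-0")) // false: cache populated before the node existed
	backend["worker-0"] = true         // node joins the cluster
	fmt.Println(cache.Has("worker-0")) // still false until the 15-minute TTL expires
}
```

With a cache shaped like this, lookups for a newly added node keep failing until the entry is refreshed, which would match the ~15 minute delay reported here.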
I ran into this again trying to set up a single control plane test for Windows. This appears to be an issue only in the VMAS scenario. There is a VMSS test that uses a single node: cluster-api-provider-azure/test/e2e/azure_test.go Lines 260 to 261 in cb486c3
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
/remove-lifecycle stale
FYI, I hit this issue today. The only change I made from the default capi-quickstart.yaml was

Let me know if you'd like me to provide the full yaml. The public IP is available now and the ingress works; however, as you can see from the logs, it took ~10 minutes.
Oh, I also tried installing the Flannel CNI, but I don't think that should have impacted it.
@lastcoolnameleft I think if you use the external cloud provider there are fixes available for this issue (see above in the thread where #1216 is linked). Rather than using the default template, you'd use

As an aside, perhaps we should use the out-of-tree provider by default...
@devigned @lastcoolnameleft Unfortunately, the version of external cloud provider we're using in the example template in CAPZ v0.4 doesn't have the fix yet. The PR to bump the version (#1323) and enable the test that validates this behavior was blocked by another regression in cloud-provider, which has since been fixed in a release. You can work around it for now by editing your template to use version v0.7.4+ of cloud-provider until we update the reference template. The in-tree fix will be in k8s 1.22+. Regarding using out-of-tree by default: v1.0.0 of the out-of-tree provider was just released, so it might be a good time to do that; tracking in #715.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/close
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
I think we can close this now. This was fixed in the external cloud provider v0.7.4+ and k8s 1.22+.

/close
@CecileRobertMichon: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind bug
status (as of 6/10/21):
What steps did you take and what happened:
When running a workload with a single control plane node, the load balancers take 15 minutes to provision.
Add the following to the "Creating a single control-plane cluster with 1 worker node" e2e test after cluster creation, then run the e2e test; the test will fail.
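A sketch of such a step (an illustration only, assuming a client-go clientset pointed at the workload cluster and arbitrary names and timeouts, not the exact snippet used): it creates a Service of type LoadBalancer and polls until the cloud provider reports an ingress IP.

```go
package e2e

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// createLBAndWait creates a LoadBalancer Service in the workload cluster and
// polls until an ingress IP is assigned. With the bug described in this issue,
// the poll times out because provisioning takes ~15 minutes.
func createLBAndWait(ctx context.Context, cs kubernetes.Interface) error {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "web-lb", Namespace: "default"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "web"},
			Ports: []corev1.ServicePort{{
				Port:       80,
				TargetPort: intstr.FromInt(80),
			}},
		},
	}
	if _, err := cs.CoreV1().Services(svc.Namespace).Create(ctx, svc, metav1.CreateOptions{}); err != nil {
		return err
	}
	// The 5-minute timeout is arbitrary; the real e2e framework uses its own
	// wait helpers and intervals.
	return wait.PollImmediate(10*time.Second, 5*time.Minute, func() (bool, error) {
		got, err := cs.CoreV1().Services(svc.Namespace).Get(ctx, svc.Name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return len(got.Status.LoadBalancer.Ingress) > 0, nil
	})
}
```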
If you connect to the workload cluster, you will see that the service with the load balancer is there, and after 15 minutes it will provision. Subsequent services with load balancers provision quickly. The logs of the controller manager will contain:
What did you expect to happen:
That the tests should be able to provision a workload cluster and pass an e2e test that creates a load balancer.
Anything else you would like to add:
This is related to kubernetes-sigs/cloud-provider-azure#338
Environment:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):