
Optimize CAPD machine controller to reduce noisy error messages #8086

Closed
sbueringer opened this issue Feb 8, 2023 · 3 comments · Fixed by #8090
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@sbueringer
Member

sbueringer commented Feb 8, 2023

When implementing #7963 we discovered that the CAPD machine reconciler produces a lot of error messages when the API server of the workload cluster is not reachable yet. Roughly here: https://github.com/kubernetes-sigs/cluster-api/blob/main/test/infrastructure/docker/internal/controllers/dockermachine_controller.go#L337-L340

We tried to add the following statement above, but this breaks the standalone machine case (covered by the KCP adoption e2e test):

	// If the control plane is not yet initialized, there is no API server to contact to get the ProviderID for the Node
	// hosted on this machine, so return early.
	// NOTE: we are using RequeueAfter with a short interval in order to make test execution time more stable.
	if !conditions.IsTrue(cluster, clusterv1.ControlPlaneInitializedCondition) {
		return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
	}

The problem is that in the standalone case we have to set the providerID first, and only afterwards is the ControlPlaneInitialized condition set to true. So with this statement we end up in a deadlock.

Options:

  • only requeue if the cluster is using a control plane provider

/kind cleanup

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 8, 2023
@sbueringer
Member Author

cc @fabriziopandini
Hope I captured it correctly

@fabriziopandini
Member

/triage accepted

only requeue if the cluster is using a control plane provider

Makes sense to me

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 8, 2023
@sbueringer
Member Author

/assign

Should have a few minutes to open a PR relatively soon
