Creating a CRD with broken converter webhook prevents GC controller from initialization #101078
Comments
/sig api-machinery |
please follow the issue template correctly: including k8s version is important. |
Sure, it happens on current K8s master branch - exact commit is
|
/assign @yliaog |
i think this is the same issue as reported in #90597 |
It might share a root cause: informer sync in the GC should not block on CRDs (and possibly not on other resources either?). I guess we would need some metrics about unsynced informers to handle this properly. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle stale |
This seems to be this way by design, see this duplicate #96066 (comment) |
This is problematic for our environments as well. Unstable user-defined conversion webhooks break GC for unrelated resources, those unrelated resources eventually hit quota limits and render the environment unusable. Is there a recommended approach to this from the community? One naive solution that comes to mind is a config option for marking a CRD as non-blocking for GC. Then GC would only respect Without this, it's hard to allow users to specify conversion webhooks because k8s then takes a dependency on those services (which in our case, already take a dependency on k8s). |
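The "non-blocking CRD" idea in the comment above could look roughly like the sketch below. This is only an illustration, not how the garbage collector is implemented: the `monitor` type, the `nonBlocking` flag, and `readyToStart` are all hypothetical names, and no such option exists in Kubernetes today.

```go
package main

import "fmt"

// monitor is a hypothetical stand-in for a per-resource informer tracked by GC.
type monitor struct {
	resource    string
	nonBlocking bool        // hypothetical opt-out flag; not an existing Kubernetes field
	hasSynced   func() bool // reports whether the informer cache finished its initial list
}

// readyToStart reports whether every blocking monitor has synced.
// Monitors marked nonBlocking are skipped, so they cannot hold up startup.
func readyToStart(monitors []monitor) bool {
	for _, m := range monitors {
		if m.nonBlocking {
			continue
		}
		if !m.hasSynced() {
			return false
		}
	}
	return true
}

func main() {
	monitors := []monitor{
		{resource: "pods", hasSynced: func() bool { return true }},
		// Simulates a CRD whose conversion webhook is down: it never syncs.
		{resource: "widgets.example.com", nonBlocking: true, hasSynced: func() bool { return false }},
	}
	fmt.Println("ready to start:", readyToStart(monitors))
}
```

With the flag set, the never-syncing resource is ignored by the gate; clearing the flag makes the same resource block startup again, which is the behavior reported in this issue.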
I think I'd push to make gc stop blocking on discovery or informer sync at all, and make blockOwnerDeletion even more best effort. |
I'd like to stop honoring blockOwnerDeletion. :) |
cc @tkashem |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs, so this bot triages them. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs, so this bot triages them. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
Is there any ongoing work for this? |
None that I know of. At first glance, removing the requirement that all informers be fully synced before GC starts/resumes seems reasonable to me, and would resolve this issue. |
OK, I'll try to work out a patch |
I am now working on this, will fix it soon. |
Hi @tossmilestone, What is the status of this? One year has passed. Is there any short term plan to fix this? Thanks! |
Sorry, I don't have the time right now to continue fixing this issue. If you're willing, you can help continue this work. Thank you! |
@rauferna not likely in the short term. A quick workaround is to delete the converter webhook when you find your CRD controller is not working, and add it back when it recovers. Or you can deploy multiple replicas to avoid downtime as much as possible. |
This issue has not been updated in over 1 year, and should be re-triaged. For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
/triage accepted |
What happened:
Creating a CRD with a broken converter webhook prevents the GC controller from initializing, because initialization fails at informer sync. Additionally, the issue is not visible until the GC controller restarts: dynamically added CRD resources with a non-working converter webhook do not break an already running GC.
What you expected to happen:
GC controller should initialize with available informers. CRDs with broken converter webhook should not prevent GC controller from working on other resources.
How to reproduce it (as minimally and precisely as possible):
gc-bug.zip