Stop using full CRD list as a fallback to get ObjectMetadata for UpdateReferenceAPIContract #5686
Comments
+1, but we should discuss this with provider implementers at the office hours and understand how this forcing function impacts them, in order to figure out how fast this can be rolled out.
Result from the CAPI meeting today: Tasks:
/milestone v1.2
Short update: I will definitely continue with this issue in v1.2. It's just relatively far down on the priority queue.
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/lifecycle frozen
I implemented a unit test on https://github.com/sbueringer/cluster-api/blob/poc-clusterctl-crd/cmd/clusterctl/client/config_test.go#L92-L141 to check if at least all builtin providers are using correct CRD names. It turns out almost all of them do; only a single CRD is wrong (a metal3 CRD, see the discussion below).
On one side that is good news, on the other side it brings up the question how this CRD could be renamed. Names of resources are immutable, which means the only way to rename the resource is to delete the CRD (and thus the corresponding CRs) and then create it again, i.e. you're losing all your CRs. I'm not sure if that is something we can force providers (metal3 + potentially providers we don't know about) to do at this point. Does anyone have other ideas how CRDs could be renamed without losing CRs? Otherwise I think we have to close this issue and live with our current implementation.
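For illustration, here is a minimal sketch (not the linked test or the actual cluster-api code) of how the expected CRD name can be derived from a group and kind, assuming flect-based pluralization of the kind; the helper name is made up:

```go
package naming

import (
	"fmt"
	"strings"

	"github.com/gobuffalo/flect"
)

// expectedCRDName returns the conventional CRD name "<pluralized lowercase kind>.<group>".
// Illustrative helper only; a real check would have to match the pluralization
// used when the provider's CRDs were generated.
func expectedCRDName(group, kind string) string {
	return fmt.Sprintf("%s.%s", flect.Pluralize(strings.ToLower(kind)), group)
}
```

For example, `expectedCRDName("infrastructure.cluster.x-k8s.io", "DockerMachineTemplate")` yields `dockermachinetemplates.infrastructure.cluster.x-k8s.io`; a CRD deployed under any other name would fail such a check.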
@fabriziopandini @CecileRobertMichon @enxebre @vincepri Any opinions / suggestions?
@sbueringer I just checked it in the metal3 provider repo, I think we do define the kind for that specific CRD correctly https://github.com/metal3-io/cluster-api-provider-metal3/blob/main/config/crd/bases/infrastructure.cluster.x-k8s.io_metal3datas.yaml#L14, would you mind pointing out where the pluralization is coming out incorrectly? AFAIU
No worries. I think the issue is the plural used in the CRD name. Afaik our assumption is that kubebuilder is also using the same pluralization we use to calculate the expected name. Btw, kind and group are absolutely correct, it's literally just the plural.
Oh yeah, since
I think we should continue with the plan described in #5686 (comment), add documentation, create awareness via email/reminder in the office hours, add warnings, and sometime in the future we can finally drop support for the fallback mechanism.
I'll take another look today. I just realized that the metal3 CR might even be okay (as we probably use the relevant func only under certain circumstances).
Thinking about "clusterctl should emit a warning (with the hint that support will be dropped in a future release)": the idea is to add a pre-check to clusterctl. It's easy to identify an invalid name, but not all CRDs deployed by providers have to comply with our naming convention, only the ones referenced by ClusterClass, Cluster, KCP, MD, MS, MachinePool, Machine. This includes:
To avoid false positives it would be nice to only validate these types of CRDs. The problem is that there is no way to identify them. We could try to only validate CRDs which:
This still leaves room for false positives:
Considering all this, I would just validate the names of all CRDs and print a warning which states when exactly the CRD is really problematic. Over time we can add known false positives to an allow list of the pre-check (e.g. metal3data) to avoid the warning. @fabriziopandini WDYT?
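To make the pre-check idea above concrete, here is a minimal, hypothetical sketch (not the actual clusterctl implementation; the allow-list contents and message wording are assumptions). It flags every CRD whose name differs from the name CAPI would compute from its group and kind, unless the CRD is on an allow list of known false positives:

```go
package precheck

import (
	"fmt"
	"strings"

	"github.com/gobuffalo/flect"
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// crdNamePreCheck returns one warning per CRD whose name does not match the
// "<pluralized lowercase kind>.<group>" convention, skipping allow-listed CRDs.
func crdNamePreCheck(crds []apiextensionsv1.CustomResourceDefinition, allowList map[string]bool) []string {
	var warnings []string
	for _, crd := range crds {
		if allowList[crd.Name] {
			// Known false positive, e.g. a CRD that is never referenced by CAPI objects.
			continue
		}
		expected := fmt.Sprintf("%s.%s", flect.Pluralize(strings.ToLower(crd.Spec.Names.Kind)), crd.Spec.Group)
		if crd.Name != expected {
			warnings = append(warnings, fmt.Sprintf(
				"CRD %q does not follow the CAPI naming convention (expected %q); this only matters if the CRD is referenced by Cluster/ClusterClass/KCP/MachineDeployment/MachineSet/MachinePool/Machine objects",
				crd.Name, expected))
		}
	}
	return warnings
}
```

The warning text states when the mismatch is actually a problem, which matches the approach of accepting some false positives rather than trying to detect exactly which CRDs are referenced.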
What about validating all CRDs except ones with a well-known annotation, so provider authors can "silence" the warning without sending a PR to CAPI? Note: I think this is acceptable because the research work on all the listed providers provided evidence that there are very few (one) CRDs not compliant with the rule.
Sure, can do.
/triage accepted
Detailed Description
Context: We automatically update references in CAPI to newer versions if there is a newer version with the same contract for a CRD.
Example: The MachineSet controller automatically updates references to an InfrastructureMachineTemplate if there is a newer apiVersion of the InfrastructureMachineTemplate which complies with the same CAPI contract.
This functionality is implemented via the `UpdateReferenceAPIContract` func. It uses the labels on a CRD to calculate if there is a newer version of the CRD complying with the same CAPI contract. To avoid retrieving/caching whole CRDs, the func uses `GetGVKMetadata` to retrieve the PartialObjectMetadata of a CRD. `GetGVKMetadata` depends on the CRD being named according to the convention `<pluralized lowercase kind>.<group>` (e.g. `infrastructure.cluster.x-k8s.io/DockerMachineTemplate` => `dockermachinetemplates.infrastructure.cluster.x-k8s.io`).
In cases where the name is different we fall back to a full CRD list and identify the correct CRD by checking the GroupKind inside the CRD. The problem is that in this case the controller is listing/watching/caching all CRDs in the cluster, which leads to high memory usage of the controller. This affects the CAPI core and the KCP controllers. It looks like the func is not used outside of the core repo: https://cs.k8s.io/?q=UpdateReferenceAPIContract&i=nope&files=&excludeFiles=&repos=
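As an illustration of why the naming convention matters here, the following is a minimal sketch (assuming a controller-runtime client; this is not the actual `GetGVKMetadata` implementation, and the function name and pluralization helper are assumptions) of fetching only a CRD's PartialObjectMetadata by its conventional name instead of listing all CRDs:

```go
package util

import (
	"context"
	"fmt"
	"strings"

	"github.com/gobuffalo/flect"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// getCRDMetadata retrieves only the metadata (e.g. the contract labels) of the CRD
// backing the given GroupKind, relying on the CRD being named
// "<pluralized lowercase kind>.<group>". If the CRD is named differently, this Get
// fails and the caller would have to fall back to listing all CRDs, which is the
// expensive path this issue proposes to drop.
func getCRDMetadata(ctx context.Context, c client.Client, gk schema.GroupKind) (*metav1.PartialObjectMetadata, error) {
	crdMeta := &metav1.PartialObjectMetadata{}
	crdMeta.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "apiextensions.k8s.io",
		Version: "v1",
		Kind:    "CustomResourceDefinition",
	})
	name := fmt.Sprintf("%s.%s", flect.Pluralize(strings.ToLower(gk.Kind)), gk.Group)
	if err := c.Get(ctx, client.ObjectKey{Name: name}, crdMeta); err != nil {
		return nil, err
	}
	return crdMeta, nil
}
```

Because only a metadata-only Get by name is needed, the controller never has to hold full CRD objects in its cache, which is what keeps memory usage low when the convention holds.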
To avoid running into this issue we would propose to drop the fallback mechanism. This is also a forcing function to name CRDs correctly.
WDYT?
Tasks are roughly:
- Drop the fallback mechanism in `UpdateReferenceAPIContract` (probably create a copy of the func, deprecate the old one (?))

/kind cleanup