KCP triggers rollout after upgrade #8124
Comments
/triage accepted
Thanks for the analysis 🙏🏼. We should consider adding a linter check to enforce that we don't accidentally add defaulting using OpenAPI in KCP, to avoid having the same problem again in the future.
@ykakarap Great idea. I'll open a follow-up issue for the linter.
If I understood this correctly, the issue happens because the diffing logic in KCP does not take into account the OpenAI defaulting logic?
I believe you mean OpenAPI 😅. Yes. The KCP logic runs the defaulting webhook code, but that does not apply any of the OpenAPI defaulting.
Yup, and then it's just racy when/if defaulting is done on the KCP and KubeadmConfig objects in the apiserver.
Yeah, OpenAPI 😄. I'm wondering if we can have a way to disable its use for any tree of fields that we compare in code.
Hm, I think on our side in CAPI we could have either a linter (based on controller-tools libraries or just parsing the final CRDs) or maybe a unit test. I guess the most hacky / trivial thing is just to write a small Go binary which parses the relevant CRDs and then fails if it finds a default value in a specific sub-tree. On the controller-tools side I could imagine some sort of marker which disallows the default marker for a struct and everything below it.
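The "small Go binary" idea could look roughly like the sketch below. It is only an illustration under several assumptions: the CRD's OpenAPI schema is assumed to be available as JSON (real CRDs are usually YAML and would need a YAML parser first), the names `schemaNode`, `findDefaults`, `lookup`, `checkSubtree` and the embedded `exampleSchema` are hypothetical and not part of Cluster API or controller-tools, and only `properties` and `items` are walked.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// schemaNode is a hypothetical, minimal subset of an OpenAPI v3 schema node.
// Other keywords (type, additionalProperties, ...) are ignored in this sketch.
type schemaNode struct {
	Default    json.RawMessage        `json:"default,omitempty"`
	Properties map[string]*schemaNode `json:"properties,omitempty"`
	Items      *schemaNode            `json:"items,omitempty"`
}

// findDefaults returns the paths of all fields at or below node that declare
// a non-null default. Properties are visited in sorted order.
func findDefaults(node *schemaNode, path string) []string {
	if node == nil {
		return nil
	}
	var found []string
	if len(node.Default) > 0 && string(node.Default) != "null" {
		found = append(found, path)
	}
	names := make([]string, 0, len(node.Properties))
	for name := range node.Properties {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		found = append(found, findDefaults(node.Properties[name], path+"."+name)...)
	}
	return append(found, findDefaults(node.Items, path+"[]")...)
}

// lookup descends from root along the given property names.
// It returns nil if any segment does not exist.
func lookup(root *schemaNode, names ...string) *schemaNode {
	node := root
	for _, name := range names {
		if node == nil {
			return nil
		}
		node = node.Properties[name]
	}
	return node
}

// checkSubtree parses schemaJSON and lists every defaulted field below the
// sub-tree identified by names.
func checkSubtree(schemaJSON string, names ...string) ([]string, error) {
	var root schemaNode
	if err := json.Unmarshal([]byte(schemaJSON), &root); err != nil {
		return nil, fmt.Errorf("parsing schema: %w", err)
	}
	subtree := lookup(&root, names...)
	if subtree == nil {
		return nil, fmt.Errorf("sub-tree %q not found", strings.Join(names, "."))
	}
	return findDefaults(subtree, strings.Join(names, ".")), nil
}

// exampleSchema is made up for illustration; it is not the real KCP CRD.
const exampleSchema = `{
  "properties": {
    "spec": {
      "properties": {
        "kubeadmConfigSpec": {
          "properties": {
            "initConfiguration": {
              "properties": {
                "nodeRegistration": {
                  "properties": {
                    "imagePullPolicy": {"type": "string", "default": "IfNotPresent"}
                  }
                }
              }
            }
          }
        },
        "someOtherField": {"type": "integer", "default": 1}
      }
    }
  }
}`

func main() {
	offenders, err := checkSubtree(exampleSchema, "spec", "kubeadmConfigSpec")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, path := range offenders {
		fmt.Println("OpenAPI default found in compared sub-tree:", path)
	}
}
```

A real check would load the generated CRD files instead of an embedded string and exit non-zero whenever the returned list is non-empty.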
/reopen for follow-ups
@sbueringer: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@sbueringer @killianmuldoon Thanks for the great effort on fixing the tests. Perhaps we can close this as well (assuming follow-up #8139 is also merged).
I was referring to additional measures to ensure this doesn't happen again with follow-ups. But I've created a separate issue for that now: #8147. It makes sense to close this issue to signal that it is resolved.
/close
@sbueringer: Closing this issue. In response to this:
As described in #8101 our e2e-main test is currently flaky. The test case which is responsible for that is the clusterctl upgrade test (v0.3 => main). The problem is that once we upgrade Cluster API to the version from main, Machines are rolled out. After a bit of investigation we found out that the KCP controller triggers the rollout.
[Spoiler alert] The test has been failing for roughly two weeks, since we merged #7772.
Some context:
The ImagePullPolicy field in KubeadmConfigSpec.InitConfiguration.NodeRegistration and KubeadmConfigSpec.JoinConfiguration.NodeRegistration has a default value (IfNotPresent) which is applied via OpenAPI.
Example error case:
InitConfiguration.NodeRegistration.ImagePullPolicy: IfNotPresent != ""
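To make the error case concrete, here is a minimal Go sketch of how such a mismatch can arise and how a comparison could be made insensitive to it. Everything in it is an assumption for illustration: `nodeRegistration` is a trimmed-down stand-in for the real type, and `naiveEqual`, `withDefaults` and `defaultedEqual` are hypothetical helpers, not the KCP controller's actual comparison code or the fix that was merged.

```go
package main

import "fmt"

// nodeRegistration is a hypothetical stand-in for the real NodeRegistration
// type, reduced to the single field that matters for this example.
type nodeRegistration struct {
	ImagePullPolicy string
}

// defaultImagePullPolicy mirrors the default value named in the error case.
const defaultImagePullPolicy = "IfNotPresent"

// naiveEqual compares the two values exactly as they are.
func naiveEqual(a, b nodeRegistration) bool {
	return a == b
}

// withDefaults returns a copy in which an empty ImagePullPolicy is replaced
// by the default, so defaulted and not-yet-defaulted objects look the same.
func withDefaults(n nodeRegistration) nodeRegistration {
	if n.ImagePullPolicy == "" {
		n.ImagePullPolicy = defaultImagePullPolicy
	}
	return n
}

// defaultedEqual applies the same defaulting to both sides before comparing.
func defaultedEqual(a, b nodeRegistration) bool {
	return withDefaults(a) == withDefaults(b)
}

func main() {
	// One object has already had the OpenAPI default applied, the other has not.
	defaulted := nodeRegistration{ImagePullPolicy: "IfNotPresent"}
	notYetDefaulted := nodeRegistration{}

	fmt.Println("naive comparison equal:", naiveEqual(defaulted, notYetDefaulted))
	fmt.Println("defaulted comparison equal:", defaultedEqual(defaulted, notYetDefaulted))
}
```

The point of the sketch is only that both sides of the comparison need to see the same defaulting; whether that happens by normalizing before comparing or by moving the default elsewhere is a separate design decision.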
The solution is to implement defaulting in a way that it doesn't make a difference when we calculate if we need a rollout or not:
P.S. This bug doesn't affect any released version of Cluster API as #7772 was just merged on main.
/kind bug
/area control-plane