-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🌱 [WIP] [DNR] Reproduce clusterctl upgrade e2e test flake #8120
🌱 [WIP] [DNR] Reproduce clusterctl upgrade e2e test flake #8120
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I looked into the test artifacts of a failed test. It looks like KCP rolls out a new Machine for some reason after the upgrade to main. I improved the logging in KCP and in the e2e test. That should give us more data as soon as we hit the flake again. |
/test pull-cluster-api-e2e-full-main |
1 similar comment
/test pull-cluster-api-e2e-full-main |
Hey @sbueringer thanks a lot, I was testing just the timeout increase in #8119 and it seems to be passing in a row. But I do agree with the ^, it is most likely a rollout issue rather than a timeout. |
My theory is that it's a race condition which produces conditions under which KCP triggers a rollout after upgrade |
Wrong test case failed |
okay, so we detected a rollout as per logs:
|
The diff which leads to a rollout is detected here:
Just impossible to tell from the YAML why reflect.DeepEqual detects a difference |
Added a lot more logs. let's see if that's good enough to find something |
/test pull-cluster-api-e2e-full-main |
@furkatgofurov7 I found the issue, it's a fun one... Context:
Example error case:
The solution is essentially to set the default value in a way that it doesn't make a difference when we calculate if we need a rollout or not Solution:
|
@killianmuldoon ^^ that's an interesting reason why we can't do all defaulting via the OpenAPI schema 😂 |
998f5f3
to
5f7adfb
Compare
/test pull-cluster-api-e2e-main |
/test pull-cluster-api-e2e-full-main |
/test pull-cluster-api-e2e-main |
1 similar comment
/test pull-cluster-api-e2e-main |
/test pull-cluster-api-e2e-main Found another issue, this time ImagePullPolicy defaulting triggers a rollout by the Cluster topology reconciler (ImagePullPolicy is not defaulted for the existing KubeadmConfigTemplate, but for the new/desired one => diff => rollout) |
Other flake |
1 similar comment
Other flake |
d6d0c74
to
196d93c
Compare
/test pull-cluster-api-e2e-full-main |
8ce0ae6
to
27f5fe1
Compare
/test pull-cluster-api-e2e-full-main |
/test pull-cluster-api-e2e-informing-main |
1 similar comment
/test pull-cluster-api-e2e-informing-main |
Signed-off-by: Stefan Büringer [email protected]
319fddd
to
fec5bd3
Compare
/test pull-cluster-api-e2e-informing-main |
@sbueringer: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/test pull-cluster-api-e2e-informing-main |
6 similar comments
/test pull-cluster-api-e2e-informing-main |
/test pull-cluster-api-e2e-informing-main |
/test pull-cluster-api-e2e-informing-main |
/test pull-cluster-api-e2e-informing-main |
/test pull-cluster-api-e2e-informing-main |
/test pull-cluster-api-e2e-informing-main |
/close CI should be stable again |
@sbueringer: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What this PR does / why we need it:
Currently contains:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #