🌱 Fix self-hosted flakes in E2E tests #3639
/milestone v0.3.10
/retest
343f90f to 9121d59
test/e2e/data/infrastructure-docker/cluster-template-kcp-adoption.yaml
9121d59 to c5f15b1
/hold
From the PR's description, it seems we're now splitting manifests in order to separate MHC components? Would this be something that we would expect users to do as well? During move, we should be able to pause controllers and check whether the cluster is in a state that is safe to move; is there anything stopping us from doing so? Can we split this PR into multiple ones? In particular:
These changes seem outside of the PR's scope
Thanks for the fix!
This PR is also addressing this issue: #3461
I am fine with adding it in the title and the issues fixed by this PR or making this separate PRs.
test/e2e/data/infrastructure-docker/cluster-template/kustomization.yaml
@fabriziopandini Locally, works for me as well.
/test pull-cluster-api-e2e
In
should be
|
No. The current MHC object in the E2E tests is specifically configured to always trigger remediation 30s after the node is started, and I removed it from the other E2E tests to avoid flakes caused by remediation kicking in when execution is slow.
This is not how the move logic works. Currently we pause reconciliation on the clusters included in the move scope, but the controller will continue to run.
Some of them are nits, but refactoring templates for MHC is potentially related to self-hosted flakes, see the note above about MHC and #3589 (comment).
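To illustrate the distinction made above between pausing reconciliation for a cluster and stopping the controller itself, here is a minimal sketch. It is not Cluster API code: the `Cluster` type, the `Paused` field, and the `reconcile` function are hypothetical stand-ins, and the only assumption carried over from the comment is that a paused cluster is skipped while the worker loop keeps running for everything else.

```go
package main

import "fmt"

// Cluster is a hypothetical stand-in for a cluster object; it is not the
// real Cluster API type.
type Cluster struct {
	Name   string
	Paused bool // assumed flag set on clusters that are in the move scope
}

// reconcile mimics one pass of a controller worker. The worker is still
// invoked for every cluster; it just returns early for paused ones.
func reconcile(c Cluster) string {
	if c.Paused {
		return "skipped"
	}
	return "reconciled"
}

func main() {
	clusters := []Cluster{
		{Name: "in-move-scope", Paused: true},
		{Name: "not-in-move-scope", Paused: false},
	}
	// The loop (the "controller") never stops; only per-cluster work is skipped.
	for _, c := range clusters {
		fmt.Printf("%s: %s\n", c.Name, reconcile(c))
	}
}
```

In this sketch a cluster outside the move scope is still acted upon while move is in progress, which is the gap that stopping all controllers before move would close.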
c5f15b1 to e2a5f89
/hold cancel
I assume this change has moved to the other PR; I'll take a look there before commenting.
We should probably consider stopping all controllers before move, for safety and to prevent any slow-running worker from performing actions while move is running.
/test pull-cluster-api-e2e-full
/lgtm pending e2e pass
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vincepri The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
This PR aims to fix self-hosted flakes by ensuring that MHC is not run during this test and by adding two delays in order to avoid doing move too aggressively.
In order to make this happen the entire management of cluster templates was re-architected so:
On top of that, this PR includes a set of nits for improving test logs (mostly uppercase).
Which issue(s) this PR fixes:
Fixes #3589