🐛[e2e] Avoid deleting the shared cluster after each test #2655
Conversation
Hi @sedefsavas. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Force-pushed from 2fd92d8 to 6acec28
/hold
cc
/hold cancel
/assign
/lgtm
/assign @detiber
framework.DeleteCluster(ctx, deleteClusterInput)

waitForClusterDeletedInput := framework.WaitForClusterDeletedInput{
	Getter:  mgmtClient,
	Cluster: cluster,
}
framework.WaitForClusterDeleted(ctx, waitForClusterDeletedInput)

assertAllClusterAPIResourcesAreGoneInput := framework.AssertAllClusterAPIResourcesAreGoneInput{
	Lister:  mgmtClient,
	Cluster: cluster,
}
framework.AssertAllClusterAPIResourcesAreGone(ctx, assertAllClusterAPIResourcesAreGoneInput)

ensureDockerDeletedInput := ensureDockerArtifactsDeletedInput{
	Lister:  mgmtClient,
	Cluster: cluster,
}
ensureDockerArtifactsDeleted(ensureDockerDeletedInput)
Do these helpers fail (somewhat) gracefully when mgmtClient or cluster are nil? If not, then this might fail during teardown prior to getting the logs below or tearing down the management cluster.
Though it looks like we'll already fail before tearing down the management cluster if we fail to get the logs.
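A minimal sketch of one way to guard the teardown; the wrapper name, the interface{} parameters, and the cleanup callback are all illustrative and not part of this PR or the framework API:

```go
package e2e_test

import "context"

// teardownIfCreated is a hypothetical guard: it runs the cleanup sequence
// from the snippet above only when both the management client and the
// cluster actually exist, so a failed setup does not also blow up during
// teardown before the logs are collected.
func teardownIfCreated(ctx context.Context, mgmtClient, cluster interface{}, cleanup func(context.Context)) {
	if mgmtClient == nil || cluster == nil {
		// Nothing was created (or creation failed early); skip cleanup so
		// log collection and management-cluster teardown can still run.
		return
	}
	cleanup(ctx)
}
```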
You are right, they do fail. Trying to find a way to make it continue on failure.
Maybe deletion shouldn't be in AfterSuite and should instead be its own set of assertions that run after the tests run.
It'd be cool if we could do something like FOCUS='Create|Upgrade|ScaleDown|ScaleUp|Delete'. Basically be able to control exactly which operations are running. Unfortunately ginkgo doesn't give us a great way to do that except try and abuse FOCUS, but that is surely fraught with peril.
@chuckha the problem here is that you should not assume ordering between the various specs that match focus. In general, specs defined within the same Describe block run in the order they are defined (unless running in parallel), but that ordering does not hold across different Describe blocks.
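For illustration, a minimal ginkgo sketch of the ordering guarantee being described (spec names are made up):

```go
package e2e_test

import . "github.com/onsi/ginkgo"

// The Its inside a single Describe run top to bottom when not parallelized,
// but nothing orders this Describe relative to other Describe blocks that
// also happen to match a focus expression.
var _ = Describe("Workload cluster lifecycle", func() {
	It("creates the cluster", func() { /* ... */ })
	It("upgrades the cluster", func() { /* ... */ })
	It("deletes the cluster", func() { /* ... */ })
})
```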
Yeah, this is the general assumption I broke when originally writing these tests, mostly for the sake of test-running speed. I really don't want to spin up a new cluster for each test, but working with ginkgo instead of against it will likely yield better results code-wise and, depending on how much prow we can use, worse or similar results speed-wise.
Perhaps one option (getting way out of scope of this PR now...) is to make the BeforeEach as quick as possible by spinning up a single-node control plane cluster with no workers instead of a three-node control plane. That would put the runtime somewhere around 4 minutes of start-up time per test, plus the time the test takes to run (an additional ~2 minutes per control plane that gets created). But it would make the code organization much better and allow for parallelization.
The other idea I had was to manage the ordering of tests internally: pass a flag to the test run that lists the tests you want to run (-tests="upgrade,scaleup,scaledown"), and then the suite keeps track of the ordering, running the selected tests in the correct order and skipping the ones that are not selected (a sketch of this follows below). However, this ruins any possibility of parallelization.
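A minimal sketch of that hypothetical -tests flag; none of this exists in the PR and the names are illustrative:

```go
package e2e_test

import (
	"flag"
	"strings"
)

// selectedTests holds the operations requested via -tests. The suite keeps
// its own fixed ordering and simply skips anything that was not selected.
var selectedTests = flag.String("tests", "create,upgrade,scaleup,scaledown,delete",
	"comma-separated operations to run, executed in the suite's fixed order")

// testSelected reports whether the named operation was requested via -tests.
func testSelected(name string) bool {
	for _, t := range strings.Split(*selectedTests, ",") {
		if strings.EqualFold(strings.TrimSpace(t), name) {
			return true
		}
	}
	return false
}
```

Each spec would then start with something like if !testSelected("upgrade") { Skip("upgrade not selected") }, which also makes it clear why this approach rules out parallel execution.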
@sedefsavas I think the only way forward here is to remove the assertion failures on delete :/ Let's keep the same wait, but if it times out there is no exception or assertion failure until after the logs have been dumped.
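A minimal sketch of that approach, with hypothetical helper signatures and an assumed timeout standing in for the real ones:

```go
package e2e_test

import (
	"context"
	"time"

	. "github.com/onsi/gomega"
)

// teardownWithDeferredFailure records the deletion error instead of asserting
// immediately, dumps the logs, and only then fails. waitForClusterDeleted and
// dumpLogs stand in for the existing helpers; the timeout is an assumption.
func teardownWithDeferredFailure(ctx context.Context,
	waitForClusterDeleted func(context.Context, time.Duration) error,
	dumpLogs func()) {

	deleteErr := waitForClusterDeleted(ctx, 20*time.Minute) // same wait as before, no assertion yet
	dumpLogs()                                              // always collect logs first
	Expect(deleteErr).NotTo(HaveOccurred(), "cluster deletion did not complete before the timeout")
}
```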
/milestone v0.3.2
/approve
Approving / merging this for now given the immediate improvement / gain that we get from it. @sedefsavas can you open up an issue with the follow-up items from above?
/milestone v0.3.3
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: sedefsavas, vincepri
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What this PR does / why we need it:
This PR avoids deleting the shared cluster after each test.
Which issue(s) this PR fixes
Fixes #2654