Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛[e2e] Avoid deleting the shared cluster after each test #2655

Merged
merged 1 commit into from
Mar 23, 2020

Conversation

sedefsavas
Copy link

What this PR does / why we need it:
This PR avoid deleting the shared cluster after each test.

Which issue(s) this PR fixes
Fixes #2654

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 12, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @sedefsavas. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 12, 2020
@k8s-ci-robot k8s-ci-robot requested review from justinsb and ncdc March 12, 2020 15:32
@vincepri
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 12, 2020
@sedefsavas sedefsavas force-pushed the e2efix branch 2 times, most recently from 2fd92d8 to 6acec28 Compare March 12, 2020 16:22
@sedefsavas
Copy link
Author

/hold
I am combining this into the AfterSuite in docker_suite_test, as per @detiber 's suggestion.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 12, 2020
@chuckha
Copy link
Contributor

chuckha commented Mar 13, 2020

cc

@sedefsavas
Copy link
Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 13, 2020
@chuckha
Copy link
Contributor

chuckha commented Mar 13, 2020

/assign

@chuckha
Copy link
Contributor

chuckha commented Mar 13, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 13, 2020
@chuckha
Copy link
Contributor

chuckha commented Mar 13, 2020

/assign @detiber

Comment on lines +120 to +138
framework.DeleteCluster(ctx, deleteClusterInput)

waitForClusterDeletedInput := framework.WaitForClusterDeletedInput{
Getter: mgmtClient,
Cluster: cluster,
}
framework.WaitForClusterDeleted(ctx, waitForClusterDeletedInput)

assertAllClusterAPIResourcesAreGoneInput := framework.AssertAllClusterAPIResourcesAreGoneInput{
Lister: mgmtClient,
Cluster: cluster,
}
framework.AssertAllClusterAPIResourcesAreGone(ctx, assertAllClusterAPIResourcesAreGoneInput)

ensureDockerDeletedInput := ensureDockerArtifactsDeletedInput{
Lister: mgmtClient,
Cluster: cluster,
}
ensureDockerArtifactsDeleted(ensureDockerDeletedInput)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these helpers fail (somewhat) gracefully when mgmtClient or cluster are nil? If not, then this might fail during teardown prior to getting the logs below or tearing down the Management cluster.

Though it looks like we'll already fail before tearing down the management cluster if we fail to get the logs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, they do fail. Trying to find a way to make it continue on failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe deletion shouldn't be in AfterSuite and should instead be its own set of assertions that run after the tests run.

It'd be cool if we could do something like FOCUS='Create|Upgrade|ScaleDown|ScaleUp|Delete'. Basically be able to control exactly which operations are running. Unfortunately ginkgo doesn't give us a great way to do that except try and abuse FOCUS, but that is surely fraught with peril.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chuckha the problem here is that you should not assume ordering between the various specs that match focus.

In general specs that are defined withing the same Describe are run in the order they are defined (unless running in parallel), but it does not hold up across different Describe blocks.

Copy link
Contributor

@chuckha chuckha Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the general assumption I broke when originally writing these tests, mostly for the sake of test-running-speed. I really don't want to spin up a new cluster for each test, but working with ginkgo instead of against it will likely yield us better results code wise and, depending on how much prow we can use, worse or similar results speed wise.

Perhaps one option (getting way out of scope of the PR now...) is to make the before each is as quick as possible by spinning up a single node control plane cluster with no workers instead of a 3 node control plane. That would put the runtime somewhere around 4 minutes for each test's start up time plus the time the test takes to run (an additional ~2 minutes per control plane that gets created). But it would make the code organization much better and allow for parallelization.

The other idea I had was to manage the ordering of tests internally, pass some kind of flag to the test run that lists the tests you want to run (-tests="upgrade,scaleup,scaledown") and then the tests keep track of the order and run the tests in the correct order skipping tests that are not selected. However, this ruins any possibility of parallelization.

@chuckha
Copy link
Contributor

chuckha commented Mar 17, 2020

@sedefsavas I think the only way forward here is to remove the assertion failures on delete :/ let's have a wait of the same amount of time, but if it times out there is no exception or assertion failure until after the logs have been dumped.

@chuckha
Copy link
Contributor

chuckha commented Mar 17, 2020

/milestone v0.3.2

@k8s-ci-robot k8s-ci-robot added this to the v0.3.2 milestone Mar 17, 2020
@vincepri vincepri modified the milestones: v0.3.2, v0.3.x Mar 19, 2020
@vincepri
Copy link
Member

/approve

Approving / merging this for now given the immediate improvement / gain that we get from it, @sedefsavas can you open up an issue with the follow-up items from above?

/milestone v0.3.3

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.3.x, v0.3.3 Mar 23, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sedefsavas, vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2020
@k8s-ci-robot k8s-ci-robot merged commit cf8be83 into kubernetes-sigs:master Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[e2e] Cluster is deleted before running "Full upgrade" test
5 participants