
[e2e] Cluster is deleted before running "Full upgrade" test #2654

Closed
sedefsavas opened this issue Mar 12, 2020 · 16 comments · Fixed by #2655 or #2758
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone

Comments

@sedefsavas

What steps did you take and what happened:
Run the e2e tests:

  1. cd ./test/infrastructure/docker
  2. Run: go test ./e2e -v -ginkgo.v -ginkgo.trace -count=1 -timeout=20m -tags=e2e -e2e.config="/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/local-e2e.conf" -skip-resource-cleanup=false

There are two e2e tests for docker. In AfterEach(), the cluster that was created in the first test is deleted, but that cluster is also used by the second test.

What did you expect to happen:
I'd expect it either not to delete the cluster after each test, or to make the tests self-sufficient so that they don't share any state.

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 12, 2020
@sedefsavas
Author

/assign

@chuckha
Contributor

chuckha commented Mar 13, 2020

They are defined in this structure:

AfterEach()
Describe() {
    Specify(Basic create control plane) {}
    Specify(Full upgrade control plane) {}
}

So the after-each only applies to the describe, not each Specify.

The correct way to run the tests, and this is my fault for not documenting appropriately, is to use the FOCUS var.

Try make test-capd-e2e FOCUS='Basic|Full'.
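For reference, the same filter can be passed straight to go test via ginkgo's -ginkgo.focus flag (which takes a regular expression), mirroring the invocation from the issue description; the config path below is a placeholder:

```shell
# Run only the specs whose names match the regex, equivalent to
# `make test-capd-e2e FOCUS='Basic|Full'` (config path is a placeholder)
cd ./test/infrastructure/docker
go test ./e2e -v -tags=e2e -ginkgo.v -ginkgo.trace -count=1 -timeout=20m \
  -ginkgo.focus='Basic|Full' -e2e.config="$PWD/e2e/local-e2e.conf"
```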

@sedefsavas
Author

I was running without a FOCUS and observed it being triggered, but I will recheck.

@chuckha
Contributor

chuckha commented Mar 13, 2020

I think there's a lot of room for improvement in how the capd-e2e tests are organized. I'm not sure if this helps or not, but this is what I'm looking for in the test organization:

  1. CI needs to be able to run the basic happy-path test that spins up a cluster without any additional longer-running tests such as upgrade.
  2. I want to be able to add new tests that modify an existing cluster, much like the Upgrade test is doing.
  3. I want a way to run the basic setup test and then some combination of the additional tests (or all of them) that will modify the cluster set up in the basic test.

I think FOCUS might get us what we want. I would like to avoid spinning up new clusters unless it's absolutely necessary since that takes such a long time.

@sedefsavas
Author

I started with FOCUS='Basic|Full' and observed the following:

  • test-0 cluster is deleted after Basic test. (Even though it is failing here, it shouldn't call AfterEach, right?)
    STEP: deleting cluster test-0

  • Then, Full upgrade test is starting but failing due to unable to locate test-0

Logs

Basic create
  /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:97
STEP: creating an InfrastructureCluster resource
STEP: creating a Cluster resource linked to the InfrastructureCluster resource
STEP: creating the machine template
STEP: creating a KubeadmControlPlane
STEP: waiting for cluster to enter the provisioned phase
STEP: waiting for one control plane node to exist
Creating directory: resources/KubeadmConfig/default
Creating directory: resources/Node
Creating directory: resources/Cluster/default
Creating directory: resources/Machine/default
Creating directory: resources/KubeadmControlPlane/default
Creating directory: resources/DockerCluster/default
Creating directory: resources/DockerMachine/default
Creating directory: resources/DockerMachineTemplate/default
STEP: deleting cluster test-0
STEP: waiting for cluster test-0 to be deleted
STEP: ensuring all CAPI artifacts have been deleted
STEP: Ensuring docker artifacts have been deleted
STEP: Succeeding in deleting all docker artifacts

• Failure [210.761 seconds]
Docker
/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:44
  Cluster Creation
  /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:45
    Multi-node controlplane cluster
    /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:94
      Basic create [It]
      /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:97

      Timed out after 180.001s.
      Expected
          <bool>: false
      to be true

      /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/framework/control_plane.go:279

      Full Stack Trace
      sigs.k8s.io/cluster-api/test/framework.WaitForOneKubeadmControlPlaneMachineToExist(0x2553040, 0xc0000420f0, 0x621e488, 0xc00072f9b0, 0xc00054b200, 0xc0003e0280, 0x0, 0x0, 0x0)
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/framework/control_plane.go:279 +0x252
      sigs.k8s.io/cluster-api/test/infrastructure/docker/e2e.glob..func3.1.3.1()
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:148 +0x65d
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0002e9620, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xb8
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0002e9620, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0xcf
      github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc00000e4a0, 0x2518880, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x64
      github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc000336000, 0x0, 0x2518880, 0xc00009c940)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x5b5
      github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc000336000, 0x2518880, 0xc00009c940)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0x101
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0001b32c0, 0xc000336000, 0x0)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x10f
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0001b32c0, 0x1)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x120
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0001b32c0, 0xc000043eb0)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x117
      github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0000de230, 0x62192a0, 0xc00022f300, 0x2322193, 0xe, 0xc00000e600, 0x2, 0x2, 0x255a480, 0xc00009c940, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:62 +0x42b
      github.com/onsi/ginkgo.RunSpecsWithCustomReporters(0x2519780, 0xc00022f300, 0x2322193, 0xe, 0xc00000e5e0, 0x2, 0x2, 0x2)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:226 +0x217
      github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters(0x2519780, 0xc00022f300, 0x2322193, 0xe, 0xc0003bd720, 0x1, 0x1, 0x58b42d3c7aa4)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:214 +0xad
      sigs.k8s.io/cluster-api/test/infrastructure/docker/e2e.TestDocker(0xc00022f300)
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_suite_test.go:56 +0x1cd
      testing.tRunner(0xc00022f300, 0x23ca6f8)
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:909 +0xc9
      created by testing.(*T).Run
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:960 +0x350
------------------------------
Docker Cluster Creation Multi-node controlplane cluster 
  Full upgrade
  /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:207
STEP: upgrading the control plane object to a new version
Creating directory: resources/Node
STEP: deleting cluster test-0

• Failure [0.324 seconds]
Docker
/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:44
  Cluster Creation
  /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:45
    Multi-node controlplane cluster
    /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:94
      Full upgrade [It]
      /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:207

      Expected success, but got an error:
          <errors.aggregate | len:1, cap:1>: [
              {
                  ErrStatus: {
                      TypeMeta: {Kind: "", APIVersion: ""},
                      ListMeta: {
                          SelfLink: "",
                          ResourceVersion: "",
                          Continue: "",
                          RemainingItemCount: nil,
                      },
                      Status: "Failure",
                      Message: "kubeadmcontrolplanes.controlplane.cluster.x-k8s.io \"test-0\" not found",
                      Reason: "NotFound",
                      Details: {
                          Name: "test-0",
                          Group: "controlplane.cluster.x-k8s.io",
                          Kind: "kubeadmcontrolplanes",
                          UID: "",
                          Causes: nil,
                          RetryAfterSeconds: 0,
                      },
                      Code: 404,
                  },
              },
          ]
          kubeadmcontrolplanes.controlplane.cluster.x-k8s.io "test-0" not found

      /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:212

      Full Stack Trace
      sigs.k8s.io/cluster-api/test/infrastructure/docker/e2e.glob..func3.1.3.2()
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:212 +0x2e1
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0002e96e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xb8
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0002e96e0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0xcf
      github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc00000e580, 0x2518880, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x64
      github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0003360f0, 0x0, 0x2518880, 0xc00009c940)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x5b5
      github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0003360f0, 0x2518880, 0xc00009c940)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0x101
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0001b32c0, 0xc0003360f0, 0x1)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x10f
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0001b32c0, 0x1)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x120
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0001b32c0, 0xc000043eb0)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x117
      github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0000de230, 0x62192a0, 0xc00022f300, 0x2322193, 0xe, 0xc00000e600, 0x2, 0x2, 0x255a480, 0xc00009c940, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:62 +0x42b
      github.com/onsi/ginkgo.RunSpecsWithCustomReporters(0x2519780, 0xc00022f300, 0x2322193, 0xe, 0xc00000e5e0, 0x2, 0x2, 0x2)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:226 +0x217
      github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters(0x2519780, 0xc00022f300, 0x2322193, 0xe, 0xc0003bd720, 0x1, 0x1, 0x58b42d3c7aa4)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:214 +0xad
      sigs.k8s.io/cluster-api/test/infrastructure/docker/e2e.TestDocker(0xc00022f300)
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_suite_test.go:56 +0x1cd
      testing.tRunner(0xc00022f300, 0x23ca6f8)
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:909 +0xc9
      created by testing.(*T).Run
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:960 +0x350
------------------------------
Creating directory: logs/capi-controller-manager/capi-controller-manager-55f7f6d54c-7vtpz
Creating directory: logs/capi-controller-manager/capi-controller-manager-55f7f6d54c-7vtpz
Creating directory: logs/capi-kubeadm-bootstrap-controller-manager/capi-kubeadm-bootstrap-controller-manager-685f67fd6-86sxj
Creating directory: logs/capi-kubeadm-bootstrap-controller-manager/capi-kubeadm-bootstrap-controller-manager-685f67fd6-86sxj
Creating directory: logs/capi-kubeadm-control-plane-controller-manager/capi-kubeadm-control-plane-controller-manager-69c8dd9b54-f2r4k
Creating directory: logs/capi-kubeadm-control-plane-controller-manager/capi-kubeadm-control-plane-controller-manager-69c8dd9b54-f2r4k
Creating directory: logs/capd-controller-manager/capd-controller-manager-7f544b5ccd-s9jqx
Creating directory: logs/capd-controller-manager/capd-controller-manager-7f544b5ccd-s9jqx
STEP: Deleting the management cluster

JUnit report was created: /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/junit.e2e_suite.1.xml


Summarizing 2 Failures:

[Fail] Docker Cluster Creation Multi-node controlplane cluster [It] Basic create 
/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/framework/control_plane.go:279

[Fail] Docker Cluster Creation Multi-node controlplane cluster [It] Full upgrade 
/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:212

Ran 2 of 2 Specs in 364.683 seconds
FAIL! -- 0 Passed | 2 Failed | 0 Pending | 0 Skipped

@chuckha
Contributor

chuckha commented Mar 13, 2020

hmm that's strange. It shouldn't be calling AfterEach here, but maybe there's something weird in our failure handler.

Does it work for you when the basic test passes?

@chuckha
Contributor

chuckha commented Mar 13, 2020

After reading https://onsi.github.io/ginkgo/#organizing-specs-with-containers-describe-and-context it looks like my assertions are wrong and we should for sure fix this.

I almost want to move the Deletes into their own Describe, but that gets weird with failure cleanups :( Maybe what I want is not easily possible with Ginkgo
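For illustration, the per-spec teardown can be modeled in plain Go (an analogy only, not real Ginkgo code; runSuite is a made-up helper): AfterEach fires after every spec in the container, which is exactly why test-0 is already gone before the "Full upgrade" spec starts.

```go
package main

import "fmt"

// runSuite models Ginkgo's semantics: the AfterEach hook runs after
// EVERY spec in a Describe container, not once at the end, so the
// cluster created by "Basic create" is deleted before "Full upgrade".
func runSuite(specs []string) []string {
	var events []string
	for _, spec := range specs {
		events = append(events, "It: "+spec)
		events = append(events, "AfterEach: deleting cluster test-0")
	}
	return events
}

func main() {
	for _, e := range runSuite([]string{"Basic create", "Full upgrade"}) {
		fmt.Println(e)
	}
}
```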

@sedefsavas
Author

FYI: @chuckha pointed out that the basic test was failing on my local machine because of containers left over from previous runs. So it is important to have a clean docker environment before running these tests.

We still need to avoid deleting the clusters after each Describe, so I moved the deletion to AfterSuite() in the PR.

@ncdc ncdc added this to the v0.3.x milestone Mar 18, 2020
@ncdc ncdc added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Mar 18, 2020
@Jaakko-Os
Contributor

I also ran into the upgrade issue and luckily found this bug report.
As a side remark, I noticed that the 'Full ...' case cannot be run independently.
make test-capd-e2e FOCUS='Full'
I would expect no such dependency between cases.

@chuckha
Contributor

chuckha commented Mar 20, 2020

I also ran into the upgrade issue and luckily found this bug report.
As a side remark, I noticed that the 'Full ...' case cannot be run independently.
make test-capd-e2e FOCUS='Full'
I would expect no such dependency between cases.

Absolutely agree. It's not ideal and is quite confusing. I haven't found a nice way to achieve both speed and convenience, but I'm definitely open to ideas! The requirement is to run the CI job as quickly as possible (only spin up one cluster). It would be nice if we could reuse the same cluster for other operations to optimize the run, but it's starting to look like that's not possible without internal state management, which simply doesn't exist and might be fairly ugly to implement.

@detiber
Member

detiber commented Mar 20, 2020

If all jobs require a baseline cluster deployed, then you could theoretically spin it up in the BeforeSuite, and then tear it down in the AfterSuite.

That said, that would only work if we never expect to run the jobs in parallel, since parallel invocations could clobber each other, and each test would have to leave the cluster back in the expected state at the end of its run.
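Sketching the suite-level alternative in the same plain-Go analogy (runSuiteOnce is a made-up helper, not Ginkgo API): the baseline cluster is created once before all specs and torn down once after them, so the specs share state, which is what rules out naive parallel runs.

```go
package main

import "fmt"

// runSuiteOnce models BeforeSuite/AfterSuite semantics: one shared
// baseline cluster for the whole suite, created once and deleted once.
// Every spec reuses it, so specs are order-dependent and cannot safely
// run in parallel without extra coordination.
func runSuiteOnce(specs []string) []string {
	events := []string{"BeforeSuite: create baseline cluster"}
	for _, spec := range specs {
		events = append(events, "It: "+spec+" (reuses shared cluster)")
	}
	return append(events, "AfterSuite: delete baseline cluster")
}

func main() {
	for _, e := range runSuiteOnce([]string{"Basic create", "Full upgrade"}) {
		fmt.Println(e)
	}
}
```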

Unless resource utilization is a major concern, I think the best course of action would be something similar to what we are doing in the AWS tests, where each test brings up its own cluster and we improved runtime by parallelizing the test runs (though if there are shared prerequisites, you have to do some trickiness around ensuring those are configured correctly; I don't expect that to be the case for CAPD).

@chuckha
Contributor

chuckha commented Mar 20, 2020

Yeah, that might be the right answer for our general use case

@sedefsavas
Author

Makes sense. I will spin up a different cluster for each test and then look into parallelizing them.

@sedefsavas
Author

/reopen to track the failing e2e test.
Also, the skip-resource-cleanup flag is a no-op right now; that needs to be fixed as well.

@wfernandes
Contributor

Should this issue have remained open?

@sedefsavas
Author

@wfernandes I will create another issue, to avoid rereading this thread as it includes some already-solved issues.
