
[e2e] Cluster is deleted before running "Full upgrade" test #2654

Closed
sedefsavas opened this issue Mar 12, 2020 · 16 comments · Fixed by #2655 or #2758
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone

Comments

@sedefsavas

What steps did you take and what happened:
Run the e2e tests:

  1. cd ./test/infrastructure/docker
  2. Run: go test ./e2e -v -ginkgo.v -ginkgo.trace -count=1 -timeout=20m -tags=e2e -e2e.config="/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/local-e2e.conf" -skip-resource-cleanup=false

There are two e2e tests for docker. In AfterEach(), the cluster that was created in the first test is deleted, but that cluster is also used by the second test.

What did you expect to happen:
I'd expect it either not to delete the cluster after each test, or to make the tests self-sufficient so that they don't share any state.

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 12, 2020
@sedefsavas
Author

/assign

@chuckha
Contributor

chuckha commented Mar 13, 2020

They are defined in this structure:

AfterEach()
Describe() {
    Specify(Basic create control plane) {}
    Specify(Full upgrade control plane) {}
}

So the after-each only applies to the describe, not each Specify.

The correct way to run the tests, and this is my fault for not documenting appropriately, is to use the FOCUS var.

Try make test-capd-e2e FOCUS='Basic|Full'.
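For reference, the same filter can be passed straight to go test via ginkgo's -ginkgo.focus flag (which takes a regular expression), mirroring the invocation from the issue description; the config path below is a placeholder:

```shell
# Run only the specs whose names match the regex, equivalent to
# `make test-capd-e2e FOCUS='Basic|Full'` (config path is a placeholder)
cd ./test/infrastructure/docker
go test ./e2e -v -tags=e2e -ginkgo.v -ginkgo.trace -count=1 -timeout=20m \
  -ginkgo.focus='Basic|Full' -e2e.config="$PWD/e2e/local-e2e.conf"
```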

@sedefsavas
Author

I was running without a FOCUS and observed it being triggered, but I will recheck.

@chuckha
Contributor

chuckha commented Mar 13, 2020

I think there's a lot of room for improvement in how the capd-e2e tests are organized. I'm not sure if this helps or not, but this is what I'm looking for in the test organization:

  1. CI needs to be able to run the basic happy-path test that spins up a cluster without any additional longer-running tests such as upgrade.
  2. I want to be able to add new tests that modify an existing cluster, much like the Upgrade test is doing.
  3. I want a way to run the basic setup test and then some combination of the additional tests (or all of them) that will modify the cluster set up in the basic test.

I think FOCUS might get us what we want. I would like to avoid spinning up new clusters unless it's absolutely necessary since that takes such a long time.

@sedefsavas
Author

I started with FOCUS='Basic|Full' and observed the following:

  • test-0 cluster is deleted after Basic test. (Even though it is failing here, it shouldn't call AfterEach, right?)
    STEP: deleting cluster test-0

  • Then, Full upgrade test is starting but failing due to unable to locate test-0

Logs

Basic create
  /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:97
STEP: creating an InfrastructureCluster resource
STEP: creating a Cluster resource linked to the InfrastructureCluster resource
STEP: creating the machine template
STEP: creating a KubeadmControlPlane
STEP: waiting for cluster to enter the provisioned phase
STEP: waiting for one control plane node to exist
Creating directory: resources/KubeadmConfig/default
Creating directory: resources/Node
Creating directory: resources/Cluster/default
Creating directory: resources/Machine/default
Creating directory: resources/KubeadmControlPlane/default
Creating directory: resources/DockerCluster/default
Creating directory: resources/DockerMachine/default
Creating directory: resources/DockerMachineTemplate/default
STEP: deleting cluster test-0
STEP: waiting for cluster test-0 to be deleted
STEP: ensuring all CAPI artifacts have been deleted
STEP: Ensuring docker artifacts have been deleted
STEP: Succeeding in deleting all docker artifacts

• Failure [210.761 seconds]
Docker
/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:44
  Cluster Creation
  /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:45
    Multi-node controlplane cluster
    /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:94
      Basic create [It]
      /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:97

      Timed out after 180.001s.
      Expected
          <bool>: false
      to be true

      /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/framework/control_plane.go:279

      Full Stack Trace
      sigs.k8s.io/cluster-api/test/framework.WaitForOneKubeadmControlPlaneMachineToExist(0x2553040, 0xc0000420f0, 0x621e488, 0xc00072f9b0, 0xc00054b200, 0xc0003e0280, 0x0, 0x0, 0x0)
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/framework/control_plane.go:279 +0x252
      sigs.k8s.io/cluster-api/test/infrastructure/docker/e2e.glob..func3.1.3.1()
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:148 +0x65d
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0002e9620, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xb8
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0002e9620, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0xcf
      github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc00000e4a0, 0x2518880, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x64
      github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc000336000, 0x0, 0x2518880, 0xc00009c940)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x5b5
      github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc000336000, 0x2518880, 0xc00009c940)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0x101
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0001b32c0, 0xc000336000, 0x0)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x10f
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0001b32c0, 0x1)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x120
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0001b32c0, 0xc000043eb0)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x117
      github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0000de230, 0x62192a0, 0xc00022f300, 0x2322193, 0xe, 0xc00000e600, 0x2, 0x2, 0x255a480, 0xc00009c940, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:62 +0x42b
      github.com/onsi/ginkgo.RunSpecsWithCustomReporters(0x2519780, 0xc00022f300, 0x2322193, 0xe, 0xc00000e5e0, 0x2, 0x2, 0x2)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:226 +0x217
      github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters(0x2519780, 0xc00022f300, 0x2322193, 0xe, 0xc0003bd720, 0x1, 0x1, 0x58b42d3c7aa4)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:214 +0xad
      sigs.k8s.io/cluster-api/test/infrastructure/docker/e2e.TestDocker(0xc00022f300)
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_suite_test.go:56 +0x1cd
      testing.tRunner(0xc00022f300, 0x23ca6f8)
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:909 +0xc9
      created by testing.(*T).Run
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:960 +0x350
------------------------------
Docker Cluster Creation Multi-node controlplane cluster 
  Full upgrade
  /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:207
STEP: upgrading the control plane object to a new version
Creating directory: resources/Node
STEP: deleting cluster test-0

• Failure [0.324 seconds]
Docker
/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:44
  Cluster Creation
  /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:45
    Multi-node controlplane cluster
    /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:94
      Full upgrade [It]
      /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:207

      Expected success, but got an error:
          <errors.aggregate | len:1, cap:1>: [
              {
                  ErrStatus: {
                      TypeMeta: {Kind: "", APIVersion: ""},
                      ListMeta: {
                          SelfLink: "",
                          ResourceVersion: "",
                          Continue: "",
                          RemainingItemCount: nil,
                      },
                      Status: "Failure",
                      Message: "kubeadmcontrolplanes.controlplane.cluster.x-k8s.io \"test-0\" not found",
                      Reason: "NotFound",
                      Details: {
                          Name: "test-0",
                          Group: "controlplane.cluster.x-k8s.io",
                          Kind: "kubeadmcontrolplanes",
                          UID: "",
                          Causes: nil,
                          RetryAfterSeconds: 0,
                      },
                      Code: 404,
                  },
              },
          ]
          kubeadmcontrolplanes.controlplane.cluster.x-k8s.io "test-0" not found

      /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:212

      Full Stack Trace
      sigs.k8s.io/cluster-api/test/infrastructure/docker/e2e.glob..func3.1.3.2()
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:212 +0x2e1
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0002e96e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xb8
      github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0002e96e0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0xcf
      github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc00000e580, 0x2518880, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x64
      github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0003360f0, 0x0, 0x2518880, 0xc00009c940)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x5b5
      github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0003360f0, 0x2518880, 0xc00009c940)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0x101
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0001b32c0, 0xc0003360f0, 0x1)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x10f
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0001b32c0, 0x1)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x120
      github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0001b32c0, 0xc000043eb0)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x117
      github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0000de230, 0x62192a0, 0xc00022f300, 0x2322193, 0xe, 0xc00000e600, 0x2, 0x2, 0x255a480, 0xc00009c940, ...)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:62 +0x42b
      github.com/onsi/ginkgo.RunSpecsWithCustomReporters(0x2519780, 0xc00022f300, 0x2322193, 0xe, 0xc00000e5e0, 0x2, 0x2, 0x2)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:226 +0x217
      github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters(0x2519780, 0xc00022f300, 0x2322193, 0xe, 0xc0003bd720, 0x1, 0x1, 0x58b42d3c7aa4)
        /Users/ssavas/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:214 +0xad
      sigs.k8s.io/cluster-api/test/infrastructure/docker/e2e.TestDocker(0xc00022f300)
        /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_suite_test.go:56 +0x1cd
      testing.tRunner(0xc00022f300, 0x23ca6f8)
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:909 +0xc9
      created by testing.(*T).Run
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:960 +0x350
------------------------------
Creating directory: logs/capi-controller-manager/capi-controller-manager-55f7f6d54c-7vtpz
Creating directory: logs/capi-controller-manager/capi-controller-manager-55f7f6d54c-7vtpz
Creating directory: logs/capi-kubeadm-bootstrap-controller-manager/capi-kubeadm-bootstrap-controller-manager-685f67fd6-86sxj
Creating directory: logs/capi-kubeadm-bootstrap-controller-manager/capi-kubeadm-bootstrap-controller-manager-685f67fd6-86sxj
Creating directory: logs/capi-kubeadm-control-plane-controller-manager/capi-kubeadm-control-plane-controller-manager-69c8dd9b54-f2r4k
Creating directory: logs/capi-kubeadm-control-plane-controller-manager/capi-kubeadm-control-plane-controller-manager-69c8dd9b54-f2r4k
Creating directory: logs/capd-controller-manager/capd-controller-manager-7f544b5ccd-s9jqx
Creating directory: logs/capd-controller-manager/capd-controller-manager-7f544b5ccd-s9jqx
STEP: Deleting the management cluster

JUnit report was created: /Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/junit.e2e_suite.1.xml


Summarizing 2 Failures:

[Fail] Docker Cluster Creation Multi-node controlplane cluster [It] Basic create 
/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/framework/control_plane.go:279

[Fail] Docker Cluster Creation Multi-node controlplane cluster [It] Full upgrade 
/Users/ssavas/dev/qa_capi/tilttest/cluster-api/test/infrastructure/docker/e2e/docker_test.go:212

Ran 2 of 2 Specs in 364.683 seconds
FAIL! -- 0 Passed | 2 Failed | 0 Pending | 0 Skipped

@chuckha
Contributor

chuckha commented Mar 13, 2020

hmm that's strange. It shouldn't be calling AfterEach here, but maybe there's something weird in our failure handler.

Does it work for you when the basic test passes?

@chuckha
Contributor

chuckha commented Mar 13, 2020

After reading https://onsi.github.io/ginkgo/#organizing-specs-with-containers-describe-and-context it looks like my assertions are wrong and we should for sure fix this.

I almost want to move the Deletes into their own Describe, but that gets weird with failure cleanups :( Maybe what I want is not easily possible with Ginkgo
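For illustration, the per-spec teardown can be modeled in plain Go (an analogy only, not real Ginkgo code; runSuite is a made-up helper): AfterEach fires after every spec in the container, which is exactly why test-0 is already gone before the "Full upgrade" spec starts.

```go
package main

import "fmt"

// runSuite models Ginkgo's semantics: the AfterEach hook runs after
// EVERY spec in a Describe container, not once at the end, so the
// cluster created by "Basic create" is deleted before "Full upgrade".
func runSuite(specs []string) []string {
	var events []string
	for _, spec := range specs {
		events = append(events, "It: "+spec)
		events = append(events, "AfterEach: deleting cluster test-0")
	}
	return events
}

func main() {
	for _, e := range runSuite([]string{"Basic create", "Full upgrade"}) {
		fmt.Println(e)
	}
}
```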

@sedefsavas
Author

FYI: @chuckha pointed out that the basic test was failing on my local machine because of containers left over from previous runs. So it is important to have a clean docker environment before running these tests.

We still need to avoid deleting the clusters after each Describe, so I moved the deletion to AfterSuite() in the PR.

@ncdc ncdc added this to the v0.3.x milestone Mar 18, 2020
@ncdc ncdc added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Mar 18, 2020
@Jaakko-Os
Contributor

I also ran into the upgrade issue and luckily found this bug report.
As a side remark, I noticed that the 'Full ...' case cannot be run independently.
make test-capd-e2e FOCUS='Full'
I would expect no such dependency between cases.

@chuckha
Contributor

chuckha commented Mar 20, 2020

I also ran into the upgrade issue and luckily found this bug report.
As a side remark, I noticed that the 'Full ...' case cannot be run independently.
make test-capd-e2e FOCUS='Full'
I would expect no such dependency between cases.

Absolutely agree. It's not ideal and is quite confusing. I haven't found a nice way to achieve both speed and convenience, but I'm definitely open to ideas! The requirement is to run the CI job as quickly as possible (only spin up one cluster). It would be nice if we could reuse the same cluster for other operations to optimize the run, but it's starting to look like that's not possible without internal state management, which simply doesn't exist and might be fairly ugly to implement.

@detiber
Member

detiber commented Mar 20, 2020

If all jobs require a baseline cluster deployed, then you could theoretically spin it up in the BeforeSuite, and then tear it down in the AfterSuite.

That said, that would only work if we never expect to run the jobs in parallel, since parallel invocations could clobber each other, and each test would have to leave the cluster back in the expected state at the end of its run.
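Sketching the suite-level alternative in the same plain-Go analogy (runSuiteOnce is a made-up helper, not Ginkgo API): the baseline cluster is created once before all specs and torn down once after them, so the specs share state, which is what rules out naive parallel runs.

```go
package main

import "fmt"

// runSuiteOnce models BeforeSuite/AfterSuite semantics: one shared
// baseline cluster for the whole suite, created once and deleted once.
// Every spec reuses it, so specs are order-dependent and cannot safely
// run in parallel without extra coordination.
func runSuiteOnce(specs []string) []string {
	events := []string{"BeforeSuite: create baseline cluster"}
	for _, spec := range specs {
		events = append(events, "It: "+spec+" (reuses shared cluster)")
	}
	return append(events, "AfterSuite: delete baseline cluster")
}

func main() {
	for _, e := range runSuiteOnce([]string{"Basic create", "Full upgrade"}) {
		fmt.Println(e)
	}
}
```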

Unless resource utilization is a major concern, I think the best course of action would be something similar to what we are doing in the AWS tests, where each test brings up its own cluster and we improved runtime by parallelizing the test runs (though if there are shared prerequisites, you have to do some trickiness around ensuring those are configured correctly; I don't expect that to be the case for CAPD).

@chuckha
Contributor

chuckha commented Mar 20, 2020

Yeah, that might be the right answer for our general use case

@sedefsavas
Author

Makes sense. I will spin up a different cluster for each test and then look into parallelizing them.

@sedefsavas
Author

/reopen to track the failing e2e test.
Also, the skip-resource-cleanup flag is a no-op right now; that needs to be fixed as well.

@wfernandes
Contributor

Should this issue have remained open?

@sedefsavas
Author

@wfernandes I will create another issue, to avoid rereading this thread as it includes some already-solved issues.
