Fix self hosting E2E test flakes #3589

fabriziopandini · 2020-09-03T09:15:14Z

What steps did you take and what happened:
The self-hosting test in E2E is failing occasionally with the following error:

When testing Cluster API working on self-hosted clusters
/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/self_hosted_test.go:27
  Should pivot the bootstrap cluster to a self-hosted cluster [It]
  /home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/self_hosted.go:75
  Failed to run clusterctl move
  Expected success, but got an error:
      <*errors.withStack | 0xc000318fa0>: {
          error: {
              cause: {
                  error: {
                      cause: {
                          error: {
                              cause: {
                                  Op: "Get",
                                  URL: "https://172.17.0.4:6443/api?timeout=30s",
                                  Err: {s: "EOF"},
                              },
                              msg: "action failed after 9 attempts",
                          },
                          stack: [0x16ce792, 0x16e3078, 0x16d983c, 0x16eb1ee, 0x16ea6e0, 0x11a23e1, 0x16ce700, 0x16d96fd, 0x16d8d11, 0x16d70d5, 0x16f1f60, 0x16f88ab, 0x171708e, 0x7a3898, 0x7a34ef, 0x7a2994, 0x7a98f5, 0x7a9151, 0x7af01f, 0x7aeb40, 0x7ae387, 0x7b096b, 0x7b31b7, 0x7b2efd, 0x1708b44, 0x5117c9, 0x461311],
                      },
                      msg: "failed to connect to the management cluster",
                  },
                  stack: [0x16e30aa, 0x16d983c, 0x16eb1ee, 0x16ea6e0, 0x11a23e1, 0x16ce700, 0x16d96fd, 0x16d8d11, 0x16d70d5, 0x16f1f60, 0x16f88ab, 0x171708e, 0x7a3898, 0x7a34ef, 0x7a2994, 0x7a98f5, 0x7a9151, 0x7af01f, 0x7aeb40, 0x7ae387, 0x7b096b, 0x7b31b7, 0x7b2efd, 0x1708b44, 0x5117c9, 0x461311],
              },
              msg: "action failed after 10 attempts",
          },
          stack: [0x16ce792, 0x16d96fd, 0x16d8d11, 0x16d70d5, 0x16f1f60, 0x16f88ab, 0x171708e, 0x7a3898, 0x7a34ef, 0x7a2994, 0x7a98f5, 0x7a9151, 0x7af01f, 0x7aeb40, 0x7ae387, 0x7b096b, 0x7b31b7, 0x7b2efd, 0x1708b44, 0x5117c9, 0x461311],
      }
      action failed after 10 attempts: failed to connect to the management cluster: action failed after 9 attempts: Get https://172.17.0.4:6443/api?timeout=30s: EOF
  /home/prow/go/src/sigs.k8s.io/cluster-api/test/framework/clusterctl/client.go:167

We might be too aggressive in running this operation; a wait loop before it might help avoid these flakes.

/kind failing-test
/area testing


k8s-ci-robot · 2020-09-03T09:15:16Z

@fabriziopandini: The label(s) area/ cannot be applied, because the repository doesn't have them

fabriziopandini · 2020-09-03T09:15:30Z

/milestone v0.3.10

vincepri · 2020-09-03T13:38:16Z

/priority important-soon
/help

k8s-ci-robot · 2020-09-03T13:38:17Z

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

fabriziopandini · 2020-09-07T13:39:56Z

I just noticed that we are adding the MHC to the default cluster-template, and this will trigger all the worker machines to get deleted after 30s.

I'm not 100% sure this is the source of the issue above, but IMO we should take this variable off the table for all the tests except for MHC remediation...
@sedefsavas opinions?

fabriziopandini · 2020-09-15T09:45:32Z

/assign
/lifecycle active

k8s-ci-robot added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Sep 3, 2020

k8s-ci-robot added the area/testing Issues or PRs related to testing label Sep 3, 2020

k8s-ci-robot added this to the v0.3.10 milestone Sep 3, 2020

k8s-ci-robot added the priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.) and help wanted (Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.) labels Sep 3, 2020

k8s-ci-robot assigned fabriziopandini Sep 15, 2020

k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Sep 15, 2020

fabriziopandini mentioned this issue Sep 15, 2020

🌱 Fix self-hosted flakes in E2E tests #3639

Merged

k8s-ci-robot closed this as completed in #3639 Sep 16, 2020
