Fix self hosting E2E test flakes #3589

Closed
fabriziopandini opened this issue Sep 3, 2020 · 6 comments · Fixed by #3639
Assignees
Labels
area/testing: Issues or PRs related to testing
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/failing-test: Categorizes issue or PR as related to a consistently or frequently failing test.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@fabriziopandini
Member

What steps did you take and what happened:
The self-hosting E2E test fails occasionally with the following error:

When testing Cluster API working on self-hosted clusters
/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/self_hosted_test.go:27
  Should pivot the bootstrap cluster to a self-hosted cluster [It]
  /home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/self_hosted.go:75
  Failed to run clusterctl move
  Expected success, but got an error:
      <*errors.withStack | 0xc000318fa0>: {
          error: {
              cause: {
                  error: {
                      cause: {
                          error: {
                              cause: {
                                  Op: "Get",
                                  URL: "https://172.17.0.4:6443/api?timeout=30s",
                                  Err: {s: "EOF"},
                              },
                              msg: "action failed after 9 attempts",
                          },
                          stack: [0x16ce792, 0x16e3078, 0x16d983c, 0x16eb1ee, 0x16ea6e0, 0x11a23e1, 0x16ce700, 0x16d96fd, 0x16d8d11, 0x16d70d5, 0x16f1f60, 0x16f88ab, 0x171708e, 0x7a3898, 0x7a34ef, 0x7a2994, 0x7a98f5, 0x7a9151, 0x7af01f, 0x7aeb40, 0x7ae387, 0x7b096b, 0x7b31b7, 0x7b2efd, 0x1708b44, 0x5117c9, 0x461311],
                      },
                      msg: "failed to connect to the management cluster",
                  },
                  stack: [0x16e30aa, 0x16d983c, 0x16eb1ee, 0x16ea6e0, 0x11a23e1, 0x16ce700, 0x16d96fd, 0x16d8d11, 0x16d70d5, 0x16f1f60, 0x16f88ab, 0x171708e, 0x7a3898, 0x7a34ef, 0x7a2994, 0x7a98f5, 0x7a9151, 0x7af01f, 0x7aeb40, 0x7ae387, 0x7b096b, 0x7b31b7, 0x7b2efd, 0x1708b44, 0x5117c9, 0x461311],
              },
              msg: "action failed after 10 attempts",
          },
          stack: [0x16ce792, 0x16d96fd, 0x16d8d11, 0x16d70d5, 0x16f1f60, 0x16f88ab, 0x171708e, 0x7a3898, 0x7a34ef, 0x7a2994, 0x7a98f5, 0x7a9151, 0x7af01f, 0x7aeb40, 0x7ae387, 0x7b096b, 0x7b31b7, 0x7b2efd, 0x1708b44, 0x5117c9, 0x461311],
      }
      action failed after 10 attempts: failed to connect to the management cluster: action failed after 9 attempts: Get https://172.17.0.4:6443/api?timeout=30s: EOF
  /home/prow/go/src/sigs.k8s.io/cluster-api/test/framework/clusterctl/client.go:167

We may be too aggressive with this operation; adding a wait loop before it might help avoid these flakes.

/kind failing-test
/area testing

k8s-ci-robot added the kind/failing-test label on Sep 3, 2020
@k8s-ci-robot
Contributor

@fabriziopandini: The label(s) area/ cannot be applied, because the repository doesn't have them

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the area/testing label on Sep 3, 2020
@fabriziopandini
Member Author

/milestone v0.3.10

k8s-ci-robot added this to the v0.3.10 milestone on Sep 3, 2020
@vincepri
Member

vincepri commented Sep 3, 2020

/priority important-soon
/help

@k8s-ci-robot
Contributor

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


k8s-ci-robot added the priority/important-soon and help wanted labels on Sep 3, 2020
@fabriziopandini
Member Author

I just noticed that we are adding the MHC to the default cluster-template, and this will trigger all the worker machines to get deleted after 30s.

I'm not 100% sure this is the source of the issue above, but IMO we should take this variable off the table for all tests except MHC remediation.
@sedefsavas opinions?

@fabriziopandini
Member Author

/assign
/lifecycle active

k8s-ci-robot added the lifecycle/active label on Sep 15, 2020

3 participants