🌱 Implement e2e test for clusterctl upgrade #3708

fabriziopandini
Member

What this PR does / why we need it:
This PR implements a new E2E test that provides a signal on clusterctl upgrades. The test does the following:

  • init a management cluster with providers in version X
  • create a workload cluster
  • upgrade providers to version Y using clusterctl upgrade
  • check everything works fine (e.g. add a machine to the workload cluster)
  • cleanup
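
The flow listed above can be sketched in Go as an ordered list of steps with a cleanup that always runs. This is only an illustration of the sequencing, not the code in this PR: the `step` type, `runUpgradeFlow`, and the placeholder `succeed`/`fail` bodies are hypothetical names, and a real test would call clusterctl and the test framework inside each step.

```go
package main

import (
	"errors"
	"fmt"
)

// step is one named phase of the flow. The type and all step names below are
// hypothetical; they are not part of the Cluster API test framework.
type step struct {
	name string
	run  func() error
}

// Placeholder step bodies. A real test would call clusterctl and the
// test framework here instead.
func succeed() error { return nil }
func fail() error    { return errors.New("simulated failure") }

// runUpgradeFlow runs the steps in order and stops at the first failure.
// cleanup always runs, even when a step fails. It returns the names of the
// steps that were started and the first error, if any.
func runUpgradeFlow(steps []step, cleanup func()) ([]string, error) {
	defer cleanup()
	var started []string
	for _, s := range steps {
		started = append(started, s.name)
		if err := s.run(); err != nil {
			return started, fmt.Errorf("step %q failed: %w", s.name, err)
		}
	}
	return started, nil
}

func main() {
	steps := []step{
		{"init management cluster with providers at version X", succeed},
		{"create workload cluster", succeed},
		{"upgrade providers to version Y", succeed},
		{"verify by adding a machine to the workload cluster", succeed},
	}
	started, err := runUpgradeFlow(steps, func() { fmt.Println("cleanup") })
	fmt.Println(len(started), "steps started, error:", err)
}
```

Running cleanup through `defer` means a failed upgrade step still tears down whatever was created, which matters for a test that creates real clusters.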

Which issue(s) this PR fixes:
Fixes #3690

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 29, 2020
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 29, 2020
@fabriziopandini
Member Author

/area testing
/hold
until v0.3.10 is released, because the test assumes the Docker components are available in the release artifacts

/milestone v0.4.0
(this could go in v0.3.11 as well)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 29, 2020
@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Sep 29, 2020
@k8s-ci-robot k8s-ci-robot added the area/testing Issues or PRs related to testing label Sep 29, 2020
@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from 4fd507a to 134b7af Compare October 6, 2020 12:17
@fabriziopandini
Member Author

/test pull-cluster-api-e2e-full-main
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 6, 2020
@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from 134b7af to 9c288cb Compare October 6, 2020 12:28
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 11, 2020
@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from 9c288cb to 1dbc0aa Compare October 13, 2020 11:55
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 13, 2020
@fabriziopandini
Member Author

/test pull-cluster-api-e2e-full-main

@fabriziopandini
Member Author

/retest

@fabriziopandini
Member Author

/test pull-cluster-api-e2e-full-main

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 10, 2020
@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from 1dbc0aa to 6100c5a Compare November 19, 2020 11:24
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2020
@fabriziopandini
Member Author

/test pull-cluster-api-e2e-full-main

@fabriziopandini
Member Author

@wfernandes @srm09 if you have some spare time PTAL

@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from 6100c5a to 66d342e Compare November 19, 2020 12:48
@fabriziopandini
Member Author

/test pull-cluster-api-e2e-full-main

Contributor

@wfernandes wfernandes left a comment

I tried to run the test on my system and saw the following error:

  Should create a management cluster and then upgrade all the providers [It]
  /Users/wfernandes/workspace/cluster-api/test/e2e/clusterctl_upgrade.go:79

  Failed to run clusterctl config cluster
  Unexpected error:
      <*yamlprocessor.errMissingVariables | 0xc000ac6200>: {
          Missing: ["CNI_RESOURCES"],
      }
      value for variables [CNI_RESOURCES] is not set. Please set the value using os environment variables or the clusterctl config file
  occurred

Do I need to configure some variables for this E2E test?
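
For context on how an error like the one above arises, here is a minimal sketch of a missing-variable check over a template that uses `${NAME}` placeholders. It is not clusterctl's actual implementation: `missingVariables`, `varPattern`, and the sample template are hypothetical, and the pattern handles only the plain `${NAME}` form.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"sort"
)

// varPattern matches plain ${NAME} placeholders. This is a simplification
// made for this sketch; other substitution forms are not handled.
var varPattern = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// missingVariables returns the sorted, de-duplicated names of placeholders in
// template for which lookup reports no value.
func missingVariables(template string, lookup func(string) (string, bool)) []string {
	seen := map[string]bool{}
	var missing []string
	for _, m := range varPattern.FindAllStringSubmatch(template, -1) {
		name := m[1]
		if seen[name] {
			continue
		}
		seen[name] = true
		if _, ok := lookup(name); !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical template; the variable names are only examples.
	template := "cluster: ${CLUSTER_NAME}\ncni: ${CNI_RESOURCES}\n"
	// os.LookupEnv reports whether an environment variable is set.
	if missing := missingVariables(template, os.LookupEnv); len(missing) > 0 {
		fmt.Printf("value for variables %v is not set\n", missing)
	}
}
```

Passing the lookup as a function keeps the check independent of where values come from, so the same code works with environment variables or a configuration file.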

test/framework/clusterctl/e2e_config.go (outdated, resolved)
@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from 66d342e to 6fc8b94 Compare November 20, 2020 13:45
@fabriziopandini
Member Author

@wfernandes the error was caused by a merge conflict with the recently merged #3896; I have now rebased, so everything should be fine

/test pull-cluster-api-e2e-full-main

@fabriziopandini
Member Author

The test error seems unrelated to this PR...

@wfernandes
Contributor

For some reason, when I run GINKGO_FOCUS="clusterctl upgrades" make test-e2e, it fails at the following step:

STEP: Waiting for one control plane node to exist
When testing clusterctl upgrades
/Users/wfernandes/workspace/cluster-api/test/e2e/clusterctl_upgrade_test.go:27
  Should create a management cluster and then upgrade all the providers [It]
  /Users/wfernandes/workspace/cluster-api/test/e2e/clusterctl_upgrade.go:79

  Timed out after 600.000s.
  Expected
      <bool>: false
  to be true

  /Users/wfernandes/workspace/cluster-api/test/framework/controlplane_helpers.go:143

I'll try to investigate some more, but it's not immediately clear what is happening. Any thoughts, @fabriziopandini?

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 26, 2021
@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from d12ba1e to e1a622f Compare April 27, 2021 12:20
@fabriziopandini
Member Author

/test pull-cluster-api-e2e-main

@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from e1a622f to 706caea Compare April 27, 2021 13:35
Member

@sbueringer sbueringer left a comment

Overall the approach looks good to me.

test/e2e/config/docker.yaml (resolved)
test/e2e/config/docker.yaml (outdated, resolved)
test/framework/clusterctl/client.go (outdated, resolved)
test/framework/clusterctl/clusterctl_helpers.go (outdated, resolved)
test/framework/clusterctl/clusterctl_helpers.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (outdated, resolved)
test/e2e/config/docker.yaml (resolved)
@sbueringer
Member

@fabriziopandini Should I do another round of review or wait until the test is green or you ping again?

@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from 8b19078 to 4bf0c08 Compare May 5, 2021 20:11
@fabriziopandini fabriziopandini force-pushed the clusterctl-upgrade-e2e-test branch from 4bf0c08 to bb2fbfe Compare May 5, 2021 20:31
@@ -219,7 +219,7 @@ func (p *providerComponents) Delete(options DeleteOptions) error {

 func (p *providerComponents) DeleteWebhookNamespace() error {
 	log := logf.Log
-	log.V(5).Info("Deleting %s namespace", repository.WebhookNamespaceName)
+	log.V(5).Info("Deleting", "namespace", repository.WebhookNamespaceName)
Member Author

This is an error in clusterctl detected by this test
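
To illustrate why the original call was wrong: a structured logger's `Info` takes a message followed by alternating keys and values, so a printf-style verb in the message is never interpolated and the lone argument becomes a key without a value. The sketch below uses a hypothetical stand-in, `render`, rather than the real logging library, and the namespace string is only an illustrative value; how a real logger reports a dangling key depends on its implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// render is a hypothetical stand-in for a structured logger's
// Info(msg, keysAndValues...) call. It treats the variadic arguments as
// alternating keys and values and never interprets format verbs in msg.
func render(msg string, keysAndValues ...interface{}) string {
	var b strings.Builder
	b.WriteString(msg)
	for i := 0; i < len(keysAndValues); i += 2 {
		if i+1 >= len(keysAndValues) {
			// Odd number of arguments: the last key has no value.
			fmt.Fprintf(&b, " %v=<missing value>", keysAndValues[i])
			break
		}
		fmt.Fprintf(&b, " %v=%q", keysAndValues[i], fmt.Sprint(keysAndValues[i+1]))
	}
	return b.String()
}

func main() {
	ns := "capi-webhook-system" // illustrative value only

	// Printf-style misuse: "%s" stays in the message and ns is read as a key.
	fmt.Println(render("Deleting %s namespace", ns))

	// Key/value form, as in the fixed line of the diff.
	fmt.Println(render("Deleting", "namespace", ns))
}
```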

-targetPort: 9443
-selector:
-  control-plane: controller-manager
+targetPort: webhook-server
Member Author

This is an inconsistency in CAPD vs other providers

Member

Can you provide some more context on what the inconsistency is, why we have it, and why we are dropping the selector now?

Member Author

@fabriziopandini fabriziopandini May 10, 2021

This is mostly for consistency across all the providers in the CAPI codebase, e.g.:

  • All the other providers use the named port; CAPD was using the port number
  • None of the other providers use a selector (they rely on the labels added by kustomize); CAPD had the selector

Member

Ah got it, so we're consistent now after this change. Wasn't sure ;).

@fabriziopandini fabriziopandini changed the title [WIP] 🌱 Implement e2e test for clusterctl upgrade 🌱 Implement e2e test for clusterctl upgrade May 5, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2021
@fabriziopandini
Member Author

/test pull-cluster-api-e2e-full-main
/hold cancel

@vincepri @sbueringer this test has already helped find a controller-runtime defect and a clusterctl error. The job passed when run alone; I'm now testing with e2e-full, but we should be ready to go

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2021
@sbueringer
Member

@fabriziopandini Great! I put it on my review list

Member

@sbueringer sbueringer left a comment

Mostly nits. I think relative to the complexity of the test it reads very nicely! :)

test/framework/clusterctl/client.go (outdated, resolved)
@@ -82,6 +85,61 @@ func Init(ctx context.Context, input InitInput) {
 	Expect(err).ToNot(HaveOccurred(), "failed to run clusterctl init")
 }

+// InitWithBinary uses clusterctl binary to run init with the list of providers defined in the local repository.
+func InitWithBinary(_ context.Context, binary string, input InitInput) {
Member

I'm not sure what the idea behind the context param is here (and in the other funcs in this file). Maybe we just hand it over in case we need it later on?

I imagine that, at least in the funcs with binary, we will probably never use it?

Member Author

The idea is to have a context with a timeout percolating down the code, and potentially down to the exec command as well...
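
As a rough illustration of a context reaching the process execution, the sketch below passes a context with a timeout to `exec.CommandContext` from the standard library, which kills the process if the context is done before the command finishes. `runBinary` and `runWithDeadline` are hypothetical helpers, not functions from this PR, and `clusterctl version` is used only as an example command that must be on PATH to succeed.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runBinary executes binary with args and returns its combined output.
// If ctx is cancelled or its deadline passes, the process is killed.
func runBinary(ctx context.Context, binary string, args ...string) (string, error) {
	cmd := exec.CommandContext(ctx, binary, args...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return string(out), fmt.Errorf("running %s: %w", binary, err)
	}
	return string(out), nil
}

// runWithDeadline is a hypothetical top-level caller: it creates the context
// once and lets it flow down to the process execution.
func runWithDeadline(seconds int, binary string, args ...string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(seconds)*time.Second)
	defer cancel()
	return runBinary(ctx, binary, args...)
}

func main() {
	// Example command only; it must be installed for this call to succeed.
	out, err := runWithDeadline(30, "clusterctl", "version")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Accepting the context as the first parameter, even where it is not used yet, keeps the signature stable if the timeout is wired through later.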

test/framework/clusterctl/client.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (outdated, resolved)
test/e2e/clusterctl_upgrade.go (resolved)
test/e2e/clusterctl_upgrade.go (resolved)
@k8s-ci-robot
Contributor

@fabriziopandini: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
pull-cluster-api-apidiff-main | 2a6e55d | link | /test pull-cluster-api-apidiff-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sbueringer
Member

/lgtm

Just to confirm. No changes in test-infra required as this test will just run with our other tests in pull-cluster-api-e2e-full-main?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2021
@fabriziopandini
Member Author

Just to confirm. No changes in test-infra required as this test will just run with our other tests in pull-cluster-api-e2e-full-main?

Yes!
Eventually, if the test proves to be stable, we will consider whether to move it into the [PR blocking] group, but not for now.

Member

@vincepri vincepri left a comment

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2021
@k8s-ci-robot k8s-ci-robot merged commit 93ee29c into kubernetes-sigs:master May 12, 2021
@fabriziopandini fabriziopandini deleted the clusterctl-upgrade-e2e-test branch May 17, 2021 10:04
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/testing: Issues or PRs related to testing
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Implement a new E2E test for verifying clusterctl upgrades
5 participants