Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Templates and E2E tests use external cloud-provider-azure by default #1994

Conversation

jackfrancis
Copy link
Contributor

@jackfrancis jackfrancis commented Jan 21, 2022

What type of PR is this?

What this PR does / why we need it:

Derived from #1889

This PR re-orients test and reference templates so that external cloud-provider-azure is the "default" for all other templates, and that "in-tree" cloud-provider-azure configuration is only tested when that specific scenario is required.

The reason for this is twofold:

  1. The Kubernetes community has agreed that all future cloud provider development (across all providers, not specifically Azure) will be maintained separate from the main upstream Kubernetes code. So this PR re-orients the capz tests and reference templates towards that community best-practice. More information here:
  1. The development of new Azure cloud provider features and, more importantly, bug fixes, is prioritized for delivery in the external cloud-provider-azure project:

As a result of the complexity of the years-long transition away from in-tree (i.e., cloud provider bits being tightly coupled to Kubernetes versions and delivered in common with Kubernetes versions) cloud-provider-azure there are now material differences in the actual cloud-provider-azure runtime that is used depending on whether or not you are using in-tree or external cloud-provider-azure. The differences can be summarized most simply thusly: the external cloud-provider-azure is more stable (more frequently tested, more confirmed fixes actually included).

For the above reasons, we want to make it easier to rapidly test capz against external cloud-provider-azure, and to encourage capz users to create clusters running the external cloud-provider-azure components.

An important note for capz users: the capz reference templates use the experimental cluster-api ClusterResourceSet CRD. In order to use ClusterResourceSet resources in your capz templates, you must enable that feature flag in your cluster-api environment (management cluster).

Reference:

An example of how to enable ClusterResourceSet on your cluster-api management cluster is here:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Templates and E2E tests use external cloud-provider-azure by default

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 21, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign devigned after the PR has been reviewed.
You can assign the PR to them by writing /assign @devigned in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 21, 2022
@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch 4 times, most recently from 83caeae to fa92afa Compare January 22, 2022 03:27
@jackfrancis
Copy link
Contributor Author

/retest

@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch from fa92afa to 7621099 Compare January 24, 2022 21:06
@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch from 7621099 to b149c16 Compare January 25, 2022 22:09
@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch 2 times, most recently from 1b4bd2a to 1b01e01 Compare January 26, 2022 02:50
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 26, 2022
@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch from 1b01e01 to d000940 Compare January 26, 2022 18:20
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 26, 2022
@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch from d000940 to 4e6fc6d Compare January 26, 2022 19:08
@jackfrancis
Copy link
Contributor Author

/retest

@jackfrancis
Copy link
Contributor Author

/retest

@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch from 4e6fc6d to edb48c4 Compare January 27, 2022 04:39
@nilo19
Copy link
Contributor

nilo19 commented Jan 27, 2022

@jackfrancis can we get a template with an out-of-tree cloud provider + multiple VMSS?

@jackfrancis
Copy link
Contributor Author

@nilo19 Sure, we can do that, would be best to do that in a follow-up PR. Can you create an issue describing the multiple VMSS scenario in more detail and assign it to me?

@jackfrancis
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Mar 11, 2022
@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch from e4dd633 to e21ffb0 Compare March 11, 2022 22:07
@jackfrancis
Copy link
Contributor Author

@CecileRobertMichon docs updated and ready for review/feedback

(also PR description updated)

@CecileRobertMichon
Copy link
Contributor

I noticed this PR isn't updating https://github.com/kubernetes-sigs/cluster-api-provider-azure/tree/main/test/e2e/data/infrastructure-azure which is used to run the CAPI test suite on CAPZ clusters (to cover various lifecycle management scenarios, nothing Azure specific). It might make sense to also update those. We can either do it now or open an issue to follow up and update them.

side note: we really need to refactor CAPZ test templates to make it easier to follow, there are too many templates in too many places

@CecileRobertMichon
Copy link
Contributor

/hold

let's merge this next week to limit impact since it might disrupt k8s tests

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 11, 2022
@CecileRobertMichon
Copy link
Contributor

/test pull-cluster-api-provider-azure-apiversion-upgrade
/test pull-cluster-api-provider-azure-capi-e2e
/test pull-cluster-api-provider-azure-conformance
/test pull-cluster-api-provider-azure-conformance-with-ci-artifact
/test pull-cluster-api-provider-azure-e2e-workload-upgrade
/test pull-cluster-api-provider-azure-upstream-windows-dockershim
/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts
/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

@jackfrancis
Copy link
Contributor Author

@CecileRobertMichon #2165

@jackfrancis
Copy link
Contributor Author

/retest

1 similar comment
@jackfrancis
Copy link
Contributor Author

/retest

name: cloud-controller-manager
---
apiVersion: v1
kind: Pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackfrancis confirmed with @feiskyer we shouldn't use a Pod here as you had pointed out before. Can you please change this and the other cloud-provider manifests to use Deployments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the update to sample manifests for reference: kubernetes-sigs/cloud-provider-azure#1252

@jackfrancis jackfrancis force-pushed the more-external-cloud-provider-azure branch from e21ffb0 to 06ce3bb Compare March 16, 2022 16:38
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Mar 16, 2022

@jackfrancis: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-e2e-full 73b3cd3 link false /test pull-cluster-api-provider-azure-e2e-full
pull-cluster-api-provider-azure-e2e-windows-dockershim 21c5981 link true /test pull-cluster-api-provider-azure-e2e-windows-dockershim

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jackfrancis
Copy link
Contributor Author

/retest

@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-apiversion-upgrade
/test pull-cluster-api-provider-azure-capi-e2e
/test pull-cluster-api-provider-azure-conformance
/test pull-cluster-api-provider-azure-conformance-with-ci-artifact
/test pull-cluster-api-provider-azure-e2e-workload-upgrade
/test pull-cluster-api-provider-azure-upstream-windows-dockershim
/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts
/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-exp
/test pull-cluster-api-provider-azure-e2e-optional

@jackfrancis
Copy link
Contributor Author

/retest

apiVersion: apps/v1
kind: DaemonSet
metadata:
name: cloud-node-manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cloud-node-manager doesn't include the windows DS. Is windows working without it?

The windows deamon set is provided in the examples: https://github.com/kubernetes-sigs/cloud-provider-azure/blob/66e1bec130df801212310a14bedae737fefd042e/examples/out-of-tree/cloud-node-manager.yaml#L102-L108)

I think it will be required for all of the Windows functionality to work. @pengfi might know?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, windows should also have a daemonset

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like all the windows tests are passing. Feels like we are missing a test to that would catch this missing deamon set. What is some of the functionality that we could test that makes sense from capz perspective?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was missed because our out tree cloud provider test doesn't run against windows nodes: #2209 (comment)

It("with a 1 control plane nodes and 2 worker nodes", func() {

@k8s-ci-robot
Copy link
Contributor

@jackfrancis: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 29, 2022
@lzhecheng
Copy link
Contributor

Hello @jackfrancis, how much effort left for this PR? I feel like it is almost finished.

@jackfrancis
Copy link
Contributor Author

@lzhecheng thank you for your patience

I think it's fair to say that at this point we don't plan to merge this PR as-is. tl;dr we are going a different approach:

  1. Use helm to install external cloud-provider-azure
  2. Implement "external cloud-provider-azure as default" iteratively, over a series of smaller PRs

@CecileRobertMichon agree that we should close this PR so that other folks aren't confused?

@CecileRobertMichon
Copy link
Contributor

@jackfrancis agreed

let's merge #2209 first

/close

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon: Closed this PR.

In response to this:

@jackfrancis agreed

let's merge #2209 first

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jackfrancis jackfrancis deleted the more-external-cloud-provider-azure branch December 9, 2022 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants