
CAPZ should use Out of Tree cloud-controller-manager and Storage Drivers #715

Closed
jseely opened this issue Jun 18, 2020 · 22 comments · Fixed by #3105

@jseely
Contributor

jseely commented Jun 18, 2020

⚠️ Cluster API Azure maintainers can ask to turn an issue-proposal into a CAEP when necessary. This is to be expected for large changes that impact multiple components, breaking changes, or new large features.

Dependencies

  1. Cluster ResourceSet needs to be implemented to properly support this

Goals

  1. CAPZ clusters should be deployable using the OOT Azure Provider and Storage Drivers
  2. Default should be OOT
  3. Tests need to be updated to test both modes and the migration scenario from in-tree to OOT

Non-Goals/Future Work

  1. Implement Cluster ResourceSet

User Story

As an operator, I would like to separate the cloud provider integration from the Kubernetes binaries and use the newer Storage Drivers and cloud-provider-azure.

Detailed Description

In 2018/2019 Kubernetes started to externalize interactions with the underlying cloud provider to slow down the growth in size of Kubernetes binaries and to decouple the lifecycle and development of Kubernetes from that of the individual cloud provider integrations.
https://kubernetes.io/blog/2019/04/17/the-future-of-cloud-providers-in-kubernetes/

/kind proposal

@k8s-ci-robot k8s-ci-robot added the kind/proposal Issues or PRs related to proposals. label Jun 18, 2020
@jseely
Contributor Author

jseely commented Jun 18, 2020

@alexeldeib
Contributor

Have you already seen the doc and template? It might help to distinguish this issue from what's already possible by adding some additional details. ClusterResourceSet is one approach to automating this, but I see you've listed that as a non-goal (and a dependency)?

@CecileRobertMichon
Contributor

CecileRobertMichon commented Jun 18, 2020

The 2nd goal "Default should be OOT" is something we're not necessarily ready for. I think for now we want to support optionally using OOT (without any manual steps, possibly using ClusterResourceSet), but I don't think we'll want to make it the default right away, to stay aligned with other Azure provisioning tools. cc @feiskyer @ritazh

See kubernetes/enhancements#667 for current Azure OOT provider status

@CecileRobertMichon CecileRobertMichon added this to the next milestone Jul 10, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 8, 2020
@CecileRobertMichon
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2020
@CecileRobertMichon
Contributor

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Nov 3, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 2, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 4, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CecileRobertMichon CecileRobertMichon removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 5, 2021
@CecileRobertMichon
Contributor

CecileRobertMichon commented Apr 5, 2021

/lifecycle frozen

status update:

  • Cluster ResourceSet needs to be implemented to properly support this: done
  • CAPZ clusters should be deployable using OOT Azure Provider and Storage Drivers: done
  • Default should be OOT: hold until OOT is fully ready
  • Tests need to be updated to test both modes and the migration scenario from in-tree to OOT: added tests for OOT (already testing in-tree); not yet testing migration
  • Implement Cluster ResourceSet: done in #1216

@CecileRobertMichon
Contributor

Default should be OOT

Now that v1.0.0 has been released, we should be able to move forward with this

@CecileRobertMichon
Contributor

/assign

@shysank
Contributor

shysank commented Mar 29, 2022

cc @sonasingh46

@sonasingh46
Contributor

sonasingh46 commented Mar 30, 2022

I have been trying to validate this manually, especially around the Kubernetes 1.22 → 1.23 upgrade paths.
The following in-tree components for Azure are the points of attention:

  • AzureDisk CSI driver
  • AzureFile CSI driver
  • cloud-provider-azure

As part of the effort to extract the cloud provider dependency from Kubernetes, the cloud-provider-dependent code is moving out of the Kubernetes tree. As a result, the in-tree CSI drivers and cloud providers are moving out of the Kubernetes code base.

From Kubernetes 1.23, AzureDisk CSI migration is enabled by default. This means that to provision a volume with the Azure Disk driver, the external AzureDisk CSI driver must be installed, because the in-tree driver no longer works once migration is enabled.

The in-tree AzureFile driver will continue to work in 1.23 because AzureFile CSI migration is not enabled by default there. If AzureFile CSI migration is enabled by the user/admin, then the external AzureFile CSI driver needs to be installed.
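To make the distinction concrete, the sketch below shows what a StorageClass backed by the external Azure Disk CSI driver might look like; the key difference from the in-tree plugin is the provisioner name (disk.csi.azure.com instead of kubernetes.io/azure-disk). The class name and parameters here are only illustrative assumptions.

```yaml
# Example only: a StorageClass using the external Azure Disk CSI driver.
# The in-tree equivalent would use provisioner: kubernetes.io/azure-disk.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi               # hypothetical name
provisioner: disk.csi.azure.com   # external driver; requires the driver to be installed
parameters:
  skuName: StandardSSD_LRS        # example disk SKU
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```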

Consider the following upgrade paths from v1.22 to v1.23:

Scenario 1: Upgrade a cluster from Kubernetes 1.22 to 1.23 without any extra tuning or configuration

  • AzureDisk CSI migration is enabled by default on the upgraded cluster.
  • The external AzureDisk CSI driver must be installed so that pods using existing volumes created by the in-tree driver on 1.22 continue to work on the upgraded cluster.
  • To create new volumes, the external AzureDisk CSI driver must likewise be installed. One way of installing it is via a ClusterResourceSet (see the sketch after this list).
  • AzureFile CSI migration is disabled by default.
  • Existing volumes created by the in-tree AzureFile driver on 1.22 will continue to work on the upgraded cluster.
  • New Azure File volumes can be created without installing any external driver.
  • The in-tree CCM is enabled by default.
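As a rough sketch of the CRS approach mentioned above (names, labels, and apiVersion are assumptions and depend on the Cluster API release in use): the azuredisk-csi-driver manifests are packaged into a ConfigMap on the management cluster, and a ClusterResourceSet applies them to every workload cluster matching the selector.

```yaml
# Sketch only: apply the azuredisk-csi-driver manifests (stored beforehand in the
# hypothetical azuredisk-csi-driver-manifests ConfigMap) to matching clusters.
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: azuredisk-csi-driver
  namespace: default
spec:
  strategy: ApplyOnce
  clusterSelector:
    matchLabels:
      azuredisk-csi: "true"       # hypothetical label set on the workload Cluster
  resources:
    - kind: ConfigMap
      name: azuredisk-csi-driver-manifests
```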

Scenario 2: Upgrade a cluster from Kubernetes 1.22 to 1.23 with AzureDisk CSI migration disabled

  • AzureDisk CSI migration will be disabled on the upgraded cluster (see the feature-gate sketch after this list).
  • Existing volumes created by the in-tree AzureDisk driver on 1.22 will continue to work on the upgraded cluster via the in-tree driver.
  • New Azure Disk volumes can be created without installing any external driver.
  • AzureFile CSI migration is disabled by default.
  • Existing volumes created by the in-tree AzureFile driver on 1.22 will continue to work on the upgraded cluster via the in-tree driver.
  • New Azure File volumes can be created without installing any external driver.
  • The in-tree CCM is enabled by default.
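For reference, disabling the migration as in this scenario would roughly mean setting the CSIMigrationAzureDisk feature gate to false on the controller manager and kubelets. A trimmed, hypothetical KubeadmControlPlane excerpt (not a complete spec):

```yaml
# Sketch only: keep the in-tree Azure Disk plugin active on a 1.23 cluster by
# turning the CSIMigrationAzureDisk feature gate off.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      controllerManager:
        extraArgs:
          feature-gates: CSIMigrationAzureDisk=false
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          feature-gates: CSIMigrationAzureDisk=false
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          feature-gates: CSIMigrationAzureDisk=false
```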

PS: Still validating other scenarios

@sonasingh46
Contributor

sonasingh46 commented Mar 30, 2022

Scenario 3: Upgrade a cluster from Kubernetes 1.22 to 1.23 with the external cloud provider enabled

  • The upgrade failed. The new control plane machine did not pass the preflight checks: the readiness and startup probes failed for the control plane components on the new control plane machine that came up.

  • To fix this, we may need to enable the external volume plugin as well (WIP). A rough sketch of the attempted configuration is below.
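For context, the attempted change amounts to roughly the following (a hypothetical, trimmed KubeadmControlPlane excerpt; the out-of-tree cloud-provider-azure components and CSI drivers still need to be installed separately, e.g. via a ClusterResourceSet):

```yaml
# Sketch only: point the controller manager and kubelets at the external
# (out-of-tree) cloud provider instead of the in-tree Azure provider.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      controllerManager:
        extraArgs:
          cloud-provider: external
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
```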

@mboersma
Contributor

@jackfrancis and @Jont828, is this something that should land in milestone v1.5, or will it probably hit the next one?

@Jont828
Contributor

Jont828 commented Jul 21, 2022

I'm not too sure; is there a PR open or being worked on for this at the moment? It looks like Jack was assigned to it, so maybe we can ask him when he's back.

@jackfrancis
Contributor

I think we can land this in the next milestone

@mboersma
Contributor

/milestone next

@CecileRobertMichon
Contributor

/assign
/milestone v1.8
