Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ClusterClass and managed clusters #2684

Closed
AAkindele opened this issue Sep 29, 2022 · 22 comments · Fixed by #4155
Closed

Add support for ClusterClass and managed clusters #2684

AAkindele opened this issue Sep 29, 2022 · 22 comments · Fixed by #4155
Assignees
Labels
area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Milestone

Comments

@AAkindele
Copy link
Contributor

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

Add support for using ClusterClass when working with managed AKS clusters. Ideally, there would be a flavor for AKS that uses ClusterClass to make is easier for first time users to get started with ClusterClass and managed clusters.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

A use case for this is the ability to use lifecycle hook runtime extensions, https://cluster-api.sigs.k8s.io/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.html. Clusters need to be created using ClusterClass in order for the hooks to work.

Environment:

  • cluster-api-provider-azure version: latest
  • Kubernetes version: (use kubectl version): 1.25.2
  • OS (e.g. from /etc/os-release): 20.04.3
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 29, 2022
@CecileRobertMichon
Copy link
Contributor

/area managedclusters

@k8s-ci-robot k8s-ci-robot added the area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type label Oct 6, 2022
@CecileRobertMichon
Copy link
Contributor

Thanks for opening this issue @AAkindele!

The main change required to get ClusterClass working with managed clusters is going to be to add AzureManagedClusterTemplate and AzureManagedControlPlaneTemplate as described in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210526-cluster-class-and-managed-topologies.md#provider-implementation

In addition, we also need to evaluate whether CAPZ should implement recommendations outlined in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220725-managed-kubernetes.md, which would involve moving some of the fields in AzureManagedCluster to AzureManagedControlPlane (breaking change). This is outlined in https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2602/files#diff-f732765b7a645dc84b492225dc4db5fe8f47f49199a28d65a785f1d604714955R49

@jackfrancis, I wonder if it would make sense to add ClusterClass support as part of #2602 as well?

@AAkindele
Copy link
Contributor Author

This issue about MachinePool, kubernetes-sigs/cluster-api#5991, is referenced in the Managed Kubernetes proposal here, https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220725-managed-kubernetes.md#clusterclass-support-for-machinepool. Would this be a blocker to adding ClusterClass managed cluster support?

@CecileRobertMichon, for the breaking change option, do you happen to have a list of the fields involved or other information on the scope of the breaking change. I am new to the codebase, but I'm willing to help gather the information if pointed in the right direction. Thanks.

@CecileRobertMichon
Copy link
Contributor

Would this be a blocker to adding ClusterClass managed cluster support?

Yes that would be required since AKS only supports MachinePools (not MachineDeployments). Good catch!

for the breaking change option, do you happen to have a list of the fields involved or other information on the scope of the breaking change

That proposal / implementation for CAPZ + AKS doesn't exist yet, and no I don't know which fields would be moved off the top of my head, but the changes needed are described here from a GCP perspective: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220725-managed-kubernetes.md#option-3-two-kinds-with-a-managed-control-plane-and-managed-infra-cluster-with-better-separation-of-responsibilities, we would want the equivalent changes for CAPZ.

@richardcase @pydctw are you able to provide more guidance on the above?

@CecileRobertMichon
Copy link
Contributor

I think the first step would be a detailed issue or design doc that describes: what is the current state of the CAPZ AzureManagedCluster API, what exact field changes would be needed to implement the recommendations made by CAPI , what are the pros and cons + risks. @AAkindele is that something you are interested in helping with? Perhaps with support and guidance from @jackfrancis since you are relatively new to CAPZ.

@AAkindele
Copy link
Contributor Author

👍 Yep! I'm interested in helping out.

@richardcase
Copy link
Member

@richardcase @pydctw are you able to provide more guidance on the above?

Would it be helpful to have a call to discuss? We have also been thinking about the same with CAPA, which is problematic as we GA's EKS support.

@CecileRobertMichon
Copy link
Contributor

PTAL @AAkindele @richardcase: #2739

We are going to discuss at tomorrow's CAPZ office hours (9am PT) if either of you are able to join. If not, feel free to leave your thought in the PR above async.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 18, 2023
@willie-yao
Copy link
Contributor

/assign

@willie-yao
Copy link
Contributor

/lifecycle active

@k8s-ci-robot k8s-ci-robot added lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 31, 2023
@willie-yao
Copy link
Contributor

willie-yao commented Feb 15, 2023

To implement clusterclass for managed clusters, we need to determine a list of fields that are appropriate to be shared across clusters using the clusterclass template. AzureManagedCluster only has one field ControlPlaneEndpoint that won't be shared. However, there are many fields in AzureManagedControlPlane that cannot be shared. Here is a preliminary list of fields to omit from the AzureManagedControlPlaneTemplate, taken from azuremanagedcontrolplane_types.go. Every other field will be included in the template.

  • ResourceGroupName
  • NodeResourceGroupName
  • VirtualNetwork *Include VirtualNetwork.cidrBlock
  • ControlPlaneEndpoint
  • SSHPublicKey
  • DNSServiceIP
  • AADProfile.AdminGroupObjectIDs
  • APIServerAccessProfile.AuthorizedIPRanges

Please let me know what you think about this list and if there are any fields that are missing. For reference, I omitted fields that contain unique identifiers such as names and IPs that can't be shared.

cc @CecileRobertMichon

@CecileRobertMichon
Copy link
Contributor

/milestone v1.8

@k8s-ci-robot k8s-ci-robot added this to the v1.8 milestone Feb 15, 2023
@CecileRobertMichon
Copy link
Contributor

Thanks @willie-yao. That list looks good to me. The only one I'm not 100% sure about is AADProfile.AdminGroupObjectIDs, it might make sense to share that one across clusters if you have an existing AAD group you want to use to be admin on all your clusters (at least that's my interpretation of https://learn.microsoft.com/en-us/azure/aks/managed-aad). What led you to include it in the no-sharing list?

@CecileRobertMichon
Copy link
Contributor

cc @dtzar @nojnhuh @jackfrancis @mtougeron for more eyes on #2684 (comment)

@nojnhuh
Copy link
Contributor

nojnhuh commented Feb 15, 2023

Is it possible with ClusterClass to share part of a field? e.g. it seems like we might be able to share the vnet's cidrBlock since that itself doesn't refer to the name or resource group of the actual vnet. I saw a couple others like that under the vnet spec.

Could DNSServiceIP also be shared? I think that's a private IP address (by default at least) so two clusters could have the same value for that.

Besides those two things looks good to me!

@willie-yao
Copy link
Contributor

What led you to include it in the no-sharing list?

I definitely misunderstood the role of AdminGroupObjectIDs, and thought that the objectIDs were specific to a single cluster, rather than an AAD group you can apply to multiple clusters. I'll remove that from the list!

@willie-yao
Copy link
Contributor

Is it possible with ClusterClass to share part of a field?

Yes it is. I think it'll make sense to include the cidrBlock as well. I also thought that DNSServiceIP was not private so that's why I left it out. I'll go ahead and include those two fields!

@willie-yao
Copy link
Contributor

Update: Will also be omitting the APIServerAccessProfile.AuthorizedIPRanges field as it uses public IP addresses. More documentation on this field can be found here

@willie-yao
Copy link
Contributor

We are currently blocked on this feature until CAPI ClusterClass adds support for MachinePools. It is being worked on! I will still be implementing the types and an E2E test and will validate it once MachinePool support is added.

kubernetes-sigs/cluster-api#5991

@dtzar dtzar added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Apr 5, 2023
@willie-yao willie-yao moved this to Blocked in CAPZ Planning Apr 21, 2023
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Jul 4, 2023
@richardcase
Copy link
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 11, 2023
@mboersma mboersma modified the milestones: next, v1.11 Jul 13, 2023
@willie-yao willie-yao modified the milestones: v1.11, next Aug 17, 2023
@mboersma mboersma modified the milestones: next, v1.11, v1.12 Sep 18, 2023
@dtzar dtzar moved this from Blocked to In Progress in CAPZ Planning Sep 28, 2023
@dtzar dtzar added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Sep 28, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in CAPZ Planning Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Archived in project
10 participants