Integrate AzureMachine with AzureManagedControlPlane (BYO nodes on AKS) #826

Closed
alexeldeib opened this issue Jul 23, 2020 · 9 comments · Fixed by #4052
Labels
  • area/managedclusters: Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
Milestone: v1.12

Comments

@alexeldeib
Contributor

alexeldeib commented Jul 23, 2020

/kind feature

Describe the solution you'd like
Extend CAPZ as necessary to allow CAPZ-managed AzureMachines to join AKS clusters provisioned by AzureManagedControlPlane. A PoC exists in #822, which itself relies on #824 and #825. On top of those changes, we need to add logic in AzureMachineController to allow instantiating the appropriate cluster describer at runtime using the duck typing from #825.
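
To make the duck-typing idea concrete, here is a minimal sketch of how the AzureMachine controller could pick a cluster describer at runtime depending on whether the owning Cluster is backed by an AzureCluster or an AzureManagedControlPlane. The interface and type names are illustrative stand-ins for whatever #824/#825 settle on, not CAPZ's actual API:

```go
// Sketch only: names are hypothetical stand-ins for the duck typing from
// #824/#825, not CAPZ's actual API.
package example

import "fmt"

// ClusterDescriber is the duck-typed view of "the object that owns
// cluster-wide Azure state" (resource group, location, vnet, ...). Both the
// AzureCluster scope and the AzureManagedControlPlane scope would satisfy it.
type ClusterDescriber interface {
	ResourceGroup() string
	Location() string
	Vnet() string
}

// Stub scopes standing in for the real scope types.
type azureClusterScope struct{}

func (azureClusterScope) ResourceGroup() string { return "capz-rg" }
func (azureClusterScope) Location() string      { return "eastus" }
func (azureClusterScope) Vnet() string          { return "capz-vnet" }

type managedControlPlaneScope struct{}

func (managedControlPlaneScope) ResourceGroup() string { return "aks-node-rg" }
func (managedControlPlaneScope) Location() string      { return "eastus" }
func (managedControlPlaneScope) Vnet() string          { return "aks-vnet" }

// clusterDescriberFor is where the AzureMachine controller would choose the
// describer at runtime, based on the kind of the object the owning Cluster
// references (infrastructureRef vs. controlPlaneRef).
func clusterDescriberFor(kind string) (ClusterDescriber, error) {
	switch kind {
	case "AzureCluster":
		return azureClusterScope{}, nil
	case "AzureManagedControlPlane":
		return managedControlPlaneScope{}, nil
	default:
		return nil, fmt.Errorf("unsupported cluster describer kind %q", kind)
	}
}
```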

Anything else you would like to add:

Currently there are two issues with the PoC:

  • Nodes register with master labels. I think this is because we use the admin kubeconfig to join nodes; we need to drop privileges and join another way instead. We could also try a token join via a kubeconfig for a low-privilege service account (see the sketch after this list).
  • Need to validate azure-cni functionality and/or debug any issues there.
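
On the privilege-dropping point, one possible direction (a sketch, not part of the PoC; whether AKS's control plane accepts kubeadm-style token joins is exactly the kind of thing that needs validating) is to mint a standard bootstrap token in the target cluster and have nodes join with that instead of the admin kubeconfig. Bootstrap tokens are ordinary Secrets in kube-system, e.g. with client-go:

```go
// Sketch: create a kubeadm-style bootstrap token Secret so nodes can do a
// token join instead of using the admin kubeconfig. Token values below are
// placeholders; generate real ones securely.
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func createBootstrapToken(ctx context.Context, cs kubernetes.Interface) error {
	tokenID, tokenSecret := "abcdef", "0123456789abcdef" // placeholders
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "bootstrap-token-" + tokenID,
			Namespace: metav1.NamespaceSystem,
		},
		Type: corev1.SecretTypeBootstrapToken,
		StringData: map[string]string{
			"token-id":                       tokenID,
			"token-secret":                   tokenSecret,
			"usage-bootstrap-authentication": "true",
			"usage-bootstrap-signing":        "true",
			// Join as the standard kubeadm bootstrappers group rather than
			// anything cluster-admin-like.
			"auth-extra-groups": "system:bootstrappers:kubeadm:default-node-token",
		},
	}
	_, err := cs.CoreV1().Secrets(metav1.NamespaceSystem).Create(ctx, secret, metav1.CreateOptions{})
	return err
}
```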

Warning

This is 100% unsupported by AKS right now; I just think it's cool, and I don't see any technical reason it can't work. This feature would remain experimental indefinitely unless AKS supported BYO nodes.

@k8s-ci-robot added the kind/feature label Jul 23, 2020
@CecileRobertMichon added this to the next milestone Aug 4, 2020
@alexeldeib
Contributor Author

alexeldeib commented Sep 14, 2020

Issues I encountered working on https://github.com/kubernetes-sigs/cluster-api-provider-azure/compare/master...alexeldeib:ace/integrate?expand=1

Critical issues // functional problems

  • Currently nodes use an admin kubeconfig to join the cluster, then patch their own role and labels to drop privileges and become agents. This is awful security practice, and I feel it is the most critical item to fix before anyone could use this.
  • Azure CNI requires pre-allocating IPs for pods on the VMs, which is not currently handled by CAPZ. This would be a critical requirement for anyone using Azure CNI, unless we can take advantage of another IPAM/IP-allocation scheme.
  • Joining nodes with CAPBK currently uses 2 preKubeadmCommands and 2 postKubeadmCommands, which is less than ideal. We should consider encapsulating this more cleanly, either in a bootstrap provider or a VHD/script somehow (see the sketch below).
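
For reference, this is roughly where those hooks live in CAPBK's Go types (using today's v1beta1 API path, which postdates this issue; the command strings are placeholders, not the actual PoC scripts). The encapsulation question is whether this setup belongs in a dedicated bootstrap provider or a VHD-baked script instead:

```go
// Sketch: where the pre/post-kubeadm hooks live in a CAPBK KubeadmConfigTemplate.
// Command paths are placeholders, not the real PoC scripts.
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

func workerBootstrapTemplate() *bootstrapv1.KubeadmConfigTemplate {
	return &bootstrapv1.KubeadmConfigTemplate{
		ObjectMeta: metav1.ObjectMeta{Name: "aks-byon-workers"},
		Spec: bootstrapv1.KubeadmConfigTemplateSpec{
			Template: bootstrapv1.KubeadmConfigTemplateResource{
				Spec: bootstrapv1.KubeadmConfigSpec{
					// The PoC currently spreads the AKS-specific setup across
					// these hooks; encapsulating it elsewhere would remove them.
					PreKubeadmCommands: []string{
						"/opt/azure/install-azure-cni.sh",     // placeholder
						"/opt/azure/fetch-bootstrap-creds.sh", // placeholder
					},
					PostKubeadmCommands: []string{
						"/opt/azure/label-node.sh",    // placeholder
						"/opt/azure/cleanup-creds.sh", // placeholder
					},
				},
			},
		},
	}
}
```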

Minor issues // implementation details

  • The group service should not always attempt to reconcile the resource group, and it should differentiate node and control plane resource groups.
  • The AKS vnet name is not predictable, and nodes require it for provisioning. The managed cluster API does not return the vnet name, but we can extract it from the node pools API, which contains vnetSubnetID as a fully qualified ARM resource ID (see the sketch after this list).
  • Anywhere that currently uses a cluster scope now needs to accept either an AzureCluster or an AzureManagedControlPlane as a ClusterDescriber, increasing code bloat.
  • Patching node labels/roles seems to work well on VMs, but seems to have issues on VMSS when nodes get deleted and recreated?
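
Since vnetSubnetID is a full ARM resource ID (.../virtualNetworks/<vnet>/subnets/<subnet>), pulling out the vnet name is just ID parsing. A minimal sketch (the Azure SDK's resource-ID parsing helpers could also be used instead of a plain string split):

```go
// Sketch: extract the vnet name from an agent pool's vnetSubnetID, a fully
// qualified ARM resource ID of the form
// /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>
package example

import (
	"fmt"
	"strings"
)

func vnetNameFromSubnetID(vnetSubnetID string) (string, error) {
	parts := strings.Split(strings.Trim(vnetSubnetID, "/"), "/")
	for i := 0; i < len(parts)-1; i++ {
		// The segment after "virtualNetworks" is the vnet name.
		if strings.EqualFold(parts[i], "virtualNetworks") {
			return parts[i+1], nil
		}
	}
	return "", fmt.Errorf("no virtualNetworks segment in %q", vnetSubnetID)
}
```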

Other observations // no action necessarily required

  • The CAPZ VHD does not include Azure CNI, which would be required for nodes joining an AKS cluster.
  • The AKS VHD does not include kubeadm, which would be required for CAPBK-driven bootstrap.
  • Can't access the AKS Shared Image Gallery (??) using an arbitrary client; we get auth failures.

More testing required

  • service type LoadBalancer (does it work?)
  • further validation on CNI functionality

@alexeldeib
Contributor Author

/assign

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 13, 2020
@CecileRobertMichon
Contributor

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Dec 17, 2020
@CecileRobertMichon added the area/managedclusters label Mar 16, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 14, 2021
@alexeldeib
Contributor Author

/remove-lifecycle stale
/lifecycle frozen

Still interested in this, but it's not super critical for the AzureManagedCluster/AzureManagedControlPlane functionality. I will probably do a small spike on some of the gritty details here, and then it would need a lot of incremental refactoring to make the reconcilers/scopes work as expected.

@k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label Jun 15, 2021
@alexeldeib
Contributor Author

Note: we should add .status.externalManagedControlPlane to AzureManagedControlPlane (AMCP) so node deletion in the machine controller works properly:
https://github.com/kubernetes-sigs/cluster-api/blob/master/controllers/machine_controller.go#L442-L467
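
For context, the machine controller code linked above consults a status.externalManagedControlPlane boolean on the control plane object via the CAPI contract, so on the CAPZ side this would roughly mean adding a field like the following to the AMCP status type. This is an abbreviated sketch, not CAPZ's full type, and the exact field shape (bool vs. *bool) is an implementation detail:

```go
// Sketch: abbreviated AzureManagedControlPlaneStatus showing only the field
// the CAPI "externally managed control plane" contract looks for.
package example

type AzureManagedControlPlaneStatus struct {
	// ... existing status fields elided ...

	// ExternalManagedControlPlane tells the Cluster API machine controller
	// that the control plane is managed by an external service (AKS here),
	// which changes how it handles Node deletion for machines in this cluster.
	// +optional
	ExternalManagedControlPlane *bool `json:"externalManagedControlPlane,omitempty"`
}
```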

@bridgetkromhout
Contributor

Related: kubernetes/kubernetes#112313

@CecileRobertMichon
Contributor

This is no longer blocked now that #3861 is done.

@dtzar moved this to In Progress in CAPZ Planning Sep 18, 2023
@dtzar added this to the v1.12 milestone Sep 18, 2023
@dtzar added the priority/important-soon and size/XXL labels Oct 31, 2023
@github-project-automation bot moved this from In Progress to Done in CAPZ Planning Nov 4, 2023
7 participants