Add support for joining clusters to AKS Fleet #4316

Merged 1 commit on Jan 18, 2024

@willie-yao willie-yao commented Nov 27, 2023

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds support for joining managed clusters to AKS Fleet via the field fleetManager in the AzureManagedControlPlane.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Steps to test:

  1. Create a fleet manager in AKS. Microsoft docs: https://learn.microsoft.com/en-us/azure/kubernetes-fleet/quickstart-create-fleet-and-members
  2. Add the following field to your AzureManagedControlPlane spec:
     fleetManager:
       group: test-group
       managerName: fleet-manager-name
       managerResourceGroup: fleet-manager-rg
  3. Create the managed cluster and observe it being added as a member to your fleet manager
  • cherry-pick candidate
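
As a rough illustration of the field added in step 2 of the test steps, the shape could be modeled in Go along these lines. This is only a sketch: the type names `FleetManagerRef` and `controlPlaneSpec` are hypothetical stand-ins rather than the types in this PR, only the three key names come from the description above, and JSON is used simply because it ships with the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FleetManagerRef is a hypothetical type name; only the three JSON keys
// below are taken from the PR description.
type FleetManagerRef struct {
	Group                string `json:"group,omitempty"`
	ManagerName          string `json:"managerName"`
	ManagerResourceGroup string `json:"managerResourceGroup"`
}

// controlPlaneSpec is a stand-in for the relevant part of the
// AzureManagedControlPlane spec, not the real CAPZ type.
type controlPlaneSpec struct {
	FleetManager *FleetManagerRef `json:"fleetManager,omitempty"`
}

// encodeSpec renders the spec fragment as JSON.
func encodeSpec(s controlPlaneSpec) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := encodeSpec(controlPlaneSpec{FleetManager: &FleetManagerRef{
		Group:                "test-group",
		ManagerName:          "fleet-manager-name",
		ManagerResourceGroup: "fleet-manager-rg",
	}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Making the field a pointer with `omitempty` keeps it optional, so a control plane that does not join a fleet serializes without the key at all.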

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Add support for joining clusters to AKS Fleet

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 27, 2023
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 27, 2023
@willie-yao willie-yao force-pushed the aks-fleets-member branch 2 times, most recently from 0cdf4e1 to 5d61e6f Compare November 27, 2023 20:47
api/v1beta1/types.go (outdated, resolved)
azure/scope/managedcontrolplane.go (outdated, resolved)
azure/services/fleetsmember/spec.go (outdated, resolved)
import (
"context"

asocontainerservicev1 "github.com/Azure/azure-service-operator/v2/api/containerservice/v1api20230315preview"
Member commented:

Why is fleetmember in the same namespace as AKS in ASO? That seems wrong: we release with a different SDK, a different API version, etc.
Our latest API version is 0815preview.

Contributor Author commented:

Hmm I'm not sure. ASO includes both under asocontainerservicev1 here: https://azure.github.io/azure-service-operator/reference/containerservice/

azure/services/fleetsmember/spec.go (outdated, resolved)

fleetsMember := &asocontainerservicev1.FleetsMember{}
fleetsMember.Spec = asocontainerservicev1.Fleets_Member_Spec{}
fleetsMember.Spec.AzureName = s.ManagerName
Member commented:

Why AzureName = ManagerName?
member.Name is the name of the member.
It shouldn't be the same as the fleet's name.

Same comment as further up on naming consistency: is the Azure prefix needed?

Contributor Author commented:

The AzureName is defined as a part of an ASO spec type, so I can't change that

}
fleetsMember.Spec.Group = ptr.To(s.Group)
fleetsMember.Spec.ClusterResourceReference = &genruntime.ResourceReference{
ARMID: azure.ManagedClusterID(s.SubscriptionID, s.ResourceGroup, s.Name),
Member commented:

Is s.ResourceGroup the resource group of the target AKS cluster, or the resource group of the member resource?
I assume that memberSpec.ResourceGroup is the RG of the member?

Contributor Author commented:

It is the resource group of both the target AKS cluster and the member resource. The member resource is then attached to the owning fleet manager. I may be misunderstanding how member resources work though. Are they supposed to be in a different resource group?

@serbrech (Member) commented Nov 28, 2023:

The member resource is a child resource of the fleet manager, just like an agentpool is a child resource of an AKS cluster.
The member cannot be in a different RG than the fleet manager.

The AKS cluster that the member resource targets can be anywhere.

> it is the resource group of both the target AKS cluster and the member resource.

This would imply that all AKS clusters that join a fleet must be in the same resource group as the fleet manager, which means they would all have to be in the same subscription as well.

  • subscription-1

    • MyFleet-RG
      • Fleet
        • Member A -> aks-a cluster (located anywhere)
        • Member B -> aks-b cluster (located anywhere)
  • subscription-2

    • Cluster-A-RG
      • aks-a cluster
  • subscription-3

    • Cluster-B-RG
      • aks-b cluster
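
The layout above can be made concrete with ARM resource IDs. The sketch below is illustrative only: the helper names are hypothetical (they are not the CAPZ `azure.ManagedClusterID` helper), and the fleet name `my-fleet` and member name `member-a` are made up. It shows that the member ID is nested under the fleet's subscription and resource group, while the cluster it references can carry a different subscription and resource group.

```go
package main

import "fmt"

// managedClusterID builds the ARM ID of an AKS managed cluster.
// Hypothetical helper, written here only for illustration.
func managedClusterID(subscriptionID, resourceGroup, clusterName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerService/managedClusters/%s",
		subscriptionID, resourceGroup, clusterName)
}

// fleetMemberID builds the ARM ID of a fleet member. The member is a child
// resource of the fleet, so it shares the fleet's subscription and RG.
func fleetMemberID(subscriptionID, resourceGroup, fleetName, memberName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerService/fleets/%s/members/%s",
		subscriptionID, resourceGroup, fleetName, memberName)
}

func main() {
	// The member lives with the fleet in subscription-1 / MyFleet-RG.
	fmt.Println(fleetMemberID("subscription-1", "MyFleet-RG", "my-fleet", "member-a"))
	// The cluster it points at lives elsewhere, in subscription-2 / Cluster-A-RG.
	fmt.Println(managedClusterID("subscription-2", "Cluster-A-RG", "aks-a"))
}
```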

@willie-yao (Contributor Author) commented Nov 28, 2023:

> the member cannot be in a different RG than the fleet manager.

Is the spec.Group field the resource group of the fleet member then? I don't see any other place where I specified the resource group for the member. If so, I can set Group to s.ManagerResourceGroup. For the ClusterResourceReference, I think the current logic makes sense as the cluster resource group is separate from the manager/member resource group.

Member commented:

member.Group is an "UpdateGroup" that the user can provide to group members within a fleet,
for example "test", "staging", "prod-early", "prod".
It is like a static annotation on the member, if you will, not a resource group.

Contributor Author commented:

Gotcha. I just re-tested, and the member cluster I made is in a different resource group than the manager, and it seems to be working fine?
Member cluster:
[image]
And my fleet manager is in a separate resource group, fleet-test.

a member of.
properties:
group:
description: Group is the group this member belongs to for multi-cluster
Member commented:

Is that correct? Why is the group under the fleetsManager?


codecov bot commented Nov 28, 2023

Codecov Report

Attention: 45 lines in your changes are missing coverage. Please review.

Comparison is base (2c47af5) 62.20% compared to head (ad60a09) 62.11%.
Report is 4 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| azure/scope/managedcontrolplane.go | 0.00% | 16 Missing ⚠️ |
| api/v1beta1/azuremanagedcontrolplane_webhook.go | 36.84% | 11 Missing and 1 partial ⚠️ |
| azure/services/fleetsmembers/spec.go | 60.00% | 10 Missing ⚠️ |
| azure/services/fleetsmembers/fleetsmembers.go | 0.00% | 5 Missing ⚠️ |
| azure/defaults.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4316      +/-   ##
==========================================
- Coverage   62.20%   62.11%   -0.10%     
==========================================
  Files         187      189       +2     
  Lines       18568    18642      +74     
==========================================
+ Hits        11551    11580      +29     
- Misses       6381     6425      +44     
- Partials      636      637       +1     


main.go (resolved)
@willie-yao (Contributor Author) commented:

/retest

@willie-yao (Contributor Author) commented:

@serbrech I have addressed all comments and also made a few improvements to the implementation. Let me know what you think!

@willie-yao willie-yao force-pushed the aks-fleets-member branch 3 times, most recently from 4e81e22 to d688d75 Compare December 2, 2023 00:41
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 4, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 4, 2023
@willie-yao (Contributor Author) commented:

/retest

@serbrech (Member) commented Dec 5, 2023:

[Screenshot 2023-12-05 122911]

I noticed in our fleet logs that your test was proactively trying to join the cluster as a member before the cluster was created.
It repeatedly gets a 409.
Do you have a way to add a condition on your side to only join once the target cluster is created? Or is that out of scope and unusual in cluster-api?
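
One way to picture the condition being asked for here is a small readiness gate in front of the join. This is a hypothetical sketch, not how CAPZ or this PR implements it: `clusterStatus`, `readyToJoin`, and the choice of "Succeeded" as the only acceptable provisioning state are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errClusterNotReady signals that the join should be retried later.
var errClusterNotReady = errors.New("target cluster is not provisioned yet")

// clusterStatus is a hypothetical, simplified view of the target cluster.
type clusterStatus struct {
	Exists            bool
	ProvisioningState string
}

// readyToJoin returns nil only when the target cluster exists and has
// finished provisioning; otherwise the caller should requeue and try again
// instead of sending a join request that would be rejected.
func readyToJoin(c clusterStatus) error {
	if !c.Exists || c.ProvisioningState != "Succeeded" {
		return errClusterNotReady
	}
	return nil
}

func main() {
	for _, c := range []clusterStatus{
		{Exists: false},
		{Exists: true, ProvisioningState: "Creating"},
		{Exists: true, ProvisioningState: "Succeeded"},
	} {
		fmt.Printf("%+v -> join allowed: %t\n", c, readyToJoin(c) == nil)
	}
}
```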

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 8, 2023
@willie-yao willie-yao force-pushed the aks-fleets-member branch 2 times, most recently from 93b25d0 to 434d82b Compare January 17, 2024 21:01
@Jont828 (Contributor) commented Jan 17, 2024:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 17, 2024
@k8s-ci-robot (Contributor) commented:

LGTM label has been added.

Git tree hash: 6ba3ca6e1cc7a596ccd48ae841e90d759a860158

@jackfrancis (Contributor) commented:

/lgtm
/approve

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 17, 2024
@nojnhuh (Contributor) commented Jan 17, 2024:

/hold I have some comments queued up

@@ -260,6 +261,10 @@ func (mw *azureManagedControlPlaneWebhook) ValidateUpdate(ctx context.Context, o
allErrs = append(allErrs, errs...)
}

if errs := m.validateFleetsMember(old); len(errs) > 0 {
Contributor commented:

Do we need to do any validation for AzureManagedControlPlaneTemplate?

Contributor Author commented:

We don't need validation for the template since the name is the only thing we need to validate, and we don't specify the name in the template.

@@ -110,6 +110,10 @@ type AzureManagedControlPlaneSpec struct {
// Immutable.
// +optional
DNSPrefix *string `json:"dnsPrefix,omitempty"`

// FleetsMember is the spec for the fleet this cluster is a member of.
Contributor commented:

Is there a link to AKS docs we could include here?

@@ -211,6 +211,10 @@ type AzureManagedControlPlaneClassSpec struct {
// DisableLocalAccounts disables getting static credentials for this cluster when set. Expected to only be used for AAD clusters.
// +optional
DisableLocalAccounts *bool `json:"disableLocalAccounts,omitempty"`

// FleetsMember is the spec for the fleet this cluster is a member of.
Contributor commented:

Same here re: AKS doc link

Comment on lines 43 to 46
spec := scope.AzureFleetsMemberSpec()
if spec != nil {
svc.Specs = []azure.ASOResourceSpecGetter[*asocontainerservicev1.FleetsMember]{spec}
}
Contributor commented:

One way I can think of to get rid of this little dance is for AzureFleetsMemberSpec to return a slice of 0 or 1 elements instead of a possibly-nil pointer to one. Then this could just be:

Suggested change
- spec := scope.AzureFleetsMemberSpec()
- if spec != nil {
-     svc.Specs = []azure.ASOResourceSpecGetter[*asocontainerservicev1.FleetsMember]{spec}
- }
+ svc.Specs = scope.AzureFleetsMemberSpecs()
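
The suggestion above, returning a slice of zero or one elements instead of a possibly-nil pointer, can be sketched in isolation as follows. All types here are simplified, hypothetical stand-ins: `SpecGetter` plays the role of `azure.ASOResourceSpecGetter`, and `scope`, `fleetsMember`, and `fleetsMemberSpec` are not the real CAPZ or ASO types.

```go
package main

import "fmt"

// SpecGetter is a hypothetical stand-in for azure.ASOResourceSpecGetter[T].
type SpecGetter[T any] interface {
	ResourceRef() T
}

// fleetsMember is a simplified stand-in for the ASO FleetsMember resource.
type fleetsMember struct{ Name string }

// fleetsMemberSpec is a simplified stand-in for the service's spec type.
type fleetsMemberSpec struct{ name string }

func (s *fleetsMemberSpec) ResourceRef() *fleetsMember {
	return &fleetsMember{Name: s.name}
}

// scope is a stand-in for the managed control plane scope. An empty
// memberName means the cluster is not configured to join a fleet.
type scope struct{ memberName string }

// fleetsMemberSpecs returns zero specs when no fleet is configured and
// exactly one otherwise, so callers never need a nil check.
func (s scope) fleetsMemberSpecs() []SpecGetter[*fleetsMember] {
	if s.memberName == "" {
		return nil
	}
	return []SpecGetter[*fleetsMember]{&fleetsMemberSpec{name: s.memberName}}
}

func main() {
	for _, sc := range []scope{{}, {memberName: "member-a"}} {
		specs := sc.fleetsMemberSpecs() // assignable directly, as in svc.Specs = ...
		fmt.Printf("configured=%t specs=%d\n", sc.memberName != "", len(specs))
	}
}
```

Because ranging over a nil slice is a no-op, the caller can assign the result directly and iterate without special-casing the unconfigured case.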

}

// Service provides operations on Azure resources.
type Service struct {
Contributor commented:

Do we need this struct definition or can we refactor this like #4224?

g.Expect(mgmtClient.Update(ctx, infraControlPlane)).To(Succeed())
}, input.WaitIntervals...).Should(Succeed())

By("Waiting for the managed cluster to finish updating")
Contributor commented:

Not blocking this PR, but I'm still curious as to exactly what's being updated here if removing the fleets member field doesn't actually delete the ASO resource.

Contributor Author commented:

I think that there was an issue with the controller trying to reconcile the fleets member as we're trying to delete it, and it may have been causing some problems. I'll try testing it again without this step and maybe those problems were due to a different issue.

Contributor Author commented:

Found the error. It happens when I try to delete the fleets member:

this operation depends on is not in an expected state. Expected: ProvisioningState in (Succeeded, Canceled, Failed, Deleting, Upgrading). Actual: ProvisioningState is Updating. Change the resource to expected state and try again.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 17, 2024
@@ -1000,6 +1000,16 @@ type AzureBastion struct {
EnableTunneling bool `json:"enableTunneling,omitempty"`
}

// FleetsMember defines the fleets member configuration.
// [AKS doc]: https://learn.microsoft.com/en-us/azure/templates/microsoft.containerservice/2023-03-15-preview/fleets/members
@nojnhuh (Contributor) commented Jan 17, 2024:

This [AKS doc]: ... syntax basically only defines a variable that you need to reference for it to actually render in the generated docs:

Suggested change
- // [AKS doc]: https://learn.microsoft.com/en-us/azure/templates/microsoft.containerservice/2023-03-15-preview/fleets/members
+ // See also [AKS doc].
+ //
+ // [AKS doc]: https://learn.microsoft.com/en-us/azure/templates/microsoft.containerservice/2023-03-15-preview/fleets/members

https://deploy-preview-4316--kubernetes-sigs-cluster-api-provider-azure.netlify.app/reference/v1beta1-api#infrastructure.cluster.x-k8s.io/v1beta1.FleetsMember
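
The difference between defining a link and referencing it can be checked with Go's standard `go/doc/comment` parser. This is a sketch of how the Go toolchain itself treats doc-comment link definitions; the generator that builds the CAPZ reference docs may behave differently, and the helper `linkUsage` is a hypothetical name introduced here. The definition-only text below separates the definition with a blank line, which is an assumption made so that the parser recognizes it as a definition at all.

```go
package main

import (
	"fmt"
	"go/doc/comment"
)

const aksDocURL = "https://learn.microsoft.com/en-us/azure/templates/microsoft.containerservice/2023-03-15-preview/fleets/members"

// linkUsage parses doc-comment text (comment markers already stripped) and
// reports, for each link definition found, whether the text references it.
func linkUsage(text string) map[string]bool {
	var p comment.Parser
	doc := p.Parse(text)
	used := make(map[string]bool)
	for _, def := range doc.Links {
		used[def.Text] = def.Used
	}
	return used
}

func main() {
	// Definition only: the link target is declared but never referenced.
	defOnly := "FleetsMember defines the fleets member configuration.\n\n" +
		"[AKS doc]: " + aksDocURL

	// Definition plus a reference, as in the suggested change.
	withRef := "FleetsMember defines the fleets member configuration.\n" +
		"See also [AKS doc].\n\n" +
		"[AKS doc]: " + aksDocURL

	fmt.Println("definition only:", linkUsage(defOnly))
	fmt.Println("with reference: ", linkUsage(withRef))
}
```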

@nojnhuh (Contributor) left a comment:

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 17, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 17, 2024
@k8s-ci-robot (Contributor) commented:

LGTM label has been added.

Git tree hash: 1bf6b4cf8533455daae165b411807cbe6661db79

@k8s-ci-robot (Contributor) commented Jan 17, 2024:

@willie-yao: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cluster-api-provider-azure-conformance-custom-builds | 434d82b | link | false | /test pull-cluster-api-provider-azure-conformance-custom-builds |
| pull-cluster-api-provider-azure-windows-custom-builds | ad60a09 | link | false | /test pull-cluster-api-provider-azure-windows-custom-builds |


@willie-yao (Contributor Author) commented:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 337a278 into kubernetes-sigs:main Jan 18, 2024
25 of 29 checks passed
@willie-yao (Contributor Author) commented:

Looks like the PR merged without these tests passing. I think the custom Windows one was failing, but I don't think it's related to this PR.
[image]

@nojnhuh nojnhuh added the area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type label Feb 14, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Automated way to join AKS clusters to Fleet Hub
9 participants