feat: aks provider #482
Conversation
/retest
I think we should consider moving this to an
So I changed the title to reflect that this isn't really a proof of concept, since I'd attempted this once already independently: https://github.com/alexeldeib/caks/blob/ace/private/controllers/azuremanagedmachine_controller.go. I'd consider this PR the "real" attempt to clean everything up and get it usable, especially with machine pools and my learnings from the initial cut.

Summary

Briefly summarizing the approach + remaining work for what I'd consider a mergeable cut.

This PR introduces two new types, AzureManagedCluster and AzureManagedMachinePool, and their associated controllers. AzureManagedCluster maps fairly directly to the specification for an AKS cluster. For generic Kubernetes configuration, the CAPI Cluster object contains some of the required information; AzureManagedCluster only models the set of options specific to AKS bootstrapping.

AKS requires a default machine pool. We model this as part of the AzureManagedClusterSpec by adding a reference to an AzureManagedMachinePool in the same namespace. We require the user to explicitly define this node pool the same as any other (TODO: consider defaulting). CAPI machine pools also map nicely onto AKS agent pools, so nothing super surprising there. AKS still lets us see the underlying MC resources, and we record all the resulting VMSS nodes into the AzureManagedMachinePool.

This PR does not introduce any defaulting or validation. Defaulting + update validation in particular might be something to consider in the future.

Remaining Work
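(Not part of the remaining-work list above — just for orientation, a rough sketch of how the two new types might relate. The field names, json tags, and package/import lines below are illustrative assumptions, not the PR's actual API definitions.)

package v1alpha3

import corev1 "k8s.io/api/core/v1"

// AzureManagedClusterSpec models only the AKS-specific bootstrap options;
// generic Kubernetes configuration stays on the CAPI Cluster object.
type AzureManagedClusterSpec struct {
	// Version is the Kubernetes version of the managed cluster, e.g. "1.17.3".
	Version string `json:"version"`
	// DefaultPoolRef names an AzureManagedMachinePool in the same namespace,
	// used as the default agent pool that AKS requires at creation time.
	DefaultPoolRef corev1.LocalObjectReference `json:"defaultPoolRef"`
}

// AzureManagedMachinePoolSpec maps a CAPI MachinePool onto an AKS agent pool.
type AzureManagedMachinePoolSpec struct {
	// SKU is the VM size used for every node in the pool, e.g. "Standard_D2s_v3".
	SKU string `json:"sku"`
	// OSDiskSizeGB optionally overrides the node OS disk size.
	OSDiskSizeGB *int32 `json:"osDiskSizeGB,omitempty"`
}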
@alexeldeib I didn't dive too deep into the implementation yet, but I do wonder if it would provide a better user experience (or at least a similar upgrade experience) if this also made use of a control plane provider instead of just AzureManagedCluster and AzureManagedMachinePool. I'm not sure if this would require moving the majority of the AzureManagedCluster implementation to an AzureManagedControlPlane and implementing a mostly noop AzureManagedCluster resource, or if it would require some more fine-grained separation.
@detiber seems reasonable to me. I can shuffle things around and see if I hit any issues.
@devigned @CecileRobertMichon I think this works fine as a control plane provider now, and I've cleaned up the CIDR issue. Going to work on testing and flesh out the AKS options, maybe validation on various edge cases after that. Would appreciate any glaring feedback you see, I know it's a big PR already.
Reviewed first half of all the changes, mostly nits :)
return errors.New("expected agent pool specification") | ||
} | ||
|
||
if err == nil { |
Are we missing the generic err != nil check, or do we not care about it?
The following paths are a little hard to read; maybe use a switch statement with some comments on what the logic is doing?
Nice catch. I originally left it off because we wouldn't care about transient errors, but I should have added it back when I differentiated between Create and Update.
I'll also add a doc comment that the reason for differentiating the paths is that you can't use the ManagedClusters API to create agent pools on update, only at creation time.
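(A minimal sketch of the Create/Update split described in this thread — the client getter, the isNotFound helper, and the createCluster/updateCluster methods are illustrative placeholders, not the PR's actual helpers.)

existing, err := s.Client.Get(ctx, managedClusterSpec.ResourceGroup, managedClusterSpec.Name)
switch {
case err != nil && !isNotFound(err):
	// Transient or unexpected error: return it rather than swallowing it.
	return fmt.Errorf("failed to get existing managed cluster: %w", err)
case err != nil:
	// Not found: the ManagedClusters API only accepts agent pools at
	// creation time, so the create path carries the default pool.
	return s.createCluster(ctx, managedClusterSpec)
default:
	// Cluster already exists: update it in place; agent pools are
	// reconciled separately through the agent pools API.
	return s.updateCluster(ctx, existing, managedClusterSpec)
}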
func (s *Service) Reconcile(ctx context.Context, spec interface{}) error {
	managedClusterSpec, ok := spec.(*Spec)
	if !ok {
		return errors.New("expected managed cluster specification")
	}
Same as above, re: avoiding interface{}
💯 this is vestigial and we are working on a design proposal to change this across the entire project. At this point, the pattern is kept to stay consistent with the rest of the repo.
@CecileRobertMichon or @justaugustus may have more background.
Yep right now interfaces are in a bunch of places, let's stay consistent for now - currently working on a proposal to refactor all of them as David said.
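(For context, the pattern being discussed looks roughly like the first function below; the typed variant is only a sketch of the direction such a refactor could take, and reconcileManagedCluster is an assumed helper name.)

// Current pattern: services accept an empty interface and assert the
// concrete spec at runtime.
func (s *Service) Reconcile(ctx context.Context, spec interface{}) error {
	managedClusterSpec, ok := spec.(*Spec)
	if !ok {
		return errors.New("expected managed cluster specification")
	}
	return s.reconcileManagedCluster(ctx, managedClusterSpec)
}

// A typed signature drops the runtime assertion and lets the compiler
// enforce the contract instead.
func (s *Service) ReconcileManagedCluster(ctx context.Context, spec *Spec) error {
	return s.reconcileManagedCluster(ctx, spec)
}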
// HACK: set the last octet of the IP to .10
// This ensures the DNS IP is valid within the service CIDR without forcing the user
// to specify it in both the CAPI Cluster and the Azure control plane.
// https://golang.org/src/net/ip.go#L48
ip[15] = byte(10)
dnsIP := ip.String()
properties.NetworkProfile.DNSServiceIP = &dnsIP
🤨
Open to suggestions 🙂
n.b., consider how defaulting should work if we allow but don't require both dns and service IP as separate values (or if that's valuable).
Could this be something we move into a defaulting webhook? And probably add some docs, I guess.
Agreed, I was mostly trying to scope down this PR as I keep finding things to add 🙂
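(If that defaulting does move into a webhook, the derivation itself stays small — a self-contained sketch of the same ".10" convention, with an assumed helper name:)

package v1alpha3

import "net"

// defaultDNSServiceIP returns the ".10" address inside serviceCIDR, e.g.
// "10.0.0.0/16" becomes "10.0.0.10". An empty string signals an invalid CIDR,
// which a validation webhook would reject separately.
func defaultDNSServiceIP(serviceCIDR string) string {
	ip, _, err := net.ParseCIDR(serviceCIDR)
	if err != nil {
		return ""
	}
	// Normalize to the 16-byte form so the last octet is always index 15,
	// the same trick the controller-side hack above relies on.
	ip = ip.To16()
	ip[15] = 10
	return ip.String()
}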
} else if strings.EqualFold(*managedClusterSpec.NetworkPolicy, "Calico") {
	properties.NetworkProfile.NetworkPolicy = containerservice.NetworkPolicyCalico
} else {
	return fmt.Errorf("invalid network policy: '%s'. Allowed options are 'calico' and 'azure'", *managedClusterSpec.NetworkPolicy)
Should this be in a validation webhook or kubebuilder enum?
Probably. I'm marking webhooks as follow-up work because this is already a large PR.
I'll create issues when this actually merges and make a list in the PR.
I did do the Kubebuilder enums, btw. Do we currently support clusters without structural schema?
what do you mean?
I added these
// +kubebuilder:validation:Enum=Calico;Azure
Ignore the comment about structural schema; I was confusing that with trivialVersions conversion in Kubebuilder.
Perfect, that should do it!
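(Roughly how that marker sits on the API type — the surrounding struct name and field comment below are illustrative, not copied from the PR:)

type AzureManagedControlPlaneSpec struct {
	// NetworkPolicy selects the AKS network policy plugin. The enum marker
	// makes the apiserver reject anything other than Calico or Azure at
	// admission time, before the controller ever runs.
	// +kubebuilder:validation:Enum=Calico;Azure
	// +optional
	NetworkPolicy *string `json:"networkPolicy,omitempty"`
}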
Co-authored-by: Vince Prignano <[email protected]> Signed-off-by: Alexander Eldeib <[email protected]>
@alexeldeib is this ready to go?
I think so, I'll start breaking things out into separate issues:
@alexeldeib please open the follow-up issues and link them to the PR. I'll do a last round of review in the meantime. Thanks!
/lgtm
/approve
/hold cancel
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alexeldeib, CecileRobertMichon
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What this PR does / why we need it:
Demo for a managed cluster with machine pools. Still a bit rough around the edges, but basic scenarios (create, add pool, delete pool, scale) work well. Sharing for some feedback; I finally got around to trying the approach from here: kubernetes-sigs/cluster-api#2045 (comment)
xref:
#337
kubernetes-sigs/cluster-api#2045
kubernetes-sigs/cluster-api#980
fyi @vincepri @detiber @jsturtevant for some 👀 to start
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Special notes for your reviewer:
To test this with Tilt, you can use the normal Azure config + this change in CAPI upstream:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
Release note: