Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose AKS preview features #2625

Closed
Tracked by #3446
nojnhuh opened this issue Sep 2, 2022 · 21 comments · Fixed by #4617
Closed
Tracked by #3446

Expose AKS preview features #2625

nojnhuh opened this issue Sep 2, 2022 · 21 comments · Fixed by #4617
Assignees
Labels
area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type kind/feature Categorizes issue or PR as related to a new feature. kind/proposal Issues or PRs related to proposals. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Milestone

Comments

@nojnhuh
Copy link
Contributor

nojnhuh commented Sep 2, 2022

Goals

  1. Determine a method for making AKS preview features consumable through CAPZ like GA AKS features

Non-Goals/Future Work

  1. Implement every AKS preview feature
  2. Guarantee the stability of AKS preview features in CAPZ beyond what AKS itself supports
  3. Determine how to handle preview features of other APIs used for self-managed clusters

User Story

As a user I would like to configure AKS managed clusters with preview features to take advantage of the latest and greatest functionality available in AKS.

Detailed Description

Some AKS features are not available from Azure's stable API and can only be used from separate preview APIs. Since CAPZ currently only uses the stable AKS APIs, those preview features are not available and must be configured through some other means which introduces friction for users looking to use Cluster API for all their cluster lifecycle management. Exposing AKS preview features in CAPZ would eliminate that friction.

/kind proposal
/area managedclusters

@k8s-ci-robot k8s-ci-robot added kind/proposal Issues or PRs related to proposals. area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type labels Sep 2, 2022
@mtougeron
Copy link
Member

As a user I would love this functionality. Not being able to test preview features is a limitation that I'm bumping up against (e.g., BYO CNI).

Two high-level thoughts that I have on this:

  1. How would we address api version compatibility? For example, if while a feature goes through the preview process there is a change to the interface, would we have to iterate on the apiVersion of the AzureManagedControlPlane? The contract we provide to the users of CAPZ would have to be very explicit and clear to avoid confusion.

  2. With the lower SLA of a preview feature how would that impact the testing matrix that CAPZ would perform again the preview features? Would it significantly increase the load on the maintainers to keep the tests/functionality working if the preview functionality changes behind the scenes before an updated go sdk has been released.

@mboersma
Copy link
Contributor

mboersma commented Sep 8, 2022

Since CAPZ currently only uses the stable AKS APIs

Probably stupid question, but is it not possible to use the preview AKS API only in the managed cluster code? Or can we not vendor in both at the same time?

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Sep 8, 2022

The preview APIs exist side-by-side (at least in the same Go module) with the stable ones in the SDK from what I can see: https://github.com/Azure/azure-sdk-for-go/tree/ca4d22fbbacf9ddc32b42a14113b13ea3feb9ccd/services/preview. So it should be possible to do something pretty granular like:

import (
    stable "github.com/Azure/azure-sdk-for-go/services/..."
    preview "github.com/Azure/azure-sdk-for-go/services/preview/..."
)

if r.previewFeatureDefined() {
    preview.CreateResource(r)
} else {
    stable.CreateResource(r)
}

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Sep 15, 2022

/assign

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2022
@nojnhuh
Copy link
Contributor Author

nojnhuh commented Dec 14, 2022

/lifecycle active

@k8s-ci-robot k8s-ci-robot added lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 14, 2022
@dtzar
Copy link
Contributor

dtzar commented Jan 11, 2023

IMO - we shouldn't work on this until we complete #2670 at least for all resources which touch ManagedCluster.
We already know we wouldn't be able to add all the features anyways until we do this.

@dtzar dtzar added kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Apr 5, 2023
@nojnhuh
Copy link
Contributor Author

nojnhuh commented Apr 17, 2023

/unassign

@dtzar
Copy link
Contributor

dtzar commented May 22, 2023

The preview APIs should all be available once we have the ASO migration, since the ASO ContainerService today at 2.0.0 supports both:
2023-02-01 (stable) and 2023-02-02-preview

That said, there is a current cadence of a new stable and preview API every month for this service!
It would be good to map out a plan for long term support here as well as decide when we do migrate if we're going to enable just stable, just preview, or both in parallel.

@nojnhuh @CecileRobertMichon - thoughts?

@nojnhuh
Copy link
Contributor Author

nojnhuh commented May 22, 2023

I would vote to only enable the stable AKS APIs in the first ASO pass. After that I think it would be good into include the option to use a stable or preview API, updating them as-needed like we do today.

@dtzar
Copy link
Contributor

dtzar commented May 22, 2023

If we're only going to enable one while ASO implementation is also under a feature/experimental flag, I'd vote to only do the preview API.

@nojnhuh
Copy link
Contributor Author

nojnhuh commented May 22, 2023

The current plan is not to put ASO behind a feature flag: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/eb77ec4944f9a6aa1cc4dbf0e629b45e636f3c13/docs/proposals/20230123-azure-service-operator.md#graduation-criteria

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 22, 2023

One idea I had in the context of ASO is to have an AzureManagedControlPlane simply refer to an ASO ManagedCluster. CAPZ would then only mirror any status or anything else needed to fulfill the CAPI contract to the AzureManagedControlPlane resource, but not make any updates to it and defer that to the user. The referenced ASO ManagedCluster could be any API version, including a preview API version or a newer API version than CAPZ itself communicates with, and ASO's webhooks will do any necessary conversion. A strategy like this may also help enable #3629 and #1173.

There is probably some deal-breaking reason that won't work, but might at least be useful as a reference point.

@dtzar dtzar added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Sep 29, 2023
@theunrepentantgeek
Copy link

theunrepentantgeek commented Nov 1, 2023

I would vote to only enable the stable AKS APIs in the first ASO pass. After that I think it would be good into include the option to use a stable or preview API, updating them as-needed like we do today.

Are you wanting ALL stable versions in ASO?

Because there are consequences to that. Currently there's a ~4MB memory load on api-server per CRD, regardless of whether it's currently in use or not. I'm 80% sure that overhead is per version, as we're talking about removing the v1beta CRDs in the upcoming 2.4.0 ASO release to keep the memory load down.

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Nov 1, 2023

I don't think we would ever need every API version at the same time. I think what I originally had in mind here was supporting one stable API version and one preview API version.

@dtzar
Copy link
Contributor

dtzar commented Nov 1, 2023

One "latest" stable and one "latest" preview API version. We should try to align with whatever is agreed upon in Azure/azure-service-operator#2687

I don't think users need the new API version every single month, but we should have a cadence of updates they can expect. Preview more frequently of course versus stable.

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Nov 20, 2023

/assign

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Feb 7, 2024

We still absolutely plan to do this, but that likely won't be driven by me.

/unassign

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Feb 12, 2024

I'd like for us give another go at seeing if it really is impossible to change the apiVersion in a patch (#4527), since that would be the easiest way for us to enable this. Even if we can get that to work, that would be a little riskier for users since ASO's conversion wouldn't be invoked. Last I looked though, there wasn't a good way to directly invoke controller-runtime's conversion logic outside the context of webhooks.

The one way that would definitely work is to basically duplicate azure/services/managedclusters and replace the ASO API version to the preview and then make one of them no-op depending on some preview: true flag we add to the CAPZ resources, but that would be a maintenance nightmare down the road. I think this should be at least a last resort, if not block this issue entirely.

@dtzar
Copy link
Contributor

dtzar commented Feb 12, 2024

One design idea would be users to specify "all-in on preview API" and then it switches everything with CAPZ's ASO to use that latest ASO preview version. Then they could utilize patch with that ASO version as well in its normal fashion.

Another advantage with this approach would be that a developer could also write a native CAPZ feature in the same way and when the feature moves out from preview to GA, it should "just work" (unless of course there were some significant underlying changes from preview API to GA).

@willie-yao
Copy link
Contributor

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type kind/feature Categorizes issue or PR as related to a new feature. kind/proposal Issues or PRs related to proposals. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Archived in project
9 participants