
Implement ScaleSetPriority for AzureManagedMachinePool #2604

Conversation

@jackfrancis (Contributor) commented Aug 25, 2022

What type of PR is this?

/kind feature
/kind bug

What this PR does / why we need it:

This PR implements the ScaleSetPriority feature for AzureManagedMachinePool. See the official AKS documentation for Spot node pools here:

https://docs.microsoft.com/en-us/azure/aks/spot-node-pool

This PR ensures that we don't override (e.g., accidentally delete) any AKS-enforced labels that may make it into the user-configurable part of the node pool labels configuration surface area.

Until this AKS issue is addressed, capz must retain this adaptive update behavior.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2460

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Implement ScaleSetPriority for AzureManagedMachinePool

@k8s-ci-robot (Contributor)

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 25, 2022
@jackfrancis jackfrancis force-pushed the synthesize-aks-capz-nodepool-labels branch from 66f13f0 to 9afb6b4 Compare August 25, 2022 21:56
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 25, 2022
@jackfrancis jackfrancis marked this pull request as ready for review August 25, 2022 21:56
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 25, 2022
@jackfrancis jackfrancis changed the title synthesize node label ownership between capz and AKS don't override AKS system node pool labels Aug 25, 2022
@jackfrancis jackfrancis force-pushed the synthesize-aks-capz-nodepool-labels branch 2 times, most recently from a7d6791 to 3b77f73 Compare August 25, 2022 22:54
@@ -319,6 +336,21 @@ func (m *AzureManagedMachinePool) validateName() error {
return nil
}

func (m *AzureManagedMachinePool) validateNodeLabels() error {
jackfrancis (Contributor, Author):

I don't think it's possible to do this validation in the kubebuilder API type spec (e.g., using the Pattern= interface to define a required regular expression match) because the API type is a map[string]string.
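Since kubebuilder markers can't express per-key constraints on a map[string]string, the check has to live in webhook code. A minimal sketch of that idea follows; the function body and error message are illustrative, not the exact implementation merged in this PR:

```go
package main

import (
	"fmt"
	"strings"
)

// azureSystemNodeLabelPrefix is the label prefix AKS reserves for itself.
const azureSystemNodeLabelPrefix = "kubernetes.azure.com"

// validateNodeLabels rejects user-supplied labels under the reserved prefix.
// The real webhook would return a field.ErrorList instead of a plain error.
func validateNodeLabels(labels map[string]string) error {
	for key := range labels {
		if strings.HasPrefix(key, azureSystemNodeLabelPrefix) {
			return fmt.Errorf("label %q is reserved for AKS and may not be set by the user", key)
		}
	}
	return nil
}

func main() {
	// A normal user label passes; a reserved system label is rejected.
	fmt.Println(validateNodeLabels(map[string]string{"foo": "bar"}))
	fmt.Println(validateNodeLabels(map[string]string{"kubernetes.azure.com/scalesetpriority": "spot"}))
}
```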

@jackfrancis jackfrancis force-pushed the synthesize-aks-capz-nodepool-labels branch 2 times, most recently from 27355ce to 4548d7f Compare August 29, 2022 18:45
@jackfrancis jackfrancis changed the title don't override AKS system node pool labels WIP: Implement ScaleSetPriority for AzureManagedMachinePool Aug 29, 2022
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 29, 2022
@jackfrancis jackfrancis added area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 29, 2022
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 29, 2022
@@ -98,6 +98,11 @@ type AzureManagedMachinePoolSpec struct {
// +kubebuilder:validation:Enum=Linux;Windows
// +optional
OSType *string `json:"osType,omitempty"`

// ScaleSetPriority specifies the ScaleSetPriority value. Default to Regular. Possible values include: 'Regular', 'Spot'
// +kubebuilder:validation:Enum=Regular;Spot
Member:

@jackfrancis if you send the Regular value, AKS doesn't set it to Regular but returns empty.

Any later change to the node pool then fails, because the Azure API returns an error that it cannot change from "" to Regular.

jackfrancis (Contributor, Author):

I wasn't able to reproduce this with label changes to the node pool. Which node pool changes were you seeing this on when ScaleSetPriority was set to "Regular"?

jackfrancis (Contributor, Author):

I did some more manual tests and this looks fine. Creating a cluster with an explicit scaleSetPriority: Regular AzureManagedMachinePool configuration does not have any apparent negative side effects.

I think this PR is ready to go!

Member:

still doesn't work for me.

steps to reproduce:

  1. create an agentpool with explicit ScaleSetPriority set to Regular
  2. verify in az aks nodepool show --cluster-name cluster-name -n nodepool-name that it has "scaleSetPriority": null,
  3. try to change a label in agentpool and see reconcile error Message=\"Changing property 'properties.ScaleSetPriority' is not allowed.

jackfrancis (Contributor, Author):

Are you testing using a build from this branch? In this current PR implementation, I definitely can't reproduce. I do observe what you're seeing from the AKS API:

$ az aks nodepool show --cluster-name capz-e2e-l9d26h-aks -n pool1 -g capz-e2e-l9d26h-aks | grep 'scaleSetPriority'
  "scaleSetPriority": null,

However, I'm able to continually update the node pool using capz (using node labels below as an example):

$ k get azuremanagedmachinepool/pool1 -n capz-e2e-l9d26h -o yaml | grep -A 5 '  nodeLabels'
  nodeLabels:
    foo: bar
  osDiskSizeGB: 40
  osDiskType: Ephemeral
  osType: Linux
  providerIDList:

You can see I've already applied a foo: bar node label. I'll now add a 2nd label:

$ k edit azuremanagedmachinepool/pool1 -n capz-e2e-l9d26h
azuremanagedmachinepool.infrastructure.cluster.x-k8s.io/pool1 edited
$ k get azuremanagedmachinepool/pool1 -n capz-e2e-l9d26h -o yaml | grep -A 5 '  nodeLabels'
  nodeLabels:
    foo: bar
    hello: world
  osDiskSizeGB: 40
  osDiskType: Ephemeral
  osType: Linux

Now if I get the node pool data from the AKS API I see the updated label:

$ az aks nodepool show --cluster-name capz-e2e-l9d26h-aks -n pool1 -g capz-e2e-l9d26h-aks | grep -A 5 'nodeLabels'
  "nodeLabels": {
    "foo": "bar",
    "hello": "world"
  },
  "nodePublicIpPrefixId": null,
  "nodeTaints": [

As expected, ScaleSetPriority is unchanged from either source:

$ az aks nodepool show --cluster-name capz-e2e-l9d26h-aks -n pool1 -g capz-e2e-l9d26h-aks | grep 'scaleSetPriority'
  "scaleSetPriority": null,
$ k get azuremanagedmachinepool/pool1 -n capz-e2e-l9d26h -o yaml | grep '  scaleSetPriority'
  scaleSetPriority: Regular

I think this PR has overcome this particular pathology. We should ship additional E2E tests with this PR to prove it, but I'm confident we're good here.

Member:

It's definitely working now. Did the Azure API change something?
In any case, I agree to merge this.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2022
@jackfrancis jackfrancis force-pushed the synthesize-aks-capz-nodepool-labels branch from 4548d7f to efc23fc Compare August 31, 2022 22:38
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2022
@jackfrancis jackfrancis force-pushed the synthesize-aks-capz-nodepool-labels branch from efc23fc to a5b45c8 Compare August 31, 2022 22:57
@jackfrancis (Contributor, Author)

@nojnhuh @CecileRobertMichon I'd like to advocate that we move forward with this after further reviewing the existing code for vulnerability to high request volume (and thus API throttling). For the record, the scenario I'm concerned about is:

  1. Scale out a spot pool successfully
  2. Azure recalculates spot pricing and deletes the vms that underlie our cluster nodes
  3. capz continually initiates a PUT against Azure APIs to restore its source of goal state truth

I think there is possibly some work to do to streamline how capz and Azure communicate in spot scenarios, but I'm not concerned about exponential request leakage to Azure.

tl;dr ready for final review

@jackfrancis jackfrancis added this to the v1.6 milestone Sep 29, 2022
@nojnhuh (Contributor) left a comment:

left a couple nits, overall lgtm

exp/api/v1beta1/azuremanagedmachinepool_webhook.go (outdated, resolved)
}
var client client.Client
for _, tc := range tests {
tc := tc
Contributor:

FWIW I don't think doing this is necessary here because we're never taking the address of tc in the subtest (like with &tc). Certainly doesn't hurt to keep it here though in case this test needs to do that in the future.

Contributor Author:


Two points:

  1. It's correct that we do this as a best practice, to avoid the non-deterministic unit test outcomes that result from not doing it when we need to.
  2. Doing it all the time as a rote practice tends to make the developer forget the exact edge-case root cause that makes this pattern necessary, and it becomes a kind of incantation.

🤷
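For context, the `tc := tc` idiom under discussion guards against closures capturing the shared loop variable, which was a real hazard for parallel subtests before Go 1.22 changed per-iteration loop variable semantics. A minimal sketch using plain goroutines (the `collect` helper is illustrative, not code from this PR):

```go
package main

import "fmt"

// collect launches one goroutine per test case, as parallel subtests do.
// The `tc := tc` copy shadows the loop variable so each closure captures
// its own value; before Go 1.22, omitting it could make every goroutine
// observe the same (last) value of tc.
func collect(tests []string) map[string]bool {
	done := make(chan string)
	for _, tc := range tests {
		tc := tc // per-iteration copy
		go func() { done <- tc }()
	}
	seen := map[string]bool{}
	for range tests {
		seen[<-done] = true
	}
	return seen
}

func main() {
	// With the copy in place, all three distinct values are observed.
	fmt.Println(len(collect([]string{"a", "b", "c"})))
}
```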

exp/api/v1beta1/azuremanagedmachinepool_webhook.go (outdated, resolved)
@jackfrancis jackfrancis force-pushed the synthesize-aks-capz-nodepool-labels branch from 7937424 to daf9414 Compare September 30, 2022 17:19
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 5, 2022
@jackfrancis jackfrancis force-pushed the synthesize-aks-capz-nodepool-labels branch from daf9414 to d498079 Compare October 5, 2022 20:39
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 5, 2022
@jackfrancis (Contributor, Author)

/retest


// mergeSystemNodeLabels appends any kubernetes.azure.com-prefixed labels from the AKS label set
// into the local capz label set.
func mergeSystemNodeLabels(capz, aks map[string]*string) map[string]*string {
Contributor:

Does this intentionally drop any labels that are not system labels (i.e., no kubernetes.azure.com prefix) but aren't part of the CAPZ spec? For example, labels added by some external process.

Contributor Author:

Yes. I think the idea here is that capz is the "exclusive" maintainer of labels, and the update interface has been defined such that every time you push a labels change you need to include the entire set of labels. Ref:

https://learn.microsoft.com/en-us/azure/aks/use-labels#updating-labels-on-existing-node-pools

The thing about the kubernetes.azure.com-prefixed labels is (IMO) an AKS bug, and hopefully we can drop this workaround at some point in the future.
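The merge behavior described above can be sketched from the signature shown in the diff: capz labels are the source of truth, and only AKS-enforced system labels are carried forward. The body below is a sketch under that assumption, not necessarily the exact merged code:

```go
package main

import (
	"fmt"
	"strings"
)

// mergeSystemNodeLabels keeps the capz-declared labels as the source of
// truth, but carries forward any kubernetes.azure.com-prefixed labels that
// AKS enforces, so a capz update doesn't accidentally delete them.
func mergeSystemNodeLabels(capz, aks map[string]*string) map[string]*string {
	ret := capz
	if ret == nil {
		ret = map[string]*string{}
	}
	for k, v := range aks {
		if strings.HasPrefix(k, "kubernetes.azure.com") {
			ret[k] = v
		}
	}
	return ret
}

func main() {
	spot, bar := "spot", "bar"
	merged := mergeSystemNodeLabels(
		map[string]*string{"foo": &bar},
		map[string]*string{
			"kubernetes.azure.com/scalesetpriority": &spot, // AKS system label: preserved
			"added-out-of-band":                     &bar,  // non-system, not in spec: dropped
		},
	)
	fmt.Println(len(merged)) // 2: "foo" plus the scalesetpriority system label
}
```

Note the deliberate consequence discussed above: non-system labels added outside capz are dropped, because the AKS labels update API requires the full label set on every write.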

@jackfrancis jackfrancis added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Oct 6, 2022
@CecileRobertMichon (Contributor) left a comment:


/lgtm
/assign @nojnhuh

@k8s-ci-robot (Contributor)

@CecileRobertMichon: GitHub didn't allow me to assign the following users: nojnhuh.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/lgtm
/assign @nojnhuh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2022
@nojnhuh (Contributor)

nojnhuh commented Oct 7, 2022

/assign

@jackfrancis (Contributor, Author)

/retest

@nojnhuh (Contributor) left a comment:


LGTM

@CecileRobertMichon (Contributor)

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2022
@k8s-ci-robot k8s-ci-robot merged commit 724a4e7 into kubernetes-sigs:main Oct 10, 2022
@jackfrancis jackfrancis deleted the synthesize-aks-capz-nodepool-labels branch December 9, 2022 22:33
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Spot nodepool loses "kubernetes.azure.com/scalesetpriority: spot" label
5 participants