
✨ Add externallyManagedReplicaCount field to MachinePool #5685

Conversation

dthorsen
Contributor

What this PR does / why we need it:

User Story

As a cluster operator, I would like the option to use external scaling applications to autoscale infrastructure managed under a MachinePool, so that we can take advantage of the capabilities of existing infrastructure scaling tools if we choose.

Detailed Description

There are many potential scaler implementations that support things like AWS ASGs and Azure VMSS. With a small change, cluster-api can optionally allow external software to manage the scaling of node pools without that software needing to support cluster-api. Some examples of scalers that would benefit from this solution are:

  • Kubernetes cluster-autoscaler using the existing cloud-provider implementations. This provides a stopgap while cluster-autoscaler's support for CAPI MachinePools matures and reaches feature parity with the existing cloud provider implementations. Incidentally, this would also allow the CAPZ provider to add support for AKS' native managed node pool autoscaling, which provides a fully-managed installation of cluster-autoscaler on the AKS control plane itself, operated entirely by the cloud provider, Azure.
  • Atlassian escalator
  • Spotinst (an infrastructure provider for Spotinst does not yet exist, but the community has expressed interest in one)

Fixes #5658

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 16, 2021
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 16, 2021
@dthorsen dthorsen force-pushed the externallyManagedReplicaCount branch from 0dd1e56 to 4481e9b on November 16, 2021 21:31
Comment on lines +63 to +69
// Whether the Replicas value is externally managed. This is useful when
// the infrastructure is a managed machine pool, or if the cluster-autoscaler is scaling
// the underlying cloud infrastructure directly outside of cluster-api. Set this to true
// if you wish the replica count to be managed outside of cluster-api. Defaults to false.
// +optional
ExternallyManagedReplicaCount *bool `json:"externallyManagedReplicaCount,omitempty"`
Member

Is this field used anywhere?

Contributor Author

It will be used to signal to the infrastructure provider that it is responsible for managing the spec.replicas and relevant status fields on the MachinePool. So there is no implementation code in the MachinePool itself yet. That may change a little as the MachinePool Machines implementation begins though.
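
To make the intended division of responsibility concrete, here is a minimal sketch (not code from this PR; the stand-in type and helper names are assumptions) of how an infrastructure provider's reconcile logic might branch on the proposed field:

package main

import "fmt"

// Stand-in for the relevant slice of the MachinePool API type; the real type
// lives in the cluster-api experimental API group.
type MachinePoolSpec struct {
	Replicas                      *int32
	ExternallyManagedReplicaCount *bool
}

// replicasExternallyManaged reports whether an external scaler owns the
// replica count for this pool.
func replicasExternallyManaged(spec MachinePoolSpec) bool {
	return spec.ExternallyManagedReplicaCount != nil && *spec.ExternallyManagedReplicaCount
}

func main() {
	enabled := true
	desired := int32(3)
	spec := MachinePoolSpec{Replicas: &desired, ExternallyManagedReplicaCount: &enabled}

	if replicasExternallyManaged(spec) {
		// Leave the cloud scaling group alone; an external scaler
		// (cluster-autoscaler, escalator, ...) drives capacity.
		fmt.Println("skipping desired-capacity reconciliation")
	} else {
		// Cluster API owns scaling: enforce spec.replicas on the cloud resource.
		fmt.Printf("enforcing desired capacity %d\n", *spec.Replicas)
	}
}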

@CecileRobertMichon
Contributor

/assign @devigned @mboersma

Contributor

@devigned devigned left a comment

If this is intended to be a stop gap until more mature support exists in Cluster Autoscaler, did you consider something a little more temporary, like an annotation?

@dthorsen
Contributor Author

dthorsen commented Nov 19, 2021

@devigned It is intended for more than a temporary stop gap, though stop gap is one use case. Some other example use cases are listed above. This field is intended to provide flexibility and permit cluster operator choice of scaler software. Without this field, cluster operators that wish to use external scaling tools are blocked from using cluster-api to manage node pools. That seems unnecessarily restrictive.

If we place this in an annotation instead, it seems likely to me that this will be a very long-lived annotation, especially considering that developers of other scaling solutions may never target scaling via cluster API. It also assumes that at some point in the future we intend to make cluster API incompatible with underlying managed cloud features or external software that manipulates the pools outside of cluster API. MachinePools were originally conceived to support this exact use case, and it is a key differentiator from MachineDeployments, one that we use heavily today and that others also wish to use.

@dthorsen
Contributor Author

dthorsen commented Dec 1, 2021

I can rebase this tonight to resolve merge conflicts but is there anything preventing this from being merged? I'd like to start on the AWSMachinePool implementation soon.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2021
@mboersma
Contributor

mboersma commented Dec 2, 2021

I don't think this is something that the MachinePool Machines proposal requires, so no objections there. It does seem odd to add a field that isn't used by any logic in CAPI, just in infra providers. But I understand your use case and see how it could be useful in certain cluster-autoscaler configurations like AKS, so 👍🏻 .

@dthorsen dthorsen force-pushed the externallyManagedReplicaCount branch from 4481e9b to 076597a on December 3, 2021 23:00
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 3, 2021
@xvzf

xvzf commented Dec 28, 2021

Hey, is there an update here?

@dthorsen
Contributor Author

@xvzf My apologies, I fell behind on this. I'll rebase this PR today. Most of the maintainers are out for holidays but I expect this should get merged quickly once folks return.

@dthorsen dthorsen force-pushed the externallyManagedReplicaCount branch from 076597a to f393a8d on December 30, 2021 16:04
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 30, 2021
@dthorsen dthorsen force-pushed the externallyManagedReplicaCount branch from f393a8d to b5889a6 on December 30, 2021 16:58
@enxebre
Member

enxebre commented Jan 5, 2022

This seems to me a natural case for differentiating nil from zero. The semantics could be folded into the existing machinePool.Replicas field: when machinePool.Replicas is nil, set machinePool.status.replicasExternallyManaged=true, and each implementation reacts appropriately.

@dthorsen
Contributor Author

dthorsen commented Jan 6, 2022

@enxebre @fabriziopandini @vincepri @CecileRobertMichon
The origin story of this field is really old. It was suggested in yesterday's meeting to use nil replicas to signal this instead, and since the context is so old I had to go relearn why we had not used that approach. To clarify, we use this field extensively today in an internal fork of cluster-api. We consulted with the community before doing so, in an attempt to plan ahead and avoid diverging from a path that would be acceptable to upstream while we tested the approach. This PR is our attempt to upstream that work to support our use case. The choice of this field was arrived at in an old issue, and the concerns with using nil replicas were raised then: kubernetes-sigs/cluster-api-provider-aws#2022 (comment)
Rather than repeat that whole conversation, I suggest reading that thread. We actually tried using nil replicas for a short period internally in our fork of CAPI, and the UX was not good. It is too easy for someone to misunderstand and think they are scaling up a MachinePool when the change in effect suddenly scales it down dramatically, because it breaks the feedback loop. Operators see no replicas, or the defaulted value of 1, and think they need to scale the pool up. Keeping this boolean separate provides an excellent safeguard against operator error.

The status reporting when replicas is nil is also problematic in output like kubectl get machinepools, and was hard to work around since replicas appears in the summary output. The MachinePool in the example below was not actually scaling down, but the MachinePool controller reports that it is, because the replicas field in status is so much higher than the spec.replicas field.

$ kubectl get machinepools
NAME           REPLICAS           PHASE          VERSION
node-group-a   1                  ScalingDown

Based on first-hand experience using this at scale, using a nil replica count to enable or disable this feature amounts to an implicit side effect: a major configuration change that is not immediately obvious to an operator looking at the MachinePool spec. The resulting misconfiguration leads to thrashing of the pool's desired instance count between the external system controlling the replica count and the MachinePool controller, with a high potential to cause service disruption within the workload cluster.
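
To illustrate that failure mode, here is a minimal sketch (illustrative only; the defaulting function is an assumption standing in for whatever defaults an unset replica count to 1, not the actual CAPI webhook) of why overloading nil is fragile compared to a separate boolean:

package main

import "fmt"

// effectiveReplicas mimics defaulting an unset replica count to 1, as described
// in the comment above; illustrative only, not the real CAPI defaulting logic.
func effectiveReplicas(replicas *int32) int32 {
	if replicas == nil {
		return 1
	}
	return *replicas
}

func main() {
	observed := int32(50) // the external scaler has grown the pool to 50 nodes

	// Nil-as-signal: intent and value share one field, so an unset (or defaulted)
	// value reads as "scale to 1" and breaks the feedback loop.
	var unset *int32
	fmt.Printf("desired=%d observed=%d -> pool appears to be ScalingDown\n",
		effectiveReplicas(unset), observed)

	// Separate boolean: intent is explicit, and replicas can simply mirror reality.
	externallyManaged := true
	desired := observed
	fmt.Printf("externallyManaged=%v desired=%d observed=%d -> consistent\n",
		externallyManaged, desired, observed)
}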

Regarding the concern that no logic in this controller currently uses this field, I do think that will change as we implement MachinePool Machines. This field could be used to signal to the MachinePool controller whether it is responsible for creating Machines or whether the infrastructure provider will be doing so.

@CecileRobertMichon
Contributor

The status reporting when replicas is nil is also problematic in output like kubectl get machinepools. Things like this were hard to work around, since replicas appears in the summary output. That MachinePool was not actually scaling down, but the MachinePool controller reports that it is because the replicas field in status is so much higher than the spec.replicas field.

What would be the expected behavior when using an externallyManagedReplicaCount field? Would the Spec.Replicas field be updated to match the actual replica count? If not, would the same wrong phase also be shown when Status.Replicas doesn't match Spec.Replicas?

@dthorsen
Contributor Author

dthorsen commented Jan 6, 2022

Correct, that is how it behaves in our fork today. The infrastructure provider keeps the spec.replicas field on the machinepools object up to date with the actual replicas value it observes in the cloud provider. This provided a nice feedback loop and fixed the UX issues for our cluster operators.
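
A minimal sketch of that feedback loop (the type and function names here are assumptions, not the fork's actual code): on each reconcile the infrastructure provider copies the capacity it observes in the cloud back into spec.replicas, so spec and status stay in agreement and the phase reporting stays meaningful.

package main

import "fmt"

// machinePool is a stand-in for the two fields relevant to this discussion.
type machinePool struct {
	SpecReplicas   int32 // spec.replicas
	StatusReplicas int32 // status.replicas
}

// syncObservedReplicas mirrors the capacity observed in the cloud provider into
// the MachinePool when the replica count is externally managed.
func syncObservedReplicas(mp *machinePool, observedCapacity int32) {
	mp.SpecReplicas = observedCapacity
	mp.StatusReplicas = observedCapacity
}

func main() {
	// Before syncing, spec and status disagree and the pool looks like it is ScalingDown.
	mp := &machinePool{SpecReplicas: 1, StatusReplicas: 50}
	syncObservedReplicas(mp, 50)
	fmt.Printf("spec=%d status=%d\n", mp.SpecReplicas, mp.StatusReplicas) // spec=50 status=50
}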

// the infrastructure is a managed machine pool, or if the cluster-autoscaler is scaling
// the underlying cloud infrastructure directly outside of cluster-api. Set this to true
// if you wish the replica count to be managed outside of cluster-api. Defaults to false.
// +optional
Member

@sbueringer sbueringer Jan 7, 2022

Should we document this property and the (expected) behavior of infra providers in the book (https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-pool.html#infrastructure-provider)?

I assume this logic in the core MachinePool controller won't change when ExternallyManagedReplicaCount is set?

After the machine pool controller sets the OwnerReferences on the associated objects, it waits for the bootstrap and infrastructure objects referenced by the machine to have the Status.Ready field set to true. When the infrastructure object is ready, the machine pool controller will attempt to read its spec.ProviderIDList and copy it into MachinePool.Spec.ProviderIDList.

The machine pool controller uses the kubeconfig for the new workload cluster to watch new nodes coming up. When a node appears with a Node.Spec.ProviderID in MachinePool.Spec.ProviderIDList, the machine pool controller increments the number of ready replicas. When all replicas are ready and the infrastructure ref is also Ready, the machine pool controller marks the machine pool as Running.

Contributor Author

Yes, I will definitely update the book. And also yes, the logic you mention does not change.

@enxebre
Member

enxebre commented Jan 7, 2022

Hey @dthorsen, great to hear we are coming here with real usage feedback already. Some thoughts:

It is too easy for someone to misunderstand

I see what you mean. In any case, let's make sure we document this exhaustively so that kubectl explain machinepools.spec.replicas is useful.

The status reporting when replicas is nil is also problematic in output like kubectl get machinepools. Things like this were hard to work around, since replicas appears in the summary output. That MachinePool was not actually scaling down, but the MachinePool controller reports that it is because the replicas field in status is so much higher than the spec.replicas field.

I see these problems as cosmetic consequences to be approached individually, rather than as the driving force behind the design of the user input experience.

The infrastructure provider keeps the spec.replicas field on the machinepools object up to date with the actual replicas value it observes in the cloud provider.

This sounds pretty reasonable to me, although peculiar, because you are effectively treating spec.replicas as status in order to satisfy existing core controller assumptions with your particular implementation.

This use case seems to me more like a provider-specific field for AWSMachinePool. I think the externallyManagedReplicaCount naming is confusing, since any external out-of-band controller adding smart scaling, e.g. a CAPI-aware autoscaler, would modify spec.replicas naturally without enabling externallyManagedReplicaCount, right? Based on this reasoning, I'm curious whether you have considered supporting this use case (AWS provider scaling instead of CAPI) with something like awsmachinepool.AutoManagedReplicas. Why does the core MachinePool need to know who is setting its spec.replicas at all?

@k8s-ci-robot
Contributor

@dthorsen: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2022
@dntosas
Contributor

dntosas commented Mar 20, 2022

@dthorsen @fabriziopandini hey gents! Have you come to any conclusion on this?

@mweibel
Contributor

mweibel commented Mar 21, 2022

FWIW, I use my own version of this fix, which requires no changes in CAPI (it uses an annotation instead), in my fork: https://github.com/kubernetes-sigs/cluster-api-provider-aws/compare/main...helio:externally-managed?expand=1

I haven't yet tested it well enough to be in any way contributable, but in case anyone is looking for a way to implement this, have a look :)

@fabriziopandini
Member

Getting back to this after the April 20th office hours discussion.

I don't want to block this PR and I'm happy to adapt to whatever the community agrees on given that it can unblock so many use cases and help people to un-fork.

Personally, for the sake of consistency with what we already have in the project, I like the idea of using:

  • replicas == nil for declaring the user intent of not managing replicas (same as what we have for MachineDeployment)
  • an annotation for surfacing the fact that one or more fields are managed by something else (same as what we have for managed topologies)

I also agree with @enxebre that the externallyManagedReplicaCount naming is confusing; adding my two cents on top of this, any name based on which fields are externally managed could paint us into a corner, because what if we have something that manages more than one field (and we already have that as soon as we add support for MachinePools in ClusterClass)?

Instead, using annotations named after capabilities, e.g. topology-owned, or something like asg-owned for AWS ASGs, seems less sensitive to the above problem; last but not least, annotations are easily composable/extensible in order to accommodate all the autoscaler variants listed at the top.
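
For comparison, a minimal sketch of the annotation-based variant; the annotation key below is hypothetical and chosen purely for illustration, since the thread has not settled on a name.

package main

import "fmt"

// Hypothetical annotation key, for illustration only; the discussion above has
// not agreed on an actual name.
const replicasManagedByAnnotation = "example.cluster.x-k8s.io/replicas-managed-by"

// replicasExternallyManaged reports whether some external system has claimed
// ownership of the replica count by setting the annotation to a non-empty value.
func replicasExternallyManaged(annotations map[string]string) bool {
	return annotations[replicasManagedByAnnotation] != ""
}

func main() {
	annotations := map[string]string{
		// e.g. set by an infra provider or an external autoscaler integration
		replicasManagedByAnnotation: "aws-asg",
	}
	fmt.Println(replicasExternallyManaged(annotations)) // true
}

One appeal of this shape is that the annotation value can name the owning system, and further capability annotations can be added later without introducing new API fields.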

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2022
@AverageMarcus
Member

Anything we can do to help move this forward? We'd still very much like to see this functionality implemented.

@CecileRobertMichon
Contributor

+1 to using an annotation to surface information to the user that replicas are being ignored / managed by an autoscaler or other external process

In kubernetes-sigs/cluster-api-provider-azure#2444 we've been going back and forth on this for AzureManagedClusters specifically and it looks like we're going to land on the following:

  1. The user expresses intent to have replicas autoscale via the built-in AKS cluster-autoscaler feature by setting autoscaling options in the AzureManagedMachinePool spec.
  2. When the above autoscaling options are set, the CAPZ AzureManagedMachinePool controller sets MachinePool replicas to reflect the current replica count and sets an annotation on the MachinePool to surface the fact that replicas are managed by the autoscaler (annotation name TBD).

cc @LochanRn @jackfrancis

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 27, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Skarlso

Skarlso commented Oct 11, 2022

/reopen

@k8s-ci-robot
Contributor

@Skarlso: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Oct 11, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign timothysc for approval by writing /assign @timothysc in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

@dthorsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-e2e-main b5889a6 link true /test pull-cluster-api-e2e-main
pull-cluster-api-build-main b5889a6 link true /test pull-cluster-api-build-main
pull-cluster-api-verify-main b5889a6 link true /test pull-cluster-api-verify-main
pull-cluster-api-e2e-informing-main b5889a6 link false /test pull-cluster-api-e2e-informing-main
pull-cluster-api-apidiff-main b5889a6 link false /test pull-cluster-api-apidiff-main
pull-cluster-api-e2e-informing-ipv6-main b5889a6 link false /test pull-cluster-api-e2e-informing-ipv6-main
pull-cluster-api-test-main b5889a6 link true /test pull-cluster-api-test-main
pull-cluster-api-test-mink8s-main b5889a6 link true /test pull-cluster-api-test-mink8s-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Skarlso

Skarlso commented Oct 12, 2022

@dthorsen I know this has been quite some time, and frankly, I don't understand why we aren't just moving forward with this after years of discussions. Having something is still better than nothing or things that just don't work. We can always fix it later if there is a bug, or we can adjust things. Just have SOMETHING.

I'm asking if someone might still review this, please, as it would be a great help. I don't care about the annotations-based approach, to be honest. I don't see the value, but others might. However, that could easily be a followup to this added as a separate ticket.

WDYT?

@dthorsen
Contributor Author

@Skarlso There has been progress on the annotation-based approach, and that will fulfill the use case here. Given the current traction of the annotation approach, I propose we close this PR and pursue completion of the annotation-based solution.

@Skarlso

Skarlso commented Oct 12, 2022

Okay, that's a fair point.

@Skarlso

Skarlso commented Oct 12, 2022

/close

@k8s-ci-robot
Contributor

@Skarlso: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support external scalers for infrastructure managed by a MachinePool