
✨ Add externallyManagedReplicaCount field to MachinePool #5685

Conversation

dthorsen
Contributor

What this PR does / why we need it:

User Story

As a cluster operator, I would like the option to use external scaling applications to autoscale infrastructure managed under a MachinePool, so that we can take advantage of the capabilities of existing infrastructure scaling tools if we choose.

Detailed Description

There are many potential scaler implementations that support things like AWS ASGs and Azure VMSS. With a small change, cluster-api can optionally allow external software to manage the scaling of node pools without that software needing to support cluster-api. Some examples of scalers that would benefit from this solution are:

  • Kubernetes cluster-autoscaler using the existing cloud-provider implementations. This provides a stopgap while cluster-autoscaler's support for CAPI MachinePools matures and reaches feature parity with the existing cloud provider implementations. Incidentally, this would also allow the CAPZ provider to add support for AKS' native managed node pool autoscaling, which provides a fully-managed installation of cluster-autoscaler on the AKS control plane itself, operated entirely by the cloud provider, Azure.
  • Atlassian escalator
  • Spotinst (an infrastructure provider for Spotinst does not yet exist, but the community has expressed interest in one)

Fixes #5658

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 16, 2021
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 16, 2021
@dthorsen dthorsen force-pushed the externallyManagedReplicaCount branch from 0dd1e56 to 4481e9b on November 16, 2021 21:31
Comment on lines +63 to +69
// Whether the Replicas value is externally managed. This is useful when
// the infrastructure is a managed machine pool, or if the cluster-autoscaler is scaling
// the underlying cloud infrastructure directly outside of cluster-api. Set this to true
// if you wish the replica count to be managed outside of cluster-api. Defaults to false.
// +optional
ExternallyManagedReplicaCount *bool `json:"externallyManagedReplicaCount,omitempty"`
Member

Is this field used anywhere?

Contributor Author

It will be used to signal to the infrastructure provider that it is responsible for managing the spec.replicas and relevant status fields on the MachinePool. So there is no implementation code in the MachinePool itself yet. That may change a little as the MachinePool Machines implementation begins though.
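
To make the intended division of responsibility concrete, here is a minimal sketch (not code from this PR; the stand-in type and helper names are assumptions) of how an infrastructure provider's reconcile logic might branch on the proposed field:

package main

import "fmt"

// Stand-in for the relevant slice of the MachinePool API type; the real type
// lives in the cluster-api experimental API group.
type MachinePoolSpec struct {
	Replicas                      *int32
	ExternallyManagedReplicaCount *bool
}

// replicasExternallyManaged reports whether an external scaler owns the
// replica count for this pool.
func replicasExternallyManaged(spec MachinePoolSpec) bool {
	return spec.ExternallyManagedReplicaCount != nil && *spec.ExternallyManagedReplicaCount
}

func main() {
	enabled := true
	desired := int32(3)
	spec := MachinePoolSpec{Replicas: &desired, ExternallyManagedReplicaCount: &enabled}

	if replicasExternallyManaged(spec) {
		// Leave the cloud scaling group alone; an external scaler
		// (cluster-autoscaler, escalator, ...) drives capacity.
		fmt.Println("skipping desired-capacity reconciliation")
	} else {
		// Cluster API owns scaling: enforce spec.replicas on the cloud resource.
		fmt.Printf("enforcing desired capacity %d\n", *spec.Replicas)
	}
}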

@CecileRobertMichon
Contributor

/assign @devigned @mboersma

Contributor

@devigned devigned left a comment

If this is intended to be a stop gap until more mature support exists in Cluster Autoscaler, did you consider something a little more temporary, like an annotation?

@dthorsen
Contributor Author

dthorsen commented Nov 19, 2021

@devigned It is intended for more than a temporary stop gap, though stop gap is one use case. Some other example use cases are listed above. This field is intended to provide flexibility and permit cluster operator choice of scaler software. Without this field, cluster operators that wish to use external scaling tools are blocked from using cluster-api to manage node pools. That seems unnecessarily restrictive.

If we place this in an annotation instead, it seems likely to me that this will be a very long-lived annotation, especially considering that developers of other scaling solutions may never target scaling via cluster API. It also assumes that at some point in the future we intend to make cluster API incompatible with underlying managed cloud features or external software that manipulates the pools outside of cluster API. MachinePools were originally conceived to support this exact use case, and it is a key differentiator from MachineDeployments, one that we use heavily today and that others also wish to use.

@dthorsen
Contributor Author

dthorsen commented Dec 1, 2021

I can rebase this tonight to resolve merge conflicts but is there anything preventing this from being merged? I'd like to start on the AWSMachinePool implementation soon.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2021
@mboersma
Contributor

mboersma commented Dec 2, 2021

I don't think this is something that the MachinePool Machines proposal requires, so no objections there. It does seem odd to add a field that isn't used by any logic in CAPI, just in infra providers. But I understand your use case and see how it could be useful in certain cluster-autoscaler configurations like AKS, so 👍🏻 .

@dthorsen dthorsen force-pushed the externallyManagedReplicaCount branch from 4481e9b to 076597a on December 3, 2021 23:00
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 3, 2021
@xvzf

xvzf commented Dec 28, 2021

Hey, is there an update here?

@dthorsen
Contributor Author

@xvzf My apologies, I fell behind on this. I'll rebase this PR today. Most of the maintainers are out for holidays but I expect this should get merged quickly once folks return.

@dthorsen dthorsen force-pushed the externallyManagedReplicaCount branch from 076597a to f393a8d on December 30, 2021 16:04
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 30, 2021
@dthorsen dthorsen force-pushed the externallyManagedReplicaCount branch from f393a8d to b5889a6 on December 30, 2021 16:58
@enxebre
Member

enxebre commented Jan 5, 2022

This seems to me a natural case for differentiating nil from zero. The semantics could be folded into the existing machinePool.Replicas field: when machinePool.Replicas is nil, set machinePool.status.replicasExternallyManaged=true, and each implementation reacts appropriately.

@dthorsen
Contributor Author

dthorsen commented Jan 6, 2022

@enxebre @fabriziopandini @vincepri @CecileRobertMichon
The origin story of this field is really old. It was suggested in yesterday's meeting to use nil replicas to signal this instead, and since the context is so old I had to go relearn why we had not used that approach. To clarify, we use this field extensively today in an internal fork of cluster-api. We consulted with the community before doing so, in an attempt to plan ahead and avoid diverging from a path that would be acceptable to upstream while we tested the approach. This PR is our attempt to upstream that work to support our use case. The choice of this field was arrived at in an old issue, and the concerns with using nil replicas were raised then: kubernetes-sigs/cluster-api-provider-aws#2022 (comment)
Rather than repeat that whole conversation, I suggest reading that thread. We actually tried using nil replicas for a short period internally in our fork of CAPI, and the UX was not good. It is too easy for someone to misunderstand and think they are scaling up a MachinePool when the change in effect suddenly scales it down dramatically, because it breaks the feedback loop. Operators see no replicas, or the defaulted value of 1, and think they need to scale the pool up. Keeping this boolean separate provides an excellent safeguard against operator error.

The status reporting when replicas is nil is also problematic in output like kubectl get machinepools, and was hard to work around since replicas appears in the summary output. The MachinePool in the example below was not actually scaling down, but the MachinePool controller reports that it is, because the replicas field in status is so much higher than the spec.replicas field.

$ kubectl get machinepools
NAME           REPLICAS           PHASE          VERSION
node-group-a   1                  ScalingDown

Based on first-hand experience using this at scale, using a nil replica count to enable or disable this feature amounts to an implicit side effect: a major configuration change that is not immediately obvious to an operator looking at the MachinePool spec. The resulting misconfiguration leads to thrashing of the pool's desired instance count between the external system controlling the replica count and the MachinePool controller, with a high potential to cause service disruption within the workload cluster.
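
To illustrate that failure mode, here is a minimal sketch (illustrative only; the defaulting function is an assumption standing in for whatever defaults an unset replica count to 1, not the actual CAPI webhook) of why overloading nil is fragile compared to a separate boolean:

package main

import "fmt"

// effectiveReplicas mimics defaulting an unset replica count to 1, as described
// in the comment above; illustrative only, not the real CAPI defaulting logic.
func effectiveReplicas(replicas *int32) int32 {
	if replicas == nil {
		return 1
	}
	return *replicas
}

func main() {
	observed := int32(50) // the external scaler has grown the pool to 50 nodes

	// Nil-as-signal: intent and value share one field, so an unset (or defaulted)
	// value reads as "scale to 1" and breaks the feedback loop.
	var unset *int32
	fmt.Printf("desired=%d observed=%d -> pool appears to be ScalingDown\n",
		effectiveReplicas(unset), observed)

	// Separate boolean: intent is explicit, and replicas can simply mirror reality.
	externallyManaged := true
	desired := observed
	fmt.Printf("externallyManaged=%v desired=%d observed=%d -> consistent\n",
		externallyManaged, desired, observed)
}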

Regarding the concern that no logic in this controller currently uses this field, I do think that will change as we implement MachinePool Machines. This field could be used to signal to the MachinePool controller whether it is responsible for creating Machines or whether the infrastructure provider will be doing so.

@CecileRobertMichon
Contributor

The status reporting when replicas is nil is also problematic in output like kubectl get machinepools. Things like this were hard to work around, since replicas appears in the summary output. That MachinePool was not actually scaling down, but the MachinePool controller reports that it is because the replicas field in status is so much higher than the spec.replicas field.

What would be the expected behavior when using an externallyManagedReplicaCount field? Would the Spec.Replicas field be updated to match the actual replica count? If not, would the same wrong phase also be shown when Status.Replicas doesn't match Spec.Replicas?

@dthorsen
Contributor Author

dthorsen commented Jan 6, 2022

Correct, that is how it behaves in our fork today. The infrastructure provider keeps the spec.replicas field on the machinepools object up to date with the actual replicas value it observes in the cloud provider. This provided a nice feedback loop and fixed the UX issues for our cluster operators.
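
A minimal sketch of that feedback loop (the type and function names here are assumptions, not the fork's actual code): on each reconcile the infrastructure provider copies the capacity it observes in the cloud back into spec.replicas, so spec and status stay in agreement and the phase reporting stays meaningful.

package main

import "fmt"

// machinePool is a stand-in for the two fields relevant to this discussion.
type machinePool struct {
	SpecReplicas   int32 // spec.replicas
	StatusReplicas int32 // status.replicas
}

// syncObservedReplicas mirrors the capacity observed in the cloud provider into
// the MachinePool when the replica count is externally managed.
func syncObservedReplicas(mp *machinePool, observedCapacity int32) {
	mp.SpecReplicas = observedCapacity
	mp.StatusReplicas = observedCapacity
}

func main() {
	// Before syncing, spec and status disagree and the pool looks like it is ScalingDown.
	mp := &machinePool{SpecReplicas: 1, StatusReplicas: 50}
	syncObservedReplicas(mp, 50)
	fmt.Printf("spec=%d status=%d\n", mp.SpecReplicas, mp.StatusReplicas) // spec=50 status=50
}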

// the infrastructure is a managed machine pool, or if the cluster-autoscaler is scaling
// the underlying cloud infrastructure directly outside of cluster-api. Set this to true
// if you wish the replica count to be managed outside of cluster-api. Defaults to false.
// +optional
Member

@sbueringer sbueringer Jan 7, 2022

Should we document this property and the (expected) behavior of infra providers in the book (https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-pool.html#infrastructure-provider)?

I assume this logic in the core MachinePool controller won't change when ExternallyManagedReplicaCount is set?

After the machine pool controller sets the OwnerReferences on the associated objects, it waits for the bootstrap and infrastructure objects referenced by the machine to have the Status.Ready field set to true. When the infrastructure object is ready, the machine pool controller will attempt to read its spec.ProviderIDList and copy it into MachinePool.Spec.ProviderIDList.

The machine pool controller uses the kubeconfig for the new workload cluster to watch new nodes coming up. When a node appears with a Node.Spec.ProviderID in MachinePool.Spec.ProviderIDList, the machine pool controller increments the number of ready replicas. When all replicas are ready and the infrastructure ref is also Ready, the machine pool controller marks the machine pool as Running.

Contributor Author

Yes, I will definitely update the book. And also yes, the logic you mention does not change.

@enxebre
Member

enxebre commented Jan 7, 2022

Hey @dthorsen, great to hear we are coming here with real usage feedback already. Some thoughts:

It is too easy for someone to misunderstand

I see what you mean. In any case, let's make sure we document this exhaustively so that kubectl explain machinepools.spec.replicas is useful.

The status reporting when replicas is nil is also problematic in output like kubectl get machinepools. Things like this were hard to work around, since replicas appears in the summary output. That MachinePool was not actually scaling down, but the MachinePool controller reports that it is because the replicas field in status is so much higher than the spec.replicas field.

I see these problems as cosmetic consequences to be approached individually, rather than as the driving force behind the design of the user input experience.

The infrastructure provider keeps the spec.replicas field on the machinepools object up to date with the actual replicas value it observes in the cloud provider.

This sounds pretty reasonable to me, although peculiar, because you are effectively treating spec.replicas as status in order to satisfy existing core controller assumptions with your particular implementation.

This use case seems to me more like a provider-specific field for AWSMachinePool. I think the externallyManagedReplicaCount naming is confusing, since any external out-of-band controller adding smart scaling, e.g. a CAPI-aware autoscaler, would modify spec.replicas naturally without enabling externallyManagedReplicaCount, right? Based on this reasoning, I'm curious whether you have considered supporting this use case (AWS provider scaling instead of CAPI) with something like awsmachinepool.AutoManagedReplicas. Why does the core MachinePool need to know who is setting its spec.replicas at all?

@k8s-ci-robot
Contributor

@dthorsen: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2022
@dntosas
Contributor

dntosas commented Mar 20, 2022

@dthorsen @fabriziopandini hey gents! Have you come to any conclusion on this?

@mweibel
Contributor

mweibel commented Mar 21, 2022

FWIW, I use my own version of this fix, which requires no changes in CAPI (it uses an annotation instead), in my fork: https://github.com/kubernetes-sigs/cluster-api-provider-aws/compare/main...helio:externally-managed?expand=1

I haven't yet tested it well enough to be in any way contributable, but in case anyone is looking for a way to implement this, have a look :)

@fabriziopandini
Member

Getting back to this after the April 20th office hours discussion.

I don't want to block this PR and I'm happy to adapt to whatever the community agrees on given that it can unblock so many use cases and help people to un-fork.

Personally, for the sake of consistency with what we already have in the project, I like the idea of using:

  • replicas == nil for declaring the user intent of not managing replicas (same as what we have for MachineDeployment)
  • an annotation for surfacing the fact that one or more fields are managed by something else (same as what we have for managed topologies)

I also agree with @enxebre that the externallyManagedReplicaCount naming is confusing; adding my two cents on top of this, any name based on which fields are externally managed could paint us into a corner, because what if we have something that manages more than one field (and we already have that as soon as we add support for MachinePools in ClusterClass)?

Instead, using annotations named after capabilities, e.g. topology-owned, or something like asg-owned for AWS ASGs, seems less sensitive to the above problem; last but not least, annotations are easily composable/extensible in order to accommodate all the autoscaler variants listed at the top.
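
For comparison, a minimal sketch of the annotation-based variant; the annotation key below is hypothetical and chosen purely for illustration, since the thread has not settled on a name.

package main

import "fmt"

// Hypothetical annotation key, for illustration only; the discussion above has
// not agreed on an actual name.
const replicasManagedByAnnotation = "example.cluster.x-k8s.io/replicas-managed-by"

// replicasExternallyManaged reports whether some external system has claimed
// ownership of the replica count by setting the annotation to a non-empty value.
func replicasExternallyManaged(annotations map[string]string) bool {
	return annotations[replicasManagedByAnnotation] != ""
}

func main() {
	annotations := map[string]string{
		// e.g. set by an infra provider or an external autoscaler integration
		replicasManagedByAnnotation: "aws-asg",
	}
	fmt.Println(replicasExternallyManaged(annotations)) // true
}

One appeal of this shape is that the annotation value can name the owning system, and further capability annotations can be added later without introducing new API fields.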

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2022
@AverageMarcus
Member

Anything we can do to help move this forward? We'd still very much like to see this functionality implemented.

@CecileRobertMichon
Contributor

+1 to using an annotation to surface information to the user that replicas are being ignored / managed by an autoscaler or other external process

In kubernetes-sigs/cluster-api-provider-azure#2444 we've been going back and forth on this for AzureManagedClusters specifically and it looks like we're going to land on the following:

  1. The user expresses intent to have replicas autoscale via the built-in AKS cluster-autoscaler feature by setting autoscaling options in the AzureManagedMachinePool spec.
  2. When the above autoscaling options are set, the CAPZ AzureManagedMachinePool controller sets MachinePool replicas to reflect the current replica count and sets an annotation on the MachinePool to surface the fact that replicas are managed by the autoscaler (annotation name TBD).

cc @LochanRn @jackfrancis

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 27, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Skarlso

Skarlso commented Oct 11, 2022

/reopen

@k8s-ci-robot
Contributor

@Skarlso: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Oct 11, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign timothysc for approval by writing /assign @timothysc in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

@dthorsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-e2e-main b5889a6 link true /test pull-cluster-api-e2e-main
pull-cluster-api-build-main b5889a6 link true /test pull-cluster-api-build-main
pull-cluster-api-verify-main b5889a6 link true /test pull-cluster-api-verify-main
pull-cluster-api-e2e-informing-main b5889a6 link false /test pull-cluster-api-e2e-informing-main
pull-cluster-api-apidiff-main b5889a6 link false /test pull-cluster-api-apidiff-main
pull-cluster-api-e2e-informing-ipv6-main b5889a6 link false /test pull-cluster-api-e2e-informing-ipv6-main
pull-cluster-api-test-main b5889a6 link true /test pull-cluster-api-test-main
pull-cluster-api-test-mink8s-main b5889a6 link true /test pull-cluster-api-test-mink8s-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Skarlso

Skarlso commented Oct 12, 2022

@dthorsen I know this has been quite some time, and frankly, I don't understand why we aren't just moving forward with this after years of discussions. Having something is still better than nothing or things that just don't work. We can always fix it later if there is a bug, or we can adjust things. Just have SOMETHING.

I'm asking if someone might still review this, please, as it would be a great help. I don't care about the annotations-based approach, to be honest. I don't see the value, but others might. However, that could easily be a followup to this added as a separate ticket.

WDYT?

@dthorsen
Contributor Author

@Skarlso There has been progress on the annotation-based approach, and that will fulfill the use case here. Given the current traction of the annotation approach, I propose we close this PR and pursue completion of the annotation-based solution.

@Skarlso

Skarlso commented Oct 12, 2022

Okay, that's a fair point.

@Skarlso

Skarlso commented Oct 12, 2022

/close

@k8s-ci-robot
Contributor

@Skarlso: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support external scalers for infrastructure managed by a MachinePool