
r/kubernetes_cluster: autoscaling-related improvements #4256

Closed
wants to merge 2 commits into from

Conversation

invidian
Contributor

@invidian invidian commented Sep 5, 2019

Currently, on every update, agentPoolProfiles are built from scratch
based on the pool definition from Terraform. With autoscaled clusters,
this attempts to remove the 'Count' property, which triggers a manual
scaling action, which is forbidden for clusters with autoscaling
enabled. That breaks any updates to the cluster, including adding tags
or updating the Kubernetes version.

With this PR, we always try to fetch the definition of the cluster from
the API and, on update, we only modify profile parameters using the
Terraform configuration, rather than building the profiles from scratch.
This makes sure that all parameters set by the API are preserved.

To enforce this behavior, we now fail when updating the resource if we
are unable to fetch it from the API.

The 'expandKubernetesClusterAgentPoolProfiles' function now accepts a
profiles slice to work on and returns a modified copy, rather than a
newly built profiles slice.
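The fetch-and-overlay pattern described above can be sketched as follows. This is a hypothetical, simplified Go illustration only: the `agentPoolProfile` type and the `expandProfiles` signature here stand in for the real SDK types and are not the actual provider code.

```go
package main

import "fmt"

// agentPoolProfile is a hypothetical, simplified stand-in for the SDK's
// agent pool profile type used by the provider.
type agentPoolProfile struct {
	Name              string
	Count             *int32
	VMSize            string
	EnableAutoScaling *bool
}

// expandProfiles illustrates the pattern: it receives the profiles previously
// fetched from the API, overlays only the fields managed by the Terraform
// configuration, and returns the modified copy. Server-set values (such as
// Count on autoscaled pools) are preserved instead of being rebuilt.
func expandProfiles(existing []agentPoolProfile, configVMSize string) []agentPoolProfile {
	updated := make([]agentPoolProfile, len(existing))
	copy(updated, existing)
	for i := range updated {
		// Overlay only the Terraform-managed field; leave Count untouched.
		updated[i].VMSize = configVMSize
	}
	return updated
}

func main() {
	count := int32(3)
	existing := []agentPoolProfile{{Name: "default", Count: &count, VMSize: "Standard_D2_v2"}}
	out := expandProfiles(existing, "Standard_D4_v2")
	fmt.Println(*out[0].Count, out[0].VMSize)
}
```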

Also, currently autoscaling cannot be disabled gracefully on the
cluster, and the cluster cannot be scaled when disabling autoscaling
either.

This PR adds validation that when autoscaling is disabled, the
min_count and max_count parameters must also be unset.
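A minimal Go sketch of such a validation rule, using nil pointers to model unset HCL attributes. The function name and signature are illustrative, not the provider's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// validateAutoScaleConfig is a hypothetical sketch of the validation the PR
// describes: when enable_auto_scaling is false, min_count and max_count
// must not be set (nil pointers model "unset" attributes).
func validateAutoScaleConfig(enableAutoScaling bool, minCount, maxCount *int) error {
	if !enableAutoScaling && (minCount != nil || maxCount != nil) {
		return errors.New("`min_count` and `max_count` must not be set when `enable_auto_scaling` is false")
	}
	return nil
}

func main() {
	min := 1
	// Setting min_count while autoscaling is disabled is rejected.
	fmt.Println(validateAutoScaleConfig(false, &min, nil) != nil)
	// With autoscaling enabled, min_count is allowed.
	fmt.Println(validateAutoScaleConfig(true, &min, nil) == nil)
}
```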

When Terraform disables autoscaling (enable_auto_scaling = false), it
also unsets the MinCount and MaxCount fields in the profile obtained
from the API.

This PR also adds a documentation note that the cluster cannot be
manually scaled when autoscaling is enabled, and suggests how to ignore
changes to the 'count' parameter, which will be dynamically changed by
the cluster autoscaler.
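For illustration, the suggested lifecycle workaround might look roughly like this; the exact attribute path is an assumption here and depends on the provider and Terraform versions in use:

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  # ... other configuration ...

  lifecycle {
    # Let the cluster autoscaler manage the node count without Terraform
    # reporting drift on every plan.
    ignore_changes = ["agent_pool_profile.0.count"]
  }
}
```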

Closes #4075

@invidian
Contributor Author

Any updates on that?

@invidian
Contributor Author

invidian commented Oct 4, 2019

Ugh, there've been some big changes to the way the cluster is created and updated... Will rebase.

@invidian
Contributor Author

invidian commented Oct 4, 2019

Rebased.

@invidian
Contributor Author

invidian commented Oct 4, 2019

@tombuildsstuff any chance you could have a look? 🤞

@invidian
Contributor Author

invidian commented Oct 8, 2019

Resolved conflicts again.

@WodansSon WodansSon requested a review from katbyte October 9, 2019 21:53
@WodansSon WodansSon added this to the v1.36.0 milestone Oct 9, 2019
@WodansSon WodansSon requested a review from mbfrahry October 9, 2019 22:10
@tombuildsstuff tombuildsstuff modified the milestones: v1.36.0, v1.37.0 Oct 24, 2019
@wagnst
Contributor

wagnst commented Oct 24, 2019

Any update on this one?

@brianxieseattle

Can someone review this change and help get it merged into master as soon as possible?

@invidian
Contributor Author

invidian commented Oct 31, 2019

Rebased again. Can someone review?

@kim0

kim0 commented Nov 3, 2019

Desperately need this one! Hope @tombuildsstuff can get to it soon.

@invidian
Contributor Author

invidian commented Nov 3, 2019

To everyone interested in the patch, can you please try it out for your scenarios to make sure it's working as expected and report back here?

@invidian
Contributor Author

Any updates on that?

@tombuildsstuff
Contributor

hi @invidian

Thanks for this PR - apologies for the delay reviewing this!

I've spent some time over the last couple of weeks trying to work out how to consolidate this PR, #4543, #4676, #4046 and #4472, since they're all attempting to solve the same problem (a breaking behavioural change in the AKS API, where non-default node pools now have to be managed via a separate API) in different ways.

After spending some time investigating/experimenting with this I believe the best approach going forward is to introduce a replacement default_node_pool block which is limited to a single element and then deprecate the existing agent_pool_profiles block which can then be removed in 2.0.

This allows existing users to continue to use the agent_pool_profiles field if they need to and migrate across to the default_node_pool object on their own timeline. In addition this allows for Azure regions which haven't rolled these changes out (e.g. China/Germany/Government) to continue to use the existing functionality if necessary. The default_node_pool block can then become Required in 2.0 (at which point the existing agent_pool_profiles block will be removed). At the same time we can handle the other breaking behavioural change mentioned in #4465 by switching the default Node Pool type to VirtualMachineScaleSets.

Whilst this isn't ideal since users will need to migrate at some point - it seems preferable from a UX perspective to manage these as separate resources, rather than inline (which also allows users to order node pool creation/destruction if necessary).

As such, whilst we'd like to thank you for this contribution (and apologise again for the delay reviewing it!), ultimately we're going to take a different direction here, and thus I'm going to close this PR in favour of #4898, which introduces the new default_node_pool block mentioned above. Once #4898 is merged we should be able to push/rebase #4046, which introduces the new Node Pool resource; at which point we can then rebase #4472 - collectively allowing us to add support for all of this functionality.

Thanks!

@invidian
Contributor Author

Hi @tombuildsstuff, thanks for your feedback :) I understand the decision to close this PR. I hope the original issue I was addressing, #4075, will still be fixed by the changes you proposed. I'll try to review #4899.

In the future, it would be great to receive some feedback earlier. Even if not about the changes themselves, simply knowing that there are things happening in parallel which prevent such PRs from being addressed would be great.

@invidian invidian deleted the invidian/fix-updating-autoscaled-aks-cluster branch November 17, 2019 16:05
@tombuildsstuff
Contributor

@invidian

In the future, it would be great to receive some feedback earlier. Even if not about the changes themselves, simply knowing that there are things happening in parallel which prevent such PRs from being addressed would be great.

Agreed - apologies, that's my bad here. We've been mostly heads-down on the 2.0 work and so haven't had as much time as we'd like for GitHub notifications. From our side we're coming to the end of this work, so I'm hopeful we'll be able to do this for all PRs going forward.

@ghost

ghost commented Nov 26, 2019

This has been released in version 1.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
  version = "~> 1.37.0"
}

# ... other configuration ...

@ghost

ghost commented Mar 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 29, 2020
Successfully merging this pull request may close these issues.

Autoscaling enabled AKS Cluster leads to error on terraform apply, though no changes on aks planned
6 participants