AKS Vertical down scaling deletes all AKS infrastructure #1939
Comments
hi @markbangert , I looked at the history and based on your terraform version and error message I can see the operations performed on your cluster. On UTC time Then the mentioned failed terraform deployment happened on As you expected, failed operation won't change any state of your cluster. |
Thank you for looking into this. I did some other runs as well, and I now understand that any change of the VM size will delete and recreate the node pool. This in turn means that any Terraform deployment with a changed VM size will not just upgrade the cluster and keep existing workloads (as I expected) but will recreate the entire cluster from scratch, which leaves me puzzled about how to deal with this... If we want to prevent the cluster from being deleted (especially in the production environment), we should not allow dynamic adjustments of the VM size in our deployment pipelines. However, we want to run different (smaller and cheaper) setups in development environments, so it would be cool to simply have this as a pipeline environment variable. Any ideas? |
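One way to approach the question above, as a minimal sketch rather than a recommended setup: keep the VM size as a per-environment input, reject sizes outside an allow-list before anything is applied, and mark the cluster with `prevent_destroy` so that a plan which would delete it fails instead. All resource names, the region, and the list of allowed sizes are hypothetical and only illustrate the idea.

```hcl
# Hypothetical variable; a pipeline could set it through the
# TF_VAR_vm_size environment variable.
variable "vm_size" {
  type        = string
  description = "VM size for the default node pool."
  default     = "Standard_DS2_v2"

  validation {
    # Illustrative allow-list only, not an official list of supported sizes.
    condition     = contains(["Standard_DS2_v2", "Standard_DS3_v2"], var.vm_size)
    error_message = "The vm_size value must be one of the sizes approved for the default node pool."
  }
}

# Hypothetical cluster definition; names and region are placeholders.
resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = "westeurope"
  resource_group_name = "example-rg"
  dns_prefix          = "exampleaks"

  default_node_pool {
    name       = "default"
    node_count = 2
    vm_size    = var.vm_size
  }

  identity {
    type = "SystemAssigned"
  }

  lifecycle {
    # Any plan that would destroy (or replace) this resource fails with an error.
    prevent_destroy = true
  }
}
```

Note that `prevent_destroy` has to be a literal value, so it cannot be toggled by a variable. A development configuration that is meant to be recreated freely would probably leave the `lifecycle` block out.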
The behavior you described sounds like Terraform behavior on the client side. That doesn't sound right to me either. @palma21 do you have any idea? @markbangert typically you just need to create a new node pool with a different VM size, then delete the old node pool, without needing to re-create the cluster. Not all VM sizes can be switched in place, e.g. they may run on different hardware; this is an Azure API limitation. That's the reason you cannot do vertical scaling in place. |
I believe that's TF behavior; @tombuildsstuff or @grayzu could confirm. |
@markbangert due to historical limitations within AKS, Terraform doesn't support cycling the default node pool at this time, but does allow updating of external node pools (for some fields) via the separate resource. At this point in time the VM SKU can't be updated in place for external node pools either; if that becomes possible, we can look to support it in the future. Do you have the terraform plan showing the changes which'd be applied here? FWIW ultimately we'd like to remove the inline/default node pool altogether to be able to update/cycle all of these fields on the node pool without destroying the cluster. However, my understanding is that the service still requires one at initial provisioning time, so unfortunately this isn't possible to model at this time. As such, whilst we may look to support cycling the default node pool in the future (we have a lot of questions to answer to be able to do so in practice), this'd allow users to lean on the default behaviour of the AKS Service itself. |
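For illustration, an external node pool managed through its own resource might look roughly like the sketch below. It assumes a cluster declared elsewhere as `azurerm_kubernetes_cluster.example`; that name, the pool name, and the sizes are hypothetical.

```hcl
# Assumes a cluster declared elsewhere as azurerm_kubernetes_cluster.example
# (hypothetical name).
resource "azurerm_kubernetes_cluster_node_pool" "workload" {
  name                  = "workload"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 2
}
```

Since the VM size cannot be updated in place, changing `vm_size` here would presumably make Terraform replace this node pool, but the replacement would be scoped to this resource rather than to the cluster that contains the default node pool.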
Thank you all for your feedback. I really appreciate your time! @tombuildsstuff Just to confirm that I got you right: you are saying that the vertical scaling operation would work on a non-default node pool? Is this the case because the default node pool would jump in to take over the workloads while the non-default node pool is deleted and recreated afterwards with a modified VM SKU, or is there a fundamentally different update mechanism at work for the non-default pools? And regarding the terraform plan... is there any way I can send it to you privately? It is from a customer project and I am not 100% happy sharing it right here. |
@markbangert at this point the Taking a look through, this issue appears to be tracking this, as such would you mind subscribing to this one for updates: hashicorp/terraform-provider-azurerm#7093 With regards to the plan, in retrospect I don't think it's necessary since both of these fields are ForceNew - I think we can infer that's the only reason this is being replaced/cycled at this point, so I think we can ignore that for now 👍 |
Thank you all again. Will subscribe to hashicorp/terraform-provider-azurerm#7093 and close this issue for the time being. |
What happened:
I just tried to scale down my AKS cluster, which was running on two DS2v2 machines, to two DS1v2 machines with a Terraform infrastructure-as-code deployment. The deployment failed with the following error message:
Message="System node pool must use VM sku with more than 2 cores and 4GB memory. Nodepool name: default."
So far so good... Apparently it is not possible to use AKS with the DS1v2 machines. However, what happened to the existing AKS infrastructure is alarming. All resources, i.e., the AKS instance and the associated VM scale set were automatically deleted. This should def not be possible.
What you expected to happen:
Either the vertical scaling should be performed as desired or only the error message should pop up without any changes to the infrastructure.
How to reproduce it (as minimally and precisely as possible):
Run a terraform AKS update on an existing cluster that changes the VM size to DS1v2.
Environment: