-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cancel rolling upgrades before deleting VMSS & extensions #18973
Conversation
…adeus.net/iac/terraform-provider-azurerm into feature/SACP-47873-add-internal-ci
Merge in IAC/terraform-provider-azurerm from feature/SACP-47873-add-internal-ci to master * commit '6d1953a834749477c5693b3fef7a441b1ef0869a': SACP-47573: Fallback to Amadeus versioning SACP-47573: Adding internal CI Changes Changes with parallelism=2 SACP:47873: changes SACP:47873: changes SACP:47873: changes SACP:47873: changes SACP:47873: changes SACP:47873: changes SACP:47873: Removing stage from Jenkinsfile Add internal CI
…g VMSS Merge in IAC/terraform-provider-azurerm from feature/SACP-47877-redo-vmss-pr to master * commit '8ef20ab8515d3ccb45557aea51dfab5e31e6be3f': SACP-47877: Rolling Upgrade cancelled before deleting VMSS
thanks for re-opening this @harshavmb - its on our list to review soon - as this is supersedes #15634 i'm going to close that one out |
I've got another idea today to take out the Are you okay with this approach? |
Hi @katbyte / @tombuildsstuff , Any update on this PR? We need some update on this before we end up with some code conflicts. The last PR (already closed) was also left un-reviewed for longtime. |
Hello, |
Hi @stephybun , Your feedback is highly appreciated. We have been managing all these times with ugly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @harshavmb and apologies being unable to get to this sooner.
I think this behaviour makes sense and is something we can incorporate into the resource. I've left suggestions which will bring this functionality more in-line with the rest of the provider. These will need to be made before we can consider merging this.
We also need to add a test for all three resources, this is a bit tricky since we need to simulate the behaviour of deleting the VMSS when rolling upgrades are happening in an automated fashion. It isn't blatantly obvious how to do that unless you're very familiar with our testing framework which is why I've gone ahead and written one, unfortunately I'm unable to push it to your branch since I lack permission.
I have the test and all of the suggested changes on my branch, which in the interest of time I could open as a separate PR to supersede this one. Alternatively I can provide the test and test config to you in another way. Let me know which you prefer.
if err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) { | ||
return fmt.Errorf("error cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. This actually resulted in a crash for me when when trying to reproduce the case described in the comments above. In addition matching on the error messages tends to be brittle since that is something that can change very easily in the API.
A more robust way to check whether to call the cancel function for rolling upgrades would be to query whether there are any ongoing rolling upgrades first. I've left comments in-line with what that would look like.
"strings" | ||
) | ||
|
||
func (c *Client) CancelRollingUpgradesBeforeDeletion(ctx context.Context, resourceGroupName string, vmScaleSetName string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just pass in the VMSS resource ID here since that contains all the info we need
func (c *Client) CancelRollingUpgradesBeforeDeletion(ctx context.Context, resourceGroupName string, vmScaleSetName string) error { | |
func (c *Client) CancelRollingUpgradesBeforeDeletion(ctx context.Context, id parse.VirtualMachineScaleSetId) error { |
) | ||
|
||
func (c *Client) CancelRollingUpgradesBeforeDeletion(ctx context.Context, resourceGroupName string, vmScaleSetName string) error { | ||
future, err := c.VMScaleSetRollingUpgradesClient.Cancel(ctx, resourceGroupName, vmScaleSetName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an API call to retrieve the latest rolling upgrades, I think we should do that here and return nil if no rolling upgrade is fine, i.e. there is no need to cancel and trigger the 409 error described down below, and if there is an existing rolling upgrade then we proceed to cancel - all other errors should be caught and surfaced
future, err := c.VMScaleSetRollingUpgradesClient.Cancel(ctx, resourceGroupName, vmScaleSetName) | |
resp, err := c.VMScaleSetRollingUpgradesClient.GetLatest(ctx, id.ResourceGroup, id.Name) | |
if err != nil { | |
if utils.ResponseWasNotFound(resp.Response) { | |
return nil | |
} | |
return fmt.Errorf("retrieving rolling updates for %s: %+v", id, err) | |
} | |
future, err := c.VMScaleSetRollingUpgradesClient.Cancel(ctx, resourceGroupName, vmScaleSetName) |
// If rolling upgrades haven't been run (when VMSS are just provisioned with rolling upgrades but no extensions, auto-scaling are run ) | ||
// we can not cancel rolling upgrades | ||
// API call :: GET https://management.azure.com/subscriptions/{subId}/resourceGroups/{rgName}/providers/Microsoft.Compute/virtualMachineScaleSets/{vmSSName}/rollingUpgrades/latest?api-version=2021-07-01 | ||
// Azure API throws 409 conflict error saying "The entity was not found in this Azure location." | ||
// If the above error message matches, we identify and move forward to delete the VMSS | ||
// in all other cases, it just cancels the rolling upgrades and move ahead to delete the VMSS | ||
if err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) { | ||
return fmt.Errorf("error cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Means none of this is necessary since we aren't hitting this behaviour anymore
// If rolling upgrades haven't been run (when VMSS are just provisioned with rolling upgrades but no extensions, auto-scaling are run ) | |
// we can not cancel rolling upgrades | |
// API call :: GET https://management.azure.com/subscriptions/{subId}/resourceGroups/{rgName}/providers/Microsoft.Compute/virtualMachineScaleSets/{vmSSName}/rollingUpgrades/latest?api-version=2021-07-01 | |
// Azure API throws 409 conflict error saying "The entity was not found in this Azure location." | |
// If the above error message matches, we identify and move forward to delete the VMSS | |
// in all other cases, it just cancels the rolling upgrades and move ahead to delete the VMSS | |
if err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) { | |
return fmt.Errorf("error cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err) | |
} |
if err := future.WaitForCompletionRef(ctx, c.VMScaleSetExtensionsClient.Client); err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) { | ||
return fmt.Errorf("waiting for cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err) | ||
} | ||
log.Printf("[DEBUG] cancelled Virtual Machine Scale Set Rolling Upgrades %q (Resource Group %q).", vmScaleSetName, resourceGroupName) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we can simplify the condition here since it should be prevented by the check retrieving any rolling upgrades above
if err := future.WaitForCompletionRef(ctx, c.VMScaleSetExtensionsClient.Client); err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) { | |
return fmt.Errorf("waiting for cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err) | |
} | |
log.Printf("[DEBUG] cancelled Virtual Machine Scale Set Rolling Upgrades %q (Resource Group %q).", vmScaleSetName, resourceGroupName) | |
return nil | |
if err := future.WaitForCompletionRef(ctx, c.VMScaleSetExtensionsClient.Client); err != nil { | |
return fmt.Errorf("waiting for cancelling of rolling upgrades for %s: %+v", id, err) | |
} | |
log.Printf("[DEBUG] cancelled Virtual Machine Scale Set Rolling Upgrades for %s.", id) | |
return nil |
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.Name) | ||
if err != nil { | ||
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.Name, id.ResourceGroup, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's simplify this to
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.Name) | |
if err != nil { | |
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.Name, id.ResourceGroup, err) | |
} | |
if err := meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, *id); err != nil { | |
fmt.Errorf("cancelling rolling upgrades for %s: %+v", *id, err) | |
} |
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.VirtualMachineScaleSetName) | ||
if err != nil { | ||
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.VirtualMachineScaleSetName, id.ResourceGroup, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.VirtualMachineScaleSetName) | |
if err != nil { | |
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.VirtualMachineScaleSetName, id.ResourceGroup, err) | |
} | |
if err := meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, *id); err != nil { | |
fmt.Errorf("cancelling rolling upgrades for %s: %+v", *id, err) | |
} |
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.Name) | ||
if err != nil { | ||
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.Name, id.ResourceGroup, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.Name) | |
if err != nil { | |
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.Name, id.ResourceGroup, err) | |
} | |
if err := meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, *id); err != nil { | |
fmt.Errorf("cancelling rolling upgrades for %s: %+v", *id, err) | |
} |
Hi @stephybun , Many Thanks for your review. If you have all the changes you needed to push from your branch, I'm fine with it. This PR can be closed when you create the PR. Also, I must admit I can't write the three test cases you mentioned. Thanks again for your help. Very much appreciated. Thanks, |
@harshavmb I'll get a PR open in the morning, I think it may be cutting it a bit short to make it into this week's release, I need to do some additional testing around the extensions resource but it should make it in the week after. Apologies again for the long delay on this one. |
Closing in favour of #22991 |
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. |
This is redo work of PR-15634. The old PR was having conflicts & expects an additional argument to cancel before deleting VMSS & extensions.
Since
destroy
call is a clear intention of deleting VMSS & associated resources like extensions, rolling upgrades could be cancelled as they are trivial.As of now, we write destroy time provisioners to cancel rolling upgrades which are error prone. We hope this will improve the way VMSS & extensions are deleted in a clean manner.
Our problem statement had been explained to Gauthier Donikian, Hashicorp few months ago. We expect some feedback from Hashicorp team.
Thanks,
Harsha