Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel rolling upgrades before deleting VMSS & extensions #18973

Closed
wants to merge 24 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
5a324aa
Add internal CI
harshavmb Sep 27, 2022
d3400ef
SACP:47873: Removing stage from Jenkinsfile
harshavmb Sep 27, 2022
b9862b9
SACP:47873: changes
harshavmb Sep 27, 2022
2ddd4ed
SACP:47873: changes
harshavmb Sep 27, 2022
9912820
SACP:47873: changes
harshavmb Sep 27, 2022
44c498b
SACP:47873: changes
harshavmb Sep 27, 2022
58d807d
SACP:47873: changes
harshavmb Sep 27, 2022
1bacd30
SACP:47873: changes
harshavmb Sep 27, 2022
5a39c53
with parallelism=2
harshavmb Oct 4, 2022
38d86b6
Changes
harshavmb Oct 12, 2022
1f9b0fa
Changes
harshavmb Oct 14, 2022
985cc04
Merge branch 'feature/SACP-47873-add-internal-ci' of ssh://git.rnd.am…
harshavmb Oct 21, 2022
affd5eb
SACP-47573: Adding internal CI
harshavmb Oct 21, 2022
6c2def3
SACP-47573: Resolving conflicts
harshavmb Oct 21, 2022
6d1953a
SACP-47573: Fallback to Amadeus versioning
harshavmb Oct 21, 2022
772448f
Pull request #1: Feature/SACP-47873 add internal ci
harshavmb Oct 21, 2022
68dd089
SACP-47873: Removing .terraformrc file & limiting parallel tests
harshavmb Oct 21, 2022
14ebf71
SACP-47873: Limiting parallel builds to 5
harshavmb Oct 21, 2022
8ef20ab
SACP-47877: Rolling Upgrade cancelled before deleting VMSS
harshavmb Oct 21, 2022
3516879
Pull request #2: SACP-47877: Rolling Upgrade cancelled before deletin…
harshavmb Oct 25, 2022
876a8da
Merge branch 'main' of https://github.com/AmadeusITGroup/terraform-pr…
harshavmb Oct 25, 2022
6c4090f
Merge branch 'master' of ssh://git.rnd.amadeus.net/iac/terraform-prov…
harshavmb Oct 25, 2022
8e7ddd3
pushing changes to github branch
harshavmb Oct 25, 2022
e14adc0
reverting changes to .goreleaser.yml
harshavmb Oct 25, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions internal/services/compute/client/helper.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
package client

import (
"context"
"fmt"
"log"
"net/http"
"strings"
)

func (c *Client) CancelRollingUpgradesBeforeDeletion(ctx context.Context, resourceGroupName string, vmScaleSetName string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just pass in the VMSS resource ID here since that contains all the info we need

Suggested change
func (c *Client) CancelRollingUpgradesBeforeDeletion(ctx context.Context, resourceGroupName string, vmScaleSetName string) error {
func (c *Client) CancelRollingUpgradesBeforeDeletion(ctx context.Context, id parse.VirtualMachineScaleSetId) error {

future, err := c.VMScaleSetRollingUpgradesClient.Cancel(ctx, resourceGroupName, vmScaleSetName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an API call to retrieve the latest rolling upgrades, I think we should do that here and return nil if no rolling upgrade is fine, i.e. there is no need to cancel and trigger the 409 error described down below, and if there is an existing rolling upgrade then we proceed to cancel - all other errors should be caught and surfaced

Suggested change
future, err := c.VMScaleSetRollingUpgradesClient.Cancel(ctx, resourceGroupName, vmScaleSetName)
resp, err := c.VMScaleSetRollingUpgradesClient.GetLatest(ctx, id.ResourceGroup, id.Name)
if err != nil {
if utils.ResponseWasNotFound(resp.Response) {
return nil
}
return fmt.Errorf("retrieving rolling updates for %s: %+v", id, err)
}
future, err := c.VMScaleSetRollingUpgradesClient.Cancel(ctx, resourceGroupName, vmScaleSetName)


// If rolling upgrades haven't been run (when VMSS are just provisioned with rolling upgrades but no extensions, auto-scaling are run )
// we can not cancel rolling upgrades
// API call :: GET https://management.azure.com/subscriptions/{subId}/resourceGroups/{rgName}/providers/Microsoft.Compute/virtualMachineScaleSets/{vmSSName}/rollingUpgrades/latest?api-version=2021-07-01
// Azure API throws 409 conflict error saying "The entity was not found in this Azure location."
stephybun marked this conversation as resolved.
Show resolved Hide resolved
// If the above error message matches, we identify and move forward to delete the VMSS
// in all other cases, it just cancels the rolling upgrades and move ahead to delete the VMSS
if err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) {
return fmt.Errorf("error cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err)
}
Comment on lines +20 to +22
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. This actually resulted in a crash for me when when trying to reproduce the case described in the comments above. In addition matching on the error messages tends to be brittle since that is something that can change very easily in the API.

A more robust way to check whether to call the cancel function for rolling upgrades would be to query whether there are any ongoing rolling upgrades first. I've left comments in-line with what that would look like.

Comment on lines +14 to +22
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Means none of this is necessary since we aren't hitting this behaviour anymore

Suggested change
// If rolling upgrades haven't been run (when VMSS are just provisioned with rolling upgrades but no extensions, auto-scaling are run )
// we can not cancel rolling upgrades
// API call :: GET https://management.azure.com/subscriptions/{subId}/resourceGroups/{rgName}/providers/Microsoft.Compute/virtualMachineScaleSets/{vmSSName}/rollingUpgrades/latest?api-version=2021-07-01
// Azure API throws 409 conflict error saying "The entity was not found in this Azure location."
// If the above error message matches, we identify and move forward to delete the VMSS
// in all other cases, it just cancels the rolling upgrades and move ahead to delete the VMSS
if err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) {
return fmt.Errorf("error cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err)
}


if err := future.WaitForCompletionRef(ctx, c.VMScaleSetExtensionsClient.Client); err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) {
return fmt.Errorf("waiting for cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err)
}
log.Printf("[DEBUG] cancelled Virtual Machine Scale Set Rolling Upgrades %q (Resource Group %q).", vmScaleSetName, resourceGroupName)
return nil
Comment on lines +24 to +28
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we can simplify the condition here since it should be prevented by the check retrieving any rolling upgrades above

Suggested change
if err := future.WaitForCompletionRef(ctx, c.VMScaleSetExtensionsClient.Client); err != nil && !(future.Response().StatusCode == http.StatusConflict && strings.Contains(err.Error(), "There is no ongoing Rolling Upgrade to cancel.")) {
return fmt.Errorf("waiting for cancelling rolling upgrades of Virtual Machine Scale Set %q (Resource Group %q): %+v", vmScaleSetName, resourceGroupName, err)
}
log.Printf("[DEBUG] cancelled Virtual Machine Scale Set Rolling Upgrades %q (Resource Group %q).", vmScaleSetName, resourceGroupName)
return nil
if err := future.WaitForCompletionRef(ctx, c.VMScaleSetExtensionsClient.Client); err != nil {
return fmt.Errorf("waiting for cancelling of rolling upgrades for %s: %+v", id, err)
}
log.Printf("[DEBUG] cancelled Virtual Machine Scale Set Rolling Upgrades for %s.", id)
return nil

}
Original file line number Diff line number Diff line change
Expand Up @@ -1078,6 +1078,13 @@ func resourceLinuxVirtualMachineScaleSetDelete(d *pluginsdk.ResourceData, meta i
return fmt.Errorf("retrieving Linux Virtual Machine Scale Set %q (Resource Group %q): %+v", id.Name, id.ResourceGroup, err)
}

// When rolling upgrades are setup, vmscalesets can't be deleted unless the upgrade is cancelled.
// Since destroy function intention is to VMSS itself. Rolling upgrades are trivial here, hence we cancel before we trigger destroy call
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.Name)
if err != nil {
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.Name, id.ResourceGroup, err)
}
Comment on lines +1083 to +1086
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's simplify this to

Suggested change
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.Name)
if err != nil {
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.Name, id.ResourceGroup, err)
}
if err := meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, *id); err != nil {
fmt.Errorf("cancelling rolling upgrades for %s: %+v", *id, err)
}


// Sometimes VMSS's aren't fully deleted when the `Delete` call returns - as such we'll try to scale the cluster
// to 0 nodes first, then delete the cluster - which should ensure there's no Network Interfaces kicking around
// and work around this Azure API bug:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -361,6 +361,13 @@ func resourceVirtualMachineScaleSetExtensionDelete(d *pluginsdk.ResourceData, me
return err
}

// When rolling upgrades are setup, vmscalesets can't be deleted unless the upgrade is cancelled.
// Since destroy function intention is to VMSS itself. Rolling upgrades are trivial here, hence we cancel before we trigger destroy call
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.VirtualMachineScaleSetName)
if err != nil {
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.VirtualMachineScaleSetName, id.ResourceGroup, err)
}
Comment on lines +366 to +369
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Suggested change
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.VirtualMachineScaleSetName)
if err != nil {
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.VirtualMachineScaleSetName, id.ResourceGroup, err)
}
if err := meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, *id); err != nil {
fmt.Errorf("cancelling rolling upgrades for %s: %+v", *id, err)
}


future, err := client.Delete(ctx, id.ResourceGroup, id.VirtualMachineScaleSetName, id.ExtensionName)
if err != nil {
return fmt.Errorf("deleting Extension %q (Virtual Machine Scale Set %q / Resource Group %q): %+v", id.ExtensionName, id.VirtualMachineScaleSetName, id.ResourceGroup, err)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1108,6 +1108,13 @@ func resourceWindowsVirtualMachineScaleSetDelete(d *pluginsdk.ResourceData, meta
return fmt.Errorf("retrieving Windows Virtual Machine Scale Set %q (Resource Group %q): %+v", id.Name, id.ResourceGroup, err)
}

// When rolling upgrades are setup, vmscalesets can't be deleted unless the upgrade is cancelled.
// Since destroy function intention is to VMSS itself. Rolling upgrades are trivial here, hence we cancel before we trigger destroy call
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.Name)
if err != nil {
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.Name, id.ResourceGroup, err)
}
Comment on lines +1113 to +1116
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

Suggested change
err = meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, id.ResourceGroup, id.Name)
if err != nil {
return fmt.Errorf("error while cancelling rolling upgrade during destroy phase in Linux Virtual Machine Scale Set %q (Resource Group %q) : %+v", id.Name, id.ResourceGroup, err)
}
if err := meta.(*clients.Client).Compute.CancelRollingUpgradesBeforeDeletion(ctx, *id); err != nil {
fmt.Errorf("cancelling rolling upgrades for %s: %+v", *id, err)
}


// Sometimes VMSS's aren't fully deleted when the `Delete` call returns - as such we'll try to scale the cluster
// to 0 nodes first, then delete the cluster - which should ensure there's no Network Interfaces kicking around
// and work around this Azure API bug:
Expand Down