Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azurerm_windows_virtual_machine_scale_set | azurerm_linux_virtual_machine_scale_set | azurerm_orchestrated_virtual_machine_scale_set - expose the action attribute in the automatic_instance_repair code block #26227

Merged
merged 27 commits into from
Aug 9, 2024

Conversation

WodansSon
Copy link
Collaborator

Community Note

  • Please vote on this PR by adding a 👍 reaction to the original PR to help the community and maintainers prioritize for review
  • Please do not leave "+1" or "me too" comments, they generate extra noise for PR followers and do not help prioritize for review

Description

Exposes the action attribute in the automatic_instance_repair code block for the azurerm_windows_virtual_machine_scale_set, azurerm_linux_virtual_machine_scale_set, and azurerm_orchestrated_virtual_machine_scale_set resources which has now GA'ed.

PR Checklist

  • I have followed the guidelines in our Contributing Documentation.
  • I have checked to ensure there aren't other open Pull Requests for the same update/change.
  • I have checked if my changes close any open issues. If so please include appropriate closing keywords below.
  • I have updated/added Documentation as required written in a helpful and kind way to assist users that may be unfamiliar with the resource / data source.
  • I have used a meaningful PR title to help maintainers and other users understand this change and help prevent duplicate work.
    For example: “resource_name_here - description of change e.g. adding property new_property_name_here

Changes to existing Resource / Data Source

  • I have added an explanation of what my changes do and why I'd like you to include them (This may be covered by linking to an issue above, but may benefit from additional explanation).
  • I have written new tests for my resource or datasource changes & updated any relevent documentation.
  • I have successfully run tests with my changes locally. If not, please provide details on testing challenges that prevented you running the tests.
  • (For changes that include a state migration only). I have manually tested the migration path between relevant versions of the provider.

Testing

  • My submission includes Test coverage as described in the Contribution Guide and the tests pass. (if this is not possible for any reason, please include details of why you did or could not add test coverage)

Change Log

Below please provide what should go into the changelog (if anything) conforming to the Changelog Format documented here.

  • azurerm_linux_virtual_machine_scale_set - expose the action attribute in the automatic_instance_repair code block
  • azurerm_orchestrated_virtual_machine_scale_set - expose the action attribute in the automatic_instance_repair code block
  • azurerm_windows_virtual_machine_scale_set - expose the action attribute in the automatic_instance_repair code block

This is a (please select all that apply):

  • Bug Fix
  • New Feature (ie adding a service, resource, or data source)
  • Enhancement
  • Breaking Change

Related Issue(s)

Supersedes PR #22330

Note

If this PR changes meaningfully during the course of review please update the title and description as required.

@WodansSon WodansSon added this to the v3.107.0 milestone Jun 4, 2024
Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WodansSon! In addition to the one minor comment left in-line, there is also a test failure:

------- Stdout: -------
=== RUN   TestAccLinuxVirtualMachineScaleSet_otherAutomaticRepairsPolicy
=== PAUSE TestAccLinuxVirtualMachineScaleSet_otherAutomaticRepairsPolicy
=== CONT  TestAccLinuxVirtualMachineScaleSet_otherAutomaticRepairsPolicy
    testcase.go:113: Step 1/8 error: After applying this test step, the plan was not empty.
        stdout:
        Terraform used the selected providers to generate the following execution
        plan. Resource actions are indicated with the following symbols:
          ~ update in-place
        Terraform will perform the following actions:
          # azurerm_linux_virtual_machine_scale_set.test will be updated in-place
          ~ resource "azurerm_linux_virtual_machine_scale_set" "test" {
                id                                                = "/subscriptions/*******/resourceGroups/acctestRG-240605072813414110/providers/Microsoft.Compute/virtualMachineScaleSets/acctestvmss-240605072813414110"
                name                                              = "acctestvmss-240605072813414110"
                # (25 unchanged attributes hidden)
              ~ automatic_instance_repair {
                  ~ action       = "Replace" -> "Restart"
                    # (2 unchanged attributes hidden)
                }
              ~ network_interface {
                  + dns_servers                   = []
                    name                          = "example"
                    # (3 unchanged attributes hidden)
                  ~ ip_configuration {
                      + application_gateway_backend_address_pool_ids = []
                      + application_security_group_ids               = []
                        name                                         = "internal"
                        # (5 unchanged attributes hidden)
                    }
                }
                # (4 unchanged blocks hidden)
            }
        Plan: 0 to add, 1 to change, 0 to destroy.
--- FAIL: TestAccLinuxVirtualMachineScaleSet_otherAutomaticRepairsPolicy (457.25s)
FAIL

Comment on lines 1952 to 1956
ValidateFunc: validation.StringInSlice([]string{
string(virtualmachinescalesets.RepairActionReimage),
string(virtualmachinescalesets.RepairActionRestart),
string(virtualmachinescalesets.RepairActionReplace),
}, false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can replace these with

Suggested change
ValidateFunc: validation.StringInSlice([]string{
string(virtualmachinescalesets.RepairActionReimage),
string(virtualmachinescalesets.RepairActionRestart),
string(virtualmachinescalesets.RepairActionReplace),
}, false),
ValidateFunc: validation.StringInSlice(virtualmachinescalesets.PossibleValuesForRepairAction(), false),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@WodansSon WodansSon modified the milestones: v3.107.0, Future Jun 7, 2024
@WodansSon WodansSon modified the milestones: Future, v3.108.0 Jun 8, 2024
@WodansSon WodansSon modified the milestones: v3.114.0, v3.115.0 Aug 1, 2024
@WodansSon
Copy link
Collaborator Author

The test failures are due to other v4.0 issues not related to this PR:

image

Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WodansSon. I have a question about the behaviour and there are some minor things to fix up as well. Once those are resolved and the relevant tests pass then this should be good to go.

internal/services/compute/virtual_machine_scale_set.go Outdated Show resolved Hide resolved
internal/services/compute/virtual_machine_scale_set.go Outdated Show resolved Hide resolved
internal/services/compute/virtual_machine_scale_set.go Outdated Show resolved Hide resolved
internal/services/compute/virtual_machine_scale_set.go Outdated Show resolved Hide resolved
internal/services/compute/virtual_machine_scale_set.go Outdated Show resolved Hide resolved
internal/services/compute/virtual_machine_scale_set.go Outdated Show resolved Hide resolved
internal/services/compute/virtual_machine_scale_set.go Outdated Show resolved Hide resolved
internal/services/compute/virtual_machine_scale_set.go Outdated Show resolved Hide resolved
@WodansSon
Copy link
Collaborator Author

All of the failures are not related to this PR and are service related or test misconfigurations in v4.0:

image

Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WodansSon! I was hoping this would be good to go now, but there is a scenario which we have not tested for and raises some questions which would be good to have answered before we proceed with the current behaviour:

How would a user remove automatic_instance_repair if they wanted to removed health_probe_id?

I messed around with this in an attempt to understand what that update path would look like but removing automatic_instance_repair with health_probe_id unset in the config results in an API error even though automatic_instance_repair isn't in the payload which raises questions as to how the Service Team want the interplay between these two properties to work.

PATCH request payload:

{
  "properties": {
    "upgradePolicy": {
      "mode": "Manual"
    },
    "virtualMachineProfile": {
      "networkProfile": {
        "networkInterfaceConfigurations": [
          {
            "name": "example",
            "properties": {
              "dnsSettings": {
                "dnsServers": []
              },
              "enableAcceleratedNetworking": false,
              "enableIPForwarding": false,
              "ipConfigurations": [
                {
                  "name": "internal",
                  "properties": {
                    "applicationGatewayBackendAddressPools": [],
                    "applicationSecurityGroups": [],
                    "loadBalancerBackendAddressPools": [
                      {
                        "id": "***"
                      }
                    ],
                    "loadBalancerInboundNatPools": [
                      {
                        "id": "***"
                      }
                    ],
                    "primary": true,
                    "privateIPAddressVersion": "IPv4",
                    "subnet": {
                      "id": "***"
                    }
                  }
                }
              ],
              "primary": true
            }
          }
        ]
      },
      "storageProfile": {
        "imageReference": {
          "offer": "UbuntuServer",
          "publisher": "Canonical",
          "sku": "16.04-LTS",
          "version": "latest"
        }
      }
    }
  }
}

Response:

{
  "error": {
    "code": "BadRequest",
    "message": "Automatic repairs not supported for this Virtual Machine Scale Set because a health probe or health extension was not provided."
  }
}

Could you raise this and get some clarification?

* `grace_period` - (Optional) Amount of time (in minutes, between 30 and 90) for which automatic repairs will be delayed. The grace period starts right after the VM is found unhealthy. The time duration should be specified in ISO 8601 format. Defaults to `PT30M`.
* `grace_period` - (Optional) Amount of time for which automatic repairs will be delayed. The grace period starts right after the VM is found unhealthy. Possible values are between `10` and `90` minutes. The time duration should be specified in `ISO 8601` format (e.g. `PT10M` to `PT90M`).

-> **Note:** Once the `grace_period` field has been set it cannot be removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-picking here but strictly speaking this isn't correct. Since the property has Computed set users can remove this from their config without hitting any API limitations or experiencing a diff so I think this should be removed, or perhaps rephrased?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if a commit is missing here but these have not been changed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I misunderstood what you were saying, now that I have read it a few more times, what you want is for the note to be removed since if you remove it from the config is will just pull it from state?


* `action` - (Optional) The repair action that will be used for repairing unhealthy virtual machines in the scale set. Possible values include `Replace`, `Restart`, `Reimage`.

-> **Note:** Once the `action` field has been set it cannot be removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@WodansSon
Copy link
Collaborator Author

WodansSon commented Aug 8, 2024

@stephybun :

How would a user remove automatic_instance_repair if they wanted to removed health_probe_id?

I messed around with this in an attempt to understand what that update path would look like but removing automatic_instance_repair with health_probe_id unset in the config results in an API error even though automatic_instance_repair isn't in the payload which raises questions as to how the Service Team want the interplay between these two properties to work.

IIRC, once the automatic_instance_repair is enabled it cannot be removed from the VMSS resource, however you can suspend it via a separate API call, which I have not exposed, nor was I asked too, but you can disable it via the enabled field in the code block. The suspend bit might be a later ask from the service team?

Viewing and updating the service state of automatic instance repairs policy

Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it's just a bit jarring to receive the following error:
"message": "Automatic repairs not supported for this Virtual Machine Scale Set because a health probe or health extension was not provided."
When neither automatic repairs and health probe ID are sent in the payload. If the Service Team are open to feedback here then a more contextual error message would go a long way and have saved us some back and forth on this, e.g.

Automatic repairs have been enabled for this Virtual Machine Scale Set previously and cannot be removed

In any case, the comment I left on the docs doesn't seem to have been addressed or there's a commit missing? Once that's sorted and been applied to all three docs then this should be good to go!

@@ -632,7 +632,7 @@ func resourceManagedDiskUpdate(d *pluginsdk.ResourceData, meta interface{}) erro
disk, err := client.Get(ctx, *id)
if err != nil {
if response.WasNotFound(disk.HttpResponse) {
return fmt.Errorf("Managed Disk %q (Resource Group %q) was not found", name, resourceGroup)
return fmt.Errorf("managed disk %q (Resource Group %q) was not found", name, resourceGroup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to fix this here and now but just so you're aware this is a legacy pattern. If you see more like these, they should be updated to use the ID to be consistent and up to date with the rest of the provider.

Suggested change
return fmt.Errorf("managed disk %q (Resource Group %q) was not found", name, resourceGroup)
return fmt.Errorf("%s was not found", id)

* `grace_period` - (Optional) Amount of time (in minutes, between 30 and 90) for which automatic repairs will be delayed. The grace period starts right after the VM is found unhealthy. The time duration should be specified in ISO 8601 format. Defaults to `PT30M`.
* `grace_period` - (Optional) Amount of time for which automatic repairs will be delayed. The grace period starts right after the VM is found unhealthy. Possible values are between `10` and `90` minutes. The time duration should be specified in `ISO 8601` format (e.g. `PT10M` to `PT90M`).

-> **Note:** Once the `grace_period` field has been set it cannot be removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if a commit is missing here but these have not been changed?

@WodansSon
Copy link
Collaborator Author

image

Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WodansSon LGTM 👍

@stephybun stephybun merged commit bd330c2 into main Aug 9, 2024
34 checks passed
@stephybun stephybun deleted the e_vmss_repair_action branch August 9, 2024 06:44
stephybun added a commit that referenced this pull request Aug 9, 2024
@NaveenKumar-Y
Copy link

NaveenKumar-Y commented Aug 19, 2024

image

When automatic_instance_repair is used in linux vmss module, even though when we set enable=false, it is taking it as true,
Provider version : v3.116.0

In main.tf:
image

variable:
image

In Terraform Plan:
image

Terraform apply error:
image

FYI - The same setup is working with provider version : v3.113.0

@stephybun @WodansSon

@NaveenKumar-Y
Copy link

@WodansSon @stephybun

Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants