Using user_data on resource.nutanix_virtual_machine yields immediate diff after initial apply #69

Closed
rxacevedo opened this issue Jun 25, 2019 · 10 comments · Fixed by #111

Comments

@rxacevedo
Contributor

Describe the bug

When using guest_customization_cloud_init_user_data to bootstrap a virtual machine, a subsequent plan after apply yields a diff because a CDROM device is attached to the virtual machine as a means of supplying user_data to the host. This causes a few issues.
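
For reference, here is a minimal sketch of the kind of configuration involved (the VM name, the cloud-init.yaml path, and the base64encode wrapping are illustrative assumptions, and required attributes such as the cluster reference, CPU, and memory are omitted for brevity):

resource "nutanix_virtual_machine" "node" {
  name = "node-01"

  # Supplying user_data this way is what causes Prism to implicitly attach a
  # CDROM device (ide.3) to the VM, which produces the diff on the next plan.
  guest_customization_cloud_init_user_data = "${base64encode(file("cloud-init.yaml"))}"

  disk_list = [
    {
      data_source_reference = [{
          kind = "image"
          uuid = "${data.terraform_remote_state.images.centos_image}"
      }]

      device_properties = [{

        disk_address = {
          device_index = 0
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
    }
  ]
}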

Expected behavior

user_data is provisioned onto the virtual machine in such a way that it does not create a diff on the plan.

Logs

Plan/diff:

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  ~ nutanix_virtual_machine.node
      disk_list.2.device_properties.0.disk_address.device_index: "3" => "2"


Plan: 0 to add, 1 to change, 0 to destroy.

------------------------------------------------------------------------

This might be fine normally, except for two things:

  1. Devices attached with IDE as the adapter_type cannot be removed while the VM is powered on (I got this from the Prism Central UI).
  2. resourceVirtualMachineUpdate calls changePowerState indiscriminately on any VM attribute update (at the beginning and end of the function), even for something as small as changing the hostname on an existing VM.

Now, here's where things start to get weird - let's say I have the following disk_list:

curl -X GET \
          --silent \
          --insecure \
          --header "Content-Type: application/json" \
          --header "Accept: application/json" \
          --header "Authorization: Basic authBase64==" \
          https://prism.mydomain.tld:9440/api/nutanix/v3/vms/3db764c8-6d06-4f51-939a-dc42b1fc24c8 \
          | gron \
          | grep -E '\.spec\.resources\.disk_list\[[0-9]\]\.device_properties\.(device_type|disk_address\.device_index)'
json.spec.resources.disk_list[0].device_properties.device_type = "DISK";
json.spec.resources.disk_list[0].device_properties.disk_address.device_index = 0;
json.spec.resources.disk_list[1].device_properties.device_type = "DISK";
json.spec.resources.disk_list[1].device_properties.disk_address.device_index = 1;
json.spec.resources.disk_list[2].device_properties.device_type = "CDROM";
json.spec.resources.disk_list[2].device_properties.disk_address.device_index = 3;

I then add two more disks. So index 0 and 1 are in my config, the Nutanix server has implicitly added a CDROM device to my VM to inject user_data, and that device sits at index 3. My new disk_list looks like:

  disk_list = [
    {
      data_source_reference = [{
          kind = "image"
          uuid = "${data.terraform_remote_state.images.centos_image}"
      }]

      device_properties = [{

        disk_address = {
          device_index = 0
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
    },
    {
      device_properties = [{

        disk_address = {
          device_index = 1
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
      disk_size_mib   = 100000
    },
    {
      device_properties = [{

        disk_address = {
          device_index = 2
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
      disk_size_mib   = 100000
    },
    {
      device_properties = [{

        disk_address = {
          device_index = 3
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
      disk_size_mib   = 100000
    }
  ]

This yields the following plan:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  ~ nutanix_virtual_machine.node
      disk_list.#:                                               "3" => "4"
      disk_list.2.device_properties.0.device_type:               "CDROM" => "DISK"
      disk_list.2.device_properties.0.disk_address.adapter_type: "IDE" => "SCSI"
      disk_list.2.device_properties.0.disk_address.device_index: "3" => "2"
      disk_list.2.disk_size_mib:                                 "1" => "100000"
      disk_list.3.data_source_reference.%:                       "" => <computed>
      disk_list.3.device_properties.#:                           "0" => "1"
      disk_list.3.device_properties.0.device_type:               "" => "DISK"
      disk_list.3.device_properties.0.disk_address.%:            "0" => "2"
      disk_list.3.device_properties.0.disk_address.adapter_type: "" => "SCSI"
      disk_list.3.device_properties.0.disk_address.device_index: "" => "3"
      disk_list.3.disk_size_mib:                                 "" => "100000"
      disk_list.3.volume_group_reference.%:                      "" => <computed>


Plan: 0 to add, 1 to change, 0 to destroy.

This breaks because Nutanix tries to resize the CDROM device:

Error: Error applying plan:

1 error occurred:
	* nutanix_virtual_machine.node: 1 error occurred:
	* nutanix_virtual_machine.node: error waiting for vm (3db764c8-6d06-4f51-939a-dc42b1fc24c8) to update: error_detail: INTERNAL_ERROR: error_code: 27
error_detail: "NotSupported: Cannot resize cdrom at ide.3 of VM 3db764c8-6d06-4f51-939a-dc42b1fc24c8.", progress_message: update_vm_intentful





Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Furthermore, this actually prevents subsequent plans from succeeding, returning the same error. This is because:

curl -X GET \
          --silent \
          --insecure \
          --header "Content-Type: application/json" \
          --header "Accept: application/json" \
          --header "Authorization: Basic authBase64==" \
          https://prism.mydomain.tld:9440/api/nutanix/v3/vms/3db764c8-6d06-4f51-939a-dc42b1fc24c8 | jq '.status.message_list'
[
  {
    "message": "error_code: 27\nerror_detail: \"NotSupported: Cannot resize cdrom at ide.3 of VM 3db764c8-6d06-4f51-939a-dc42b1fc24c8.\"",
    "reason": "INTERNAL_ERROR"
  }
]

You either have to destroy the VM (terraform destroy -refresh=false) or clear the error state on the object in the API by posting a new spec (I have not tried this).

Versions (please complete the following information):

  • linux_amd64
  • Terraform v0.11.14
  • Nutanix Cluster (Prism Element / AOS) Version 5.10.3.2 LTS
  • Nutanix Prism Central Version 5.10.3
  • Terraform provider version (compiled locally off of master / 5fd531b)

Additional context
I can trick the provider and match what the API returns so that I don't get a diff in the plan:

    {
      device_properties = [{

        disk_address = {
          device_index = 3
          adapter_type = "IDE"
        }
        device_type = "CDROM"

      }]
    }
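
For context, here is a sketch of how that dummy entry would sit at the end of the original two-disk disk_list, mirroring what the API reports (whether disk_size_mib also has to be matched to the 1 MiB the API returns for the CDROM is unclear from the plan output above):

  disk_list = [
    # ... the real DISK entries at device_index 0 and 1 stay as shown earlier ...
    {
      device_properties = [{

        disk_address = {
          device_index = 3
          adapter_type = "IDE"
        }
        device_type = "CDROM"

      }]
    }
  ]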

But this feels dirty, and requires that the user understand the implementation details of how user_data is injected into the VM. I'm not sure how many users would do this before just submitting a support ticket.

@rxacevedo rxacevedo changed the title Using user_data yields immediate diff affert resourceVirtualMachineCreate called Using user_data yields immediate diff after resourceVirtualMachineCreate called Jun 25, 2019
@rxacevedo rxacevedo changed the title Using user_data yields immediate diff after resourceVirtualMachineCreate called Using user_data (resource.nutanix_virtual_machine) yields immediate diff after initial terraform apply Jun 25, 2019
@rxacevedo rxacevedo changed the title Using user_data (resource.nutanix_virtual_machine) yields immediate diff after initial terraform apply Using user_data on resource.nutanix_virtual_machine yields immediate diff after initial apply Jun 25, 2019
@rxacevedo
Contributor Author

Also, let me know if the initial diff/resize CDROM problems should be split into separate issues.

@Jorge-Holgado
Contributor

Jorge-Holgado commented Aug 6, 2019

Good morning,
I've "patched" the provider so it ignores CDROM && IDE (bus) changes.
We're hitting the same problem as you: depending on the Terraform version, the second apply after the first one (which creates the VMs) will either:

  • Reboot the VM and remove that CDROM (Terraform v0.10).
  • Ask to remove the CDROM and, if it is removed, crash the VM (kernel panic/BSOD), because you're hot-removing an IDE bus drive (Terraform > v0.11) and IDE is usually not hot-pluggable.

You can give my fork a try:
https://github.com/Jorge-Holgado/terraform-provider-nutanix/tree/ignore_cdromide

It may not be the best option, but since we're not using CDROM drives at all, this little patch works well for us.
Thanks!

@trexmaster

We've just hit the same problem. I've applied @rxacevedo's workaround, but it feels very dirty and I don't think it should be necessary.

@dot1q

dot1q commented Jan 15, 2020

We just updated to 5.11.2.1 and since then, the workaround above no longer works for us. When creating a new VM, the following error shows up.

Error: Error applying plan:

1 error occurred:
        * module.nutanix-sn.nutanix_virtual_machine.sn-www-01: 1 error occurred:
        * nutanix_virtual_machine.sn-www-01: error waiting for vm () to create: error_detail: INTERNAL_ERROR, progress_message: error_code: 6
error_detail: "VM Disk Attach subtask: fe90924f-9d76-4c95-904e-2c8da74da3dc failed. Error: kBusSlotOccupied: BusSlotOccupied: Slot ide.3 is occupied"

Before we updated, we were at least able to create a VM, but since then, we can no longer perform the change. I'm assuming that something has changed in the pre-checks for creating a VM.

@dot1q

dot1q commented Jan 20, 2020

I did find a workaround for this, but it required me to clone the current Nutanix provider from HashiCorp, take the file modified in the https://github.com/Jorge-Holgado/terraform-provider-nutanix/tree/ignore_cdromide branch, and recompile it. My Terraform config and deployment use Docker images, so it's a real pain to have to clone and compile the provider now, but it does get my production environment back up and running.

@phthano-zz

@nutanix @JonKohler please fix this; our production environment is impacted.

@JonKohler
Collaborator

Hey @phthano - thanks for reaching out. I pinged the internal folks who oversee Terraform things these days, as I haven't been a maintainer on this for quite a while. Happy to make that connection, though, to get their eyes on this.

@BraddMPiontek

What is the future of this provider? This issue has been open for almost a year. The provider's last release was Sept 2019. The README states it is still a technology preview and is very light on examples and documentation.

We are evaluating moving from vSphere to AHV, and we rely heavily on Terraform for our automation. This bug and other nuances in how this provider works have me questioning whether this platform (AHV) is ready for automation via Terraform.

@JonKohler
Collaborator

Hey @piontekdd - thank you for reaching out. I've poked the internal team that oversees our various third-party integrations with some hot pokers to see what's up. If you'd like, ping me at [email protected] and I can connect you directly with the PM in this area to talk through your use case and give you the g2 you need.

@marinsalinas
Contributor

marinsalinas commented May 13, 2020

Hey all, this issue was fixed in #111; it is now on master and will be included in the next release.
