Using user_data on resource.nutanix_virtual_machine yields immediate diff after initial apply #69

Closed
rxacevedo opened this issue Jun 25, 2019 · 10 comments · Fixed by #111

Comments

@rxacevedo
Contributor

Describe the bug

When using guest_customization_cloud_init_user_data to bootstrap a virtual machine, a subsequent plan after apply yields a diff because a CDROM device is attached to the virtual machine as a means of supplying user_data to the host. This causes a few issues.
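
For reference, here is a minimal sketch of the kind of configuration involved (the VM name, the cloud-init.yaml path, and the base64encode wrapping are illustrative assumptions, and required attributes such as the cluster reference, CPU, and memory are omitted for brevity):

resource "nutanix_virtual_machine" "node" {
  name = "node-01"

  # Supplying user_data this way is what causes Prism to implicitly attach a
  # CDROM device (ide.3) to the VM, which produces the diff on the next plan.
  guest_customization_cloud_init_user_data = "${base64encode(file("cloud-init.yaml"))}"

  disk_list = [
    {
      data_source_reference = [{
          kind = "image"
          uuid = "${data.terraform_remote_state.images.centos_image}"
      }]

      device_properties = [{

        disk_address = {
          device_index = 0
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
    }
  ]
}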

Expected behavior

user_data is provisioned onto the virtual machine in such a way that it does not create a diff on the plan.

Logs

Plan/diff:

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  ~ nutanix_virtual_machine.node
      disk_list.2.device_properties.0.disk_address.device_index: "3" => "2"


Plan: 0 to add, 1 to change, 0 to destroy.

------------------------------------------------------------------------

This might be fine normally, except for two things:

  1. Devices attached with IDE as the adapter_type cannot be removed while the VM is powered on (I got this from the Prism Central UI).
  2. resourceVirtualMachineUpdate calls changePowerState indiscriminately on any VM attribute update (at the beginning and end of the function), even for something as small as changing the hostname on an existing VM.

Now, here's where things start to get weird - let's say I have the following disk_list:

curl -X GET \
          --silent \
          --insecure \
          --header "Content-Type: application/json" \
          --header "Accept: application/json" \
          --header "Authorization: Basic authBase64==" \
          https://prism.mydomain.tld:9440/api/nutanix/v3/vms/3db764c8-6d06-4f51-939a-dc42b1fc24c8 \
          | gron \
          | grep -E '\.spec\.resources\.disk_list\[[0-9]\]\.device_properties\.(device_type|disk_address\.device_index)'
json.spec.resources.disk_list[0].device_properties.device_type = "DISK";
json.spec.resources.disk_list[0].device_properties.disk_address.device_index = 0;
json.spec.resources.disk_list[1].device_properties.device_type = "DISK";
json.spec.resources.disk_list[1].device_properties.disk_address.device_index = 1;
json.spec.resources.disk_list[2].device_properties.device_type = "CDROM";
json.spec.resources.disk_list[2].device_properties.disk_address.device_index = 3;

I then add two more disks. So index 0 and 1 are in my config, the Nutanix server has implicitly added a CDROM device to my VM to inject user_data, and that device sits at index 3. My new disk_list looks like:

  disk_list = [
    {
      data_source_reference = [{
          kind = "image"
          uuid = "${data.terraform_remote_state.images.centos_image}"
      }]

      device_properties = [{

        disk_address = {
          device_index = 0
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
    },
    {
      device_properties = [{

        disk_address = {
          device_index = 1
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
      disk_size_mib   = 100000
    },
    {
      device_properties = [{

        disk_address = {
          device_index = 2
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
      disk_size_mib   = 100000
    },
    {
      device_properties = [{

        disk_address = {
          device_index = 3
          adapter_type = "SCSI"
        }
        device_type = "DISK"

      }]
      disk_size_mib   = 100000
    }
  ]

This yields the following plan:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  ~ nutanix_virtual_machine.node
      disk_list.#:                                               "3" => "4"
      disk_list.2.device_properties.0.device_type:               "CDROM" => "DISK"
      disk_list.2.device_properties.0.disk_address.adapter_type: "IDE" => "SCSI"
      disk_list.2.device_properties.0.disk_address.device_index: "3" => "2"
      disk_list.2.disk_size_mib:                                 "1" => "100000"
      disk_list.3.data_source_reference.%:                       "" => <computed>
      disk_list.3.device_properties.#:                           "0" => "1"
      disk_list.3.device_properties.0.device_type:               "" => "DISK"
      disk_list.3.device_properties.0.disk_address.%:            "0" => "2"
      disk_list.3.device_properties.0.disk_address.adapter_type: "" => "SCSI"
      disk_list.3.device_properties.0.disk_address.device_index: "" => "3"
      disk_list.3.disk_size_mib:                                 "" => "100000"
      disk_list.3.volume_group_reference.%:                      "" => <computed>


Plan: 0 to add, 1 to change, 0 to destroy.

This breaks because Nutanix tries to resize the CDROM device:

Error: Error applying plan:

1 error occurred:
	* nutanix_virtual_machine.node: 1 error occurred:
	* nutanix_virtual_machine.node: error waiting for vm (3db764c8-6d06-4f51-939a-dc42b1fc24c8) to update: error_detail: INTERNAL_ERROR: error_code: 27
error_detail: "NotSupported: Cannot resize cdrom at ide.3 of VM 3db764c8-6d06-4f51-939a-dc42b1fc24c8.", progress_message: update_vm_intentful





Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Furthermore, this actually prevents subsequent plans from succeeding, returning the same error. This is because:

curl -X GET \
          --silent \
          --insecure \
          --header "Content-Type: application/json" \
          --header "Accept: application/json" \
          --header "Authorization: Basic authBase64==" \
          https://prism.mydomain.tld:9440/api/nutanix/v3/vms/3db764c8-6d06-4f51-939a-dc42b1fc24c8 | jq '.status.message_list'
[
  {
    "message": "error_code: 27\nerror_detail: \"NotSupported: Cannot resize cdrom at ide.3 of VM 3db764c8-6d06-4f51-939a-dc42b1fc24c8.\"",
    "reason": "INTERNAL_ERROR"
  }
]

You either have to destroy the VM (terraform destroy -refresh=false) or clear the error state on the object in the API by posting a new spec (I have not tried this).

Versions (please complete the following information):

  • linux_amd64
  • Terraform v0.11.14
  • Nutanix Cluster (Prism Element / AOS) Version 5.10.3.2 LTS
  • Nutanix Prism Central Version 5.10.3
  • Terraform provider version (compiled locally off of master / 5fd531b)

Additional context
I can trick the provider and match what the API returns so that I don't get a diff in the plan:

    {
      device_properties = [{

        disk_address = {
          device_index = 3
          adapter_type = "IDE"
        }
        device_type = "CDROM"

      }]
    }
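
For context, here is a sketch of how that dummy entry would sit at the end of the original two-disk disk_list, mirroring what the API reports (whether disk_size_mib also has to be matched to the 1 MiB the API returns for the CDROM is unclear from the plan output above):

  disk_list = [
    # ... the real DISK entries at device_index 0 and 1 stay as shown earlier ...
    {
      device_properties = [{

        disk_address = {
          device_index = 3
          adapter_type = "IDE"
        }
        device_type = "CDROM"

      }]
    }
  ]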

But this feels dirty, and requires that the user understand the implementation details of how user_data is injected into the VM. I'm not sure how many users would do this before just submitting a support ticket.

@rxacevedo rxacevedo changed the title Using user_data yields immediate diff affert resourceVirtualMachineCreate called Using user_data yields immediate diff after resourceVirtualMachineCreate called Jun 25, 2019
@rxacevedo rxacevedo changed the title Using user_data yields immediate diff after resourceVirtualMachineCreate called Using user_data (resource.nutanix_virtual_machine) yields immediate diff after initial terraform apply Jun 25, 2019
@rxacevedo rxacevedo changed the title Using user_data (resource.nutanix_virtual_machine) yields immediate diff after initial terraform apply Using user_data on resource.nutanix_virtual_machine yields immediate diff after initial apply Jun 25, 2019
@rxacevedo
Contributor Author

Also, let me know if the initial diff/resize CDROM problems should be split into separate issues.

@Jorge-Holgado
Contributor

Jorge-Holgado commented Aug 6, 2019

Good morning,
I've "patched" the provider so it ignores CDROM && IDE (bus) changes.
We're hitting the same problem as you: depending on the Terraform version, the second apply after the first one (which creates the VMs) will either:

  • Reboot the VM and remove that CDROM (Terraform v0.10).
  • Ask to remove the CDROM and, if it is removed, crash the VM (kernel panic/BSOD), because you're hot-removing an IDE bus drive (Terraform > v0.11) and IDE is usually not hot-pluggable.

You can give my fork a try:
https://github.com/Jorge-Holgado/terraform-provider-nutanix/tree/ignore_cdromide

It may not be the best option, but since we're not using CDROM drives at all, this little patch works well for us.
Thanks!

@trexmaster

We've just hit the same problem. I've applied @rxacevedo's workaround, but it feels very dirty and I don't think it should be necessary.

@dot1q

dot1q commented Jan 15, 2020

We just updated to 5.11.2.1 and since then, the workaround above no longer works for us. When creating a new VM, the following error shows up.

Error: Error applying plan:

1 error occurred:
        * module.nutanix-sn.nutanix_virtual_machine.sn-www-01: 1 error occurred:
        * nutanix_virtual_machine.sn-www-01: error waiting for vm () to create: error_detail: INTERNAL_ERROR, progress_message: error_code: 6
error_detail: "VM Disk Attach subtask: fe90924f-9d76-4c95-904e-2c8da74da3dc failed. Error: kBusSlotOccupied: BusSlotOccupied: Slot ide.3 is occupied"

Before we updated, we were at least able to create a VM, but since then, we can no longer perform the change. I'm assuming that something has changed in the pre-checks for creating a VM.

@dot1q

dot1q commented Jan 20, 2020

I did find a workaround for this, but it required me to clone the current Nutanix provider from HashiCorp, take the file modified in the https://github.com/Jorge-Holgado/terraform-provider-nutanix/tree/ignore_cdromide branch, and recompile it. My Terraform config and deployment use Docker images, so it's a real pain to have to clone and compile the provider now, but it does get my production environment back up and running.

@phthano-zz

@nutanix @JonKohler please fix this; our production environment is impacted.

@JonKohler
Collaborator

Hey @phthano - thanks for reaching out. I pinged the internal folks who oversee Terraform things these days, as I haven't been a maintainer on this for quite a while. Happy to make that connection, though, to get their eyes on this.

@BraddMPiontek

What is the future of this provider? This issue has been open for almost a year. The provider's last release was Sept 2019. The README states it is still a technology preview and is very light on examples and documentation.

We are evaluating moving from vSphere to AHV, and we rely heavily on Terraform for our automation. This bug and other nuances in how this provider works have me questioning whether this platform (AHV) is ready for automation via Terraform.

@JonKohler
Collaborator

Hey @piontekdd - thank you for reaching out. I've poked the internal team that oversees our various third-party integrations with some hot pokers to see what's up. If you'd like, ping me at [email protected] and I can connect you directly with the PM in this area to talk through your use case and give you the g2 you need.

@marinsalinas
Contributor

marinsalinas commented May 13, 2020

Hey all, this issue was fixed in #111; it is now on master and will be included in the next release.
