Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for network_dhcp_wait_seconds invcd_vapp_vm #436

Merged
merged 38 commits into from
Feb 19, 2020

Conversation

Didainius
Copy link
Collaborator

@Didainius Didainius commented Jan 17, 2020

Closes https://github.com/terraform-providers/terraform-provider-vcd/issues/316

This PR adds a parameter network_dhcp_wait_seconds for vcd_vapp_vm resource and data source. The parameter makes it possible to retrieve ip field in network block on the first apply run when ip_allocation_mode=DHCP. The drawback of it is that READ for the functions takes longer. For this function to work VM must be powered on and at least one of the following must be true:

  • VM has guest tools. It waits for IP address to be reported in vCD UI. This is a slower option, but does not require for the VM to use Edge Gateways DHCP service.
  • VM DHCP interface is connected to routed Org network and is using Edge Gateways DHCP service (not relayed). It works by querying DHCP leases on edge gateway. In general it is quicker than waiting until UI reports IP addresses, but is more constrained. However this is the only option if guest tools are not present on the VM.

The feature got the current state after ruling out a few other options. I'm leaving it here to either avoid discussion of what is not possible or to spark some better ideas:

Problem 1: Changing the value of field network_dhcp_wait_seconds will actually show a change in plan and an update although it will result in no op update. The reason for this is that in READ function d.Get("network_dhcp_wait_seconds") actually gets the value not from .tf config, but from statefile. Apparently in the READ function there is no way to access value of .tf config (open issue for that hashicorp/terraform-plugin-sdk#133). If DiffSuppressFunc is used - then d.Get("network_dhcp_wait_seconds") always receives 0 (default TypeInt value).

Problem 2: Initially I had idea to add this field to each of network block however there are a few issues about it:

  • this wouldn't be suitable for data source because they don't have network block defined
  • change of network_dhcp_wait_seconds value would cause whole network block to be reported as changed. It could confuse the user of updating network adapter itself (which is not the case here). Also it does not make sense to have different timings for each DHCP NIC as the lookup checks all of them at the same time (to save on API calls) and would still take as the longest time defined.

Note. Test suite passed on 9.5.

@ghost ghost added the dependencies label Jan 17, 2020
@Didainius Didainius changed the title Add support for network_dhcp_wait_seconds Add support for network_dhcp_wait_seconds invcd_vapp_vm Jan 20, 2020
@Didainius Didainius marked this pull request as ready for review January 21, 2020 07:04
Copy link
Contributor

@vbauzys vbauzys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First scroll :)

vcd/resource_vcd_vapp_vm.go Show resolved Hide resolved
Copy link
Contributor

@vbauzys vbauzys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lvirbalas
Copy link
Collaborator

lvirbalas commented Feb 17, 2020

Tried with bitnami-wordpress-5.2.2-3-linux-centos-7-x86_64 and vCD 10.0. Worked quite nicely! VM got created, IP retrieved and all the NAT+FW rules created automatically with the IP assigned to VM by DHCP.
vcd_vapp_vm.demo_vm_wordpress: Creation complete after 5m2s [id=urn:vcloud:vm:e5a82105-6c72-4dbd-b8b3-086ad1db0fea]

BTW:

  • It took about 2.5 mins to get the IP
  • My Org VDC network had no dhcp_pool section, so the IP was retrieved from guest tools

vcd_vapp_vm.demo_vm_wordpress: Still creating... [2m40s elapsed]
->
2020/02/17 15:36:36 [DEBUG] VM 'demo-vm-wordpress' NICs with indexes [0] did not all report their IPs using DHCP leases

vcd_vapp_vm.demo_vm_wordpress: Creation complete after 5m11s [id=urn:vcloud:vm:a2fc45a8-91a8-4365-9f50-8b44cba215ba]
->
grep "all reported" go-vcloud-director.log
2020/02/17 15:38:48 [TRACE] VM 'demo-vm-wordpress' NICs with indexes [0] all reported their IPs using guest tools

log.Printf("[DEBUG] [VM read] VM %s timed out waiting %d seconds "+
"to report DHCP IPs. You may want to increase 'network_dhcp_wait_seconds' or ensure "+
"your DHCP settings are correct.\n", vm.VM.Name, maxDhcpWaitSeconds)
_, _ = fmt.Fprintf(getTerraformStdout(), "WARNING: VM %s timed out waiting %d seconds "+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to hit the timeout. It's nice that we get the WARNING inside the console. However, the further plan ended up running with a bunch of errors, because NAT+FW rules tried to use empty IP. Do we have any way to protect against that or workaround in the .tf config itself?

vcd_vapp_vm.demo_vm_wordpress: Still creating... [4m50s elapsed]
WARNING: VM demo-vm-wordpress timed out waiting 60 seconds to report DHCP IPs. You may want to increase 'network_dhcp_wait_seconds' or ensure your DHCP settings are correct.
vcd_vapp_vm.demo_vm_wordpress: Creation complete after 4m52s [id=urn:vcloud:vm:5322ba15-9fc6-4e56-a193-6072f6c66129]
vcd_nsxv_dnat.demo_dnat_wordpress_ssh: Modifying... [id=196613]
vcd_nsxv_snat.web: Modifying... [id=196612]
vcd_nsxv_dnat.demo_dnat_wordpress_https: Modifying... [id=196614]
vcd_nsxv_firewall_rule.demo-nsxv-fw-rule-outbound: Modifying... [id=131079]
vcd_nsxv_dnat.demo_dnat_wordpress_https: Still modifying... [id=196614, 10s elapsed]
vcd_nsxv_firewall_rule.demo-nsxv-fw-rule-outbound: Still modifying... [id=131079, 10s elapsed]
vcd_nsxv_firewall_rule.demo-nsxv-fw-rule-outbound: Still modifying... [id=131079, 20s elapsed]
vcd_nsxv_firewall_rule.demo-nsxv-fw-rule-outbound: Modifications complete after 25s [id=131079]

Error: unable to update NAT rule with ID 196614: error while updating NAT rule : vShield Edge Invalid IP Address input /32 for field translatedAddress. (API error: 15012)

  on xendi.tf line 393, in resource "vcd_nsxv_dnat" "demo_dnat_wordpress_https":
 393: resource "vcd_nsxv_dnat" "demo_dnat_wordpress_https" {



Error: unable to update NAT rule with ID 196613: error while updating NAT rule : vShield Edge Invalid IP Address input /32 for field translatedAddress. (API error: 15012)

  on xendi.tf line 411, in resource "vcd_nsxv_dnat" "demo_dnat_wordpress_ssh":
 411: resource "vcd_nsxv_dnat" "demo_dnat_wordpress_ssh" {



Error: unable to update NAT rule with ID 196612: error while updating NAT rule : vShield Edge Invalid IP Address input /32 for field originalAddress. (API error: 15012)

  on xendi.tf line 430, in resource "vcd_nsxv_snat" "web":
 430: resource "vcd_nsxv_snat" "web" {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first thing that comes to my mind is actually returning an error during READ instead of warning. But there are a few problems and I believe this isn't good enough:

  • READ is part of CREATE/UPDATE so creation could fail because of too small value.
  • Because READ accesses the configuration from statefile, even changing the value to a higher one (say from 1 second to 300 seconds) would still at first try to READ with 1 second (because plan/apply triggers READ before UPDATE). If it fails immediately then the user never gets to 300 seconds.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks like current behavior is actually correct. Given that VM does get created even though without IP address, we shouldn't report an error. In a sense, we're missing a special state in the Terraform state machine to model such incomplete executions. All in all, let's leave current implementation intact.

@lvirbalas
Copy link
Collaborator

Tried one more time with Org VDC network with DHCP pool:

resource "vcd_network_routed" "demo_routed_net_web" {
  name         = "demo-net-web"
  edge_gateway = "${vcd_edgegateway.demo_gw.name}"

  gateway = "192.168.0.1"

  static_ip_pool {
    start_address = "192.168.0.2"
    end_address   = "192.168.0.100"
  }

  dhcp_pool {
    start_address = "192.168.0.101"
    end_address   = "192.168.0.132"
  }
}

Now the IP assigned to VM was from this pool and the log said it got it after lease check (as opposed to guest tools):

$ tail -f go-vcloud-director.log | grep IPs
<...>
2020/02/17 16:14:34 [DEBUG] VM 'demo-vm-wordpress' NICs with indexes [0] did not all report their IPs using guest tools
2020/02/17 16:14:34 [DEBUG] VM 'demo-vm-wordpress' NICs with indexes [0] did not all report their IPs using DHCP leases
<...>
2020/02/17 16:16:07 [DEBUG] VM 'demo-vm-wordpress' NICs with indexes [0] did not all report their IPs using guest tools
2020/02/17 16:16:07 [TRACE] VM 'demo-vm-wordpress' NICs with indexes [0] all reported their IPs after lease check

And IP from the pool inside Terraform:

Apply complete! Resources: 18 added, 0 changed, 0 destroyed.

Outputs:

demo_vm_wordpress_ip = 192.168.0.101

Copy link
Collaborator

@lvirbalas lvirbalas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One very practical feature, LTGM now!

Copy link
Contributor

@dataclouder dataclouder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Didainius Didainius merged commit a7172f4 into vmware:master Feb 19, 2020
@Didainius Didainius deleted the dhcp_lease_lookup_ip branch February 19, 2020 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vcd_vapp_vm - investigate options to ensure DHCP ip is reported in resource after provisioning
4 participants