Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set resource id before polling operation and re-create failed deployments #59

Merged
merged 2 commits into from
Feb 9, 2021

Conversation

alexmunda
Copy link
Contributor

@alexmunda alexmunda commented Feb 4, 2021

Set the ID of resources that need to poll for an operation before polling the operation. This prevents the case where a resource is created in the DB but the async resource operation fails and Terraform has no state entry so it thinks the resource does not exist (even though it does and is in a failed state), so the user can't run terraform destroy to clean up the failed resource.

This PR also sets the Terraform ID of resources in a failed state to "" during read. This allows HCP resources (ie a Consul cluster) to be automatically replaced if they failed to provision in the first place. Without this logic, the user would need to look at the HCP UI to determine their cluster/HVN/peering is in a FAILED state, and delete/recreate manually using Terraform (or the UI and remove from state manually).

Failed provision:

❯ terraform apply --auto-approve
hcp_hvn.example_hvn: Refreshing state... [id=/project/51545062-b7a8-4066-8e32-ca283cb147fc/hashicorp.network.hvn/hcp-tf-example-hvn]
hcp_consul_cluster.example_consul_cluster: Creating...
hcp_consul_cluster.example_consul_cluster: Still creating... [10s elapsed]
hcp_consul_cluster.example_consul_cluster: Still creating... [20s elapsed]

Error: unable to create Consul cluster (hcp-tf-example-consul-cluster): create Consul cluster operation (7920f945-933d-4bcc-bdfa-b151e6e50e8b) failed [code=3, message=failed to deploy consul cluster: failed to generate Consul config file: failed to create consul config generator: invalid configuration options: 1 error occurred:
	* rpc error: code = InvalidArgument desc = datacenter cannot be "HCP-TF-EXAMPLE-CONSUL-CLUSTER". Please use only [a-z0-9-_].

]

On next terraform plan:

❯ tf plan
hcp_hvn.example_hvn: Refreshing state... [id=/project/51545062-b7a8-4066-8e32-ca283cb147fc/hashicorp.network.hvn/hcp-tf-example-hvn]
hcp_consul_cluster.example_consul_cluster: Refreshing state... [id=/project/51545062-b7a8-4066-8e32-ca283cb147fc/hashicorp.consul.cluster/hcp-tf-example-consul-cluster]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # hcp_consul_cluster.example_consul_cluster must be replaced
-/+ resource "hcp_consul_cluster" "example_consul_cluster" {
      + cloud_provider                = (known after apply)
      + cluster_id                    = "hcp-tf-example-consul-cluster"
      + connect_enabled               = true
      + consul_automatic_upgrades     = (known after apply)
      + consul_ca_file                = (known after apply)
      + consul_config_file            = (known after apply)
      + consul_private_endpoint_url   = (known after apply)
      + consul_public_endpoint_url    = (known after apply)
      + consul_root_token_accessor_id = (known after apply)
      + consul_root_token_secret_id   = (sensitive value)
      + consul_snapshot_interval      = (known after apply)
      + consul_snapshot_retention     = (known after apply)
      + consul_version                = (known after apply)
      + datacenter                    = (known after apply)
      + hvn_id                        = "hcp-tf-example-hvn"
      + id                            = (known after apply)
      + organization_id               = (known after apply)
      + project_id                    = (known after apply)
      + public_endpoint               = false
      + region                        = (known after apply)
      + scale                         = (known after apply)
      + tier                          = "development"
    }

Plan: 1 to add, 0 to change, 1 to destroy.

I really wish we could return a warning diag instead of just logging and return nil but 🤷

Copy link
Contributor

@roaks3 roaks3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely makes sense, but also wondering how things will behave on the next run (assuming there was a failure), since we wouldn't want Terraform to assume that the resource is ready to be used. Perhaps this is a common pattern though? Or maybe this is a place where a state field would help us?

@alexmunda
Copy link
Contributor Author

@roaks3 Yes exactly! I was thinking about how to make the subsequent run obvious that there was a failure and we came to the same conclusion 😄. I am adding a state check on read right now to remove failed resources.

@alexmunda
Copy link
Contributor Author

If hashicorp/terraform-plugin-sdk#657 (comment) gets resolved in the future, we should return a warning diag.

@alexmunda
Copy link
Contributor Author

If we add a state field on these resources, it would show a diff when it shows the diff on the re-create. Thoughts?

@alexmunda alexmunda changed the title Set resource id before polling operation (in case of operation failure) Set resource id before polling operation and re-create failed deployments Feb 5, 2021
@xargs-P
Copy link
Contributor

xargs-P commented Feb 5, 2021

If we add a state field on these resources, it would show a diff when it shows the diff on the re-create. Thoughts?

I like this idea. So the users are aware there is an issue...and why (on the next run) TF wants to replace it. Do we know how AWS ec2 instances are treated in a similar situation? 🤔

@alexmunda
Copy link
Contributor Author

@xargs-P
Copy link
Contributor

xargs-P commented Feb 5, 2021

ah ok, so they expose the instance state to TF. 👌

@alexmunda
Copy link
Contributor Author

alexmunda commented Feb 5, 2021

ah ok, so they expose the instance state to TF. 👌

@xargs-P It doesn't appear to be on the resource schema, (which is currently what we have now) but they do check it on the response so this PR implements a similar behavior.

@alexmunda
Copy link
Contributor Author

If we add a state field on these resources, it would show a diff when it shows the diff on the re-create. Thoughts?

Turns out this isn't true. Since state is Computed: true, it will always show (known after apply)

Copy link
Contributor

@bcmdarroch bcmdarroch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good!
I tried to test this out locally but unfortunately ran into some issues running my local provider. 😩 Hoping to investigate next week!

@xargs-P
Copy link
Contributor

xargs-P commented Feb 6, 2021

If we add a state field on these resources, it would show a diff when it shows the diff on the re-create. Thoughts?

Turns out this isn't true. Since state is Computed: true, it will always show (known after apply)

🤔 Would we need to mirror a solution closer to AWS's?

@bcmdarroch
Copy link
Contributor

Already approved - but now I can say I've exercised this locally too 😎

@alexmunda alexmunda merged commit b04ab5e into main Feb 9, 2021
@alexmunda alexmunda deleted the set-id-before-wait branch February 9, 2021 17:18
aidan-mundy pushed a commit that referenced this pull request Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants