Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't clear cluster from state if cleanup fails #2030

Merged

Conversation

chrisst
Copy link
Contributor

@chrisst chrisst commented Jul 9, 2019

Fixes hashicorp/terraform-provider-google#3875

Release Note for Downstream PRs (will be copied)

`google_container_cluster` keep clusters in state if they are created in an error state and don't get correctly cleaned up.

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician, I work on Magic Modules.
This PR seems not to have generated downstream PRs before, as of f94002446953a5130bfed62b64fed49b434f45ae.

Pull request statuses

No diff detected in terraform-google-conversion.
No diff detected in Ansible.
No diff detected in Inspec.

New Pull Requests

I built this PR into one or more new PRs on other repositories, and when those are closed, this PR will also be merged and closed.
depends: hashicorp/terraform-provider-google-beta#929
depends: hashicorp/terraform-provider-google#3995

Copy link
Member

@rileykarson rileykarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this combined with #2021, if I created a cluster in an error state and it failed to get deleted it will be persisted to state. Then on the next modification of the cluster, assuming it's a terminal error like in the original issue, I'd receive an error message because the cluster is in an error state. Is that your expectation too?

@chrisst
Copy link
Contributor Author

chrisst commented Jul 10, 2019

I would hope that the cluster should remain in state but be tainted so that on the next plan/apply it should mark it as force replace.
However I think that if we are erroring out on read for clusters that are in bad states we will not be able to fix them with terraform any more. Since plan/apply/etc will all call read first.

@chrisst chrisst force-pushed the gke-delete-failures branch from f940024 to 344fadd Compare July 10, 2019 20:29
@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician, I work on Magic Modules.
I see that this PR has already had some downstream PRs generated. Any open downstreams are already updated to your most recent commit, 344fadd.

Pull request statuses

terraform-provider-google-beta already has an open PR.
No diff detected in terraform-google-conversion.
terraform-provider-google already has an open PR.
No diff detected in Ansible.
No diff detected in Inspec.

New Pull Requests

I didn't open any new pull requests because of this PR.

@chrisst chrisst force-pushed the gke-delete-failures branch from 344fadd to 92c0467 Compare July 10, 2019 20:55
@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician, I work on Magic Modules.
I see that this PR has already had some downstream PRs generated. Any open downstreams are already updated to your most recent commit, 92c0467.

Pull request statuses

terraform-provider-google-beta already has an open PR.
No diff detected in terraform-google-conversion.
terraform-provider-google already has an open PR.
No diff detected in Ansible.
No diff detected in Inspec.

New Pull Requests

I didn't open any new pull requests because of this PR.

Copy link
Member

@rileykarson rileykarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind commenting with our expectations of what will happen when a user hits this particular case at L#1107? Since we'll see Terraform taint the resource but fail to be able to delete it, it's a little unintuitive and should provide useful context when we revisit this in the future.

If you're able to get debug logs for when users enter this state, I'd be curious if a retry on the delete call would help as well.

@chrisst
Copy link
Contributor Author

chrisst commented Jul 10, 2019

In reality it looks like this will leave the cluster in terraform state, but will error out on read and refresh. It's not ideal since somebody will need to clean up the dangling cluster before terraform will run again but it gets us 1 step closer to a better world IMO. This is because it will be explicit about the error'ed cluster rather than silently causing a dangling resource and erroring on the next apply due to a resource collision.

For what it's worth the "taint on failure to create" behavior only exists in 0.12 and is not the behavior in 0.11 where it will just be stored in state and will continue to just error on refresh.

chrisst and others added 2 commits July 11, 2019 03:34
Tracked submodules are build/terraform-beta build/terraform-mapper build/terraform build/ansible build/inspec.
@modular-magician modular-magician merged commit c3b5f29 into GoogleCloudPlatform:master Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

container_cluster resources error on create but leave dangling resources
4 participants