
Google container node pool, clobber duplicated resource and destroying wrong resource #9402


@lucasteligioridis

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v0.14.6

Affected Resource(s)

  • google_container_node_pool

Terraform Configuration Files

resource "google_container_node_pool" "pool_1" {
  name = "pool_1"
  project = "my-project"
...
}

# Exact duplicate of the 'pool_1' configuration, except the Terraform resource name
# has a `temp_` prefix. Note the node pool `name` was mistakenly left the same.
resource "google_container_node_pool" "temp_pool_1" {
  name = "pool_1"
  project = "my-project"
...
}

Expected Behavior

When creating the new "temp" node pool with terraform apply, I expected a conflict on the name, since the original pool_1 resource was already created and running.

That conflict error did occur, and looked exactly like this:

Error: error creating NodePool: googleapi: Error 409: Already exists: projects/my-project/zones/my-zone/clusters/my-cluster/nodePools/pool_1., alreadyExists

  on temp_pool_1.tf line 1, in resource "google_container_node_pool" "temp_pool_1":
   1: resource "google_container_node_pool" "temp_pool_1" {

After this, I changed the name inside the resource to temp_pool_1, so the resource looked like this:

resource "google_container_node_pool" "temp_pool_1" {
  name = "temp_pool_1"
  project = "my-project"
...
}

After this, I expected a new node pool to be created with the above settings and nothing else to change.
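
Roughly, the plan I expected after the rename (paraphrased here for illustration, not actual output) would have been a plain create, with pool_1 left untouched:

  # google_container_node_pool.temp_pool_1 will be created
  + resource "google_container_node_pool" "temp_pool_1" {
      + name    = "temp_pool_1"
      + project = "my-project"
      ...
    }

Plan: 1 to add, 0 to change, 0 to destroy.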

Actual Behavior

When I ran terraform apply again, after the initial 409 error and the rename, the plan showed a destroy-and-replace diff instead of the new resource I expected:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # google_container_node_pool.temp_pool_1 is tainted, so must be replaced
-/+ resource "google_container_node_pool" "temp_pool_1" {
      ~ id                  = "projects/my-project/locations/my-location/clusters/my-cluster/nodePools/pool_1" -> (known after apply)
      ~ name                = "pool_1" -> "temp_pool_1"

The id is the important part here. It looks like state was written during the terraform apply that failed with the 409 error, clobbering the entry for my existing resource. Terraform now believes I have renamed that resource, so it will destroy pool_1 and replace it with temp_pool_1 instead of creating the new pool alongside it.
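
(For reference, and not something I ran at the time: the clobbered mapping can be confirmed by inspecting the state directly, since the id field shows which real node pool the address is tracking.)

  # Show what Terraform currently records under the temp_pool_1 address;
  # an id ending in .../nodePools/pool_1 reveals it is tracking the pre-existing pool.
  terraform state show google_container_node_pool.temp_pool_1

  # List every node pool address held in state
  terraform state list | grep google_container_node_pool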

Since I was in a development environment, I went ahead with the apply, and that is exactly what happened.

After the apply completed and did exactly what I predicted, my original node pool, pool_1, was destroyed. Running terraform apply again immediately afterwards now wants to create the pool_1 resource from scratch.

I suspect there is a race condition, or state is being clobbered where it should not be in this specific case. I am just glad I caught this in development, and the diff did clearly show the problem, but this could definitely trip someone up in production and cause the wrong node pool to be destroyed unintentionally.
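
(A possible manual workaround, which I have not verified with the maintainers: if the bad destroy-and-replace diff is caught before applying, dropping the clobbered state entry should stop Terraform from associating temp_pool_1 with the existing pool, without touching the real infrastructure.)

  # Forget the clobbered state entry; this only edits state, not the real node pool.
  terraform state rm google_container_node_pool.temp_pool_1

  # The next plan should now show a plain create of temp_pool_1
  # and no changes to pool_1.
  terraform plan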

Look forward to hearing back :)

@edwardmedia self-assigned this Jun 21, 2021
@edwardmedia
Contributor

@lucasteligioridis this is an interesting use case. Because of the way the GKE and node pool resources are designed, it takes a series of calls to fix the problem caused by the initial apply in your case; you simply stopped in the middle of those steps. If you continue running terraform apply, it will fix the problem. I am not sure whether there is a good way at the provider level to shorten those steps.
@ScottSuarez what do you think?

@ScottSuarez
Collaborator

I think we can do a prefetch on the resource during create, to make sure it doesn't already exist, before tainting the resource. @edwardmedia

https://github.com/GoogleCloudPlatform/magic-modules/pull/4904/files

@edwardmedia
Contributor

@ScottSuarez good idea

@lucasteligioridis
Author

Thanks for the quick turnaround, team ❤️ 👏🏼
