
Error deleting node pool: googleapi: Error 404: Not found #3896

Closed
dhawal55 opened this issue Jun 21, 2019 · 4 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.13

Affected Resource(s)

  • google_container_node_pool

Terraform Configuration Files

resource "google_container_cluster" "primary" {
  provider = "google-beta"
  count    = "${var.enabled ? 1 : 0}"
  name     = "${local.cluster}"
  location   = "${var.region}"

  initial_node_count = "1"
  network            = "${var.network}"
  subnetwork         = "${var.subnetwork}"

  timeouts {
    create = "${var.gke_cluster_update_timeout}"
    update = "${var.gke_cluster_update_timeout}"
    delete = "${var.gke_cluster_update_timeout}"
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  pod_security_policy_config {
    enabled = true
  }

  maintenance_policy {
    daily_maintenance_window {
      start_time = "19:00" # 11am PDT
    }
  }

  # disable basic auth
  master_auth {
    username = ""
    password = ""
  }

  master_authorized_networks_config {
    cidr_blocks {
      cidr_block = "0.0.0.0/0"
    }
  }

  min_master_version = "${var.min_master_version}"
  remove_default_node_pool = true
}

module "general-green" {
  # required meta-params
  enabled = "${var.enabled && lookup(local.general_pool, "enabled", 0)}"
  source  = "./node_pool-green"

  # required identity
  name    = "general-green"
  cluster = "${local.cluster}"
  region  = "${var.region}"

  # optional
  auto_repair        = "${lookup(local.general_pool, "auto_repair",        "")}"
  auto_upgrade       = "${lookup(local.general_pool, "auto_upgrade",       "")}"
  disk_size_gb       = "${lookup(local.general_pool, "disk_size_gb",       "")}"
  initial_node_count = "${lookup(local.general_pool, "initial_node_count", "")}"
  local_ssd_count    = "${lookup(local.general_pool, "local_ssd_count",    "")}"
  machine_type       = "${lookup(local.general_pool, "machine_type",       "")}"
  max_node_count     = "${lookup(local.general_pool, "max_node_count",     "")}"
  min_cpu_platform   = "${lookup(local.general_pool, "min_cpu_platform",   "")}"
  min_node_count     = "${lookup(local.general_pool, "min_node_count",     "")}"
  preemptible        = "${lookup(local.general_pool, "preemptible",        "")}"
  node_pool_service_account = "${google_service_account.node_service_account.email}"

  wait_for_cluster = "${google_container_cluster.primary.name}"
}

The node_pool-green module:

resource "google_container_node_pool" "pool" {
  provider = "google-beta"
  count    = "${local.enabled ? 1 : 0}"

  name    = "${var.name}"
  cluster = "${var.cluster}"
  location  = "${var.region}"

  initial_node_count = "${local.initial_node_count}"

  autoscaling {
    max_node_count = "${local.max_node_count}"
    min_node_count = "${local.min_node_count}"
  }

  management {
    auto_repair  = "${local.auto_repair}"
    auto_upgrade = "${local.auto_upgrade}"
  }

  node_config {
    disk_size_gb     = "${local.disk_size_gb}"
    local_ssd_count  = "${local.local_ssd_count}"
    machine_type     = "${local.machine_type}"
    min_cpu_platform = "${local.min_cpu_platform}"
    preemptible      = "${local.preemptible}"
    service_account  = "${local.node_pool_service_account}"

    labels {
      cluster     = "${var.cluster}"
      environment = "${contains(list("prod"), terraform.workspace) ? "prod" : "nonprod"}"
      node-pool   = "${var.name}"
    }

    workload_metadata_config {
      node_metadata = "SECURE"
    }
  }
}

Debug Output

I don't have the debug output yet, since the failure only happens intermittently.

Panic Output

https://gist.github.com/dhawal55/8d8b6e4248cab6eb88e54e582cf09af5

Expected Behavior

Terraform should have deleted the node pool and not failed with a "Not Found" error.

Actual Behavior

The delete node pool call goes through and the node pool is deleted, but Terraform then tries to delete the node pool again and fails with a "Not found" error. This doesn't occur every time, so there is probably a race condition involved.

Steps to Reproduce

  1. terraform apply
  2. terraform destroy

Important Factoids

References
ghost added the bug label Jun 21, 2019
chrisst (Contributor) commented Jul 3, 2019

I can't be completely sure without debug logs, but it looks like you have a race condition between deleting the cluster and deleting the pool. When the cluster is deleted before the pool has finished deleting, the call to delete the pool fails. If that is the problem, then looking at your config it's probably because there is no direct relationship (via interpolation) between the node pool and the cluster in the Terraform config. You can confirm this by checking the terraform graph to see whether the two resources are dependent.
If they have no dependency, you will need to add one so that the delete calls happen in the right order. Since this is part of a module, you'll need to open a bug against the module or work with its maintainer to get a fix.
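
A minimal sketch of one way such a dependency could look inside the node_pool-green module, assuming the caller keeps passing wait_for_cluster = "${google_container_cluster.primary.name}" as in the config above. This is an illustration, not necessarily the fix that was applied here:

resource "google_container_node_pool" "pool" {
  provider = "google-beta"
  count    = "${local.enabled ? 1 : 0}"

  name     = "${var.name}"
  location = "${var.region}"

  # var.wait_for_cluster is set by the caller to
  # "${google_container_cluster.primary.name}", so interpolating it in an
  # argument of the pool links the pool to the cluster in the dependency
  # graph and orders the destroy correctly (pool first, then cluster).
  cluster  = "${var.wait_for_cluster}"

  initial_node_count = "${local.initial_node_count}"
}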

chrisst self-assigned this Jul 3, 2019
dhawal55 (Author) commented Jul 4, 2019

@chrisst Thank you for your help. I was hoping that wait_for_cluster = "${google_container_cluster.primary.name}" in my module would make it dependent on the google_container_cluster.primary resource. However, I don't see any dependency in the terraform graph. Do I have to refer to one of the exported attributes? Let me try that.

ghost removed the waiting-response label Jul 4, 2019
dhawal55 (Author) commented Jul 6, 2019

I got the dependency working by adding a local-exec provisioner inside google_container_node_pool that refers to the wait_for_cluster variable:

provisioner "local-exec" {
    command = "echo ${var.wait_for_cluster}"
  }
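
For context, a sketch of where that provisioner could sit inside the module's node pool resource, with argument values as in the original node_pool-green config above:

resource "google_container_node_pool" "pool" {
  provider = "google-beta"
  count    = "${local.enabled ? 1 : 0}"

  name     = "${var.name}"
  cluster  = "${var.cluster}"
  location = "${var.region}"

  initial_node_count = "${local.initial_node_count}"

  # Interpolating var.wait_for_cluster anywhere in this resource (here via
  # a no-op echo) is enough for Terraform to destroy the pool before the
  # cluster it waits on.
  provisioner "local-exec" {
    command = "echo ${var.wait_for_cluster}"
  }
}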

dhawal55 closed this as completed Jul 6, 2019
ghost commented Aug 6, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

ghost locked and limited conversation to collaborators Aug 6, 2019