
Error deleting node pool: googleapi: Error 404: Not found #3896

Closed
dhawal55 opened this issue Jun 21, 2019 · 4 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.13

Affected Resource(s)

  • google_container_node_pool

Terraform Configuration Files

resource "google_container_cluster" "primary" {
  provider = "google-beta"
  count    = "${var.enabled ? 1 : 0}"
  name     = "${local.cluster}"
  location   = "${var.region}"

  initial_node_count = "1"
  network            = "${var.network}"
  subnetwork         = "${var.subnetwork}"

  timeouts {
    create = "${var.gke_cluster_update_timeout}"
    update = "${var.gke_cluster_update_timeout}"
    delete = "${var.gke_cluster_update_timeout}"
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  pod_security_policy_config {
    enabled = true
  }

  maintenance_policy {
    daily_maintenance_window {
      start_time = "19:00" # 11am PDT
    }
  }

  # disable basic auth
  master_auth {
    username = ""
    password = ""
  }

  master_authorized_networks_config {
    cidr_blocks {
      cidr_block = "0.0.0.0/0"
    }
  }

  min_master_version = "${var.min_master_version}"
  remove_default_node_pool = true
}

module "general-green" {
  # required meta-params
  enabled = "${var.enabled && lookup(local.general_pool, "enabled", 0)}"
  source  = "./node_pool-green"

  # required identity
  name    = "general-green"
  cluster = "${local.cluster}"
  region  = "${var.region}"

  # optional
  auto_repair        = "${lookup(local.general_pool, "auto_repair",        "")}"
  auto_upgrade       = "${lookup(local.general_pool, "auto_upgrade",       "")}"
  disk_size_gb       = "${lookup(local.general_pool, "disk_size_gb",       "")}"
  initial_node_count = "${lookup(local.general_pool, "initial_node_count", "")}"
  local_ssd_count    = "${lookup(local.general_pool, "local_ssd_count",    "")}"
  machine_type       = "${lookup(local.general_pool, "machine_type",       "")}"
  max_node_count     = "${lookup(local.general_pool, "max_node_count",     "")}"
  min_cpu_platform   = "${lookup(local.general_pool, "min_cpu_platform",   "")}"
  min_node_count     = "${lookup(local.general_pool, "min_node_count",     "")}"
  preemptible        = "${lookup(local.general_pool, "preemptible",        "")}"
  node_pool_service_account = "${google_service_account.node_service_account.email}"

  wait_for_cluster = "${google_container_cluster.primary.name}"
}

The node_pool-green module:

resource "google_container_node_pool" "pool" {
  provider = "google-beta"
  count    = "${local.enabled ? 1 : 0}"

  name    = "${var.name}"
  cluster = "${var.cluster}"
  location  = "${var.region}"

  initial_node_count = "${local.initial_node_count}"

  autoscaling {
    max_node_count = "${local.max_node_count}"
    min_node_count = "${local.min_node_count}"
  }

  management {
    auto_repair  = "${local.auto_repair}"
    auto_upgrade = "${local.auto_upgrade}"
  }

  node_config {
    disk_size_gb     = "${local.disk_size_gb}"
    local_ssd_count  = "${local.local_ssd_count}"
    machine_type     = "${local.machine_type}"
    min_cpu_platform = "${local.min_cpu_platform}"
    preemptible      = "${local.preemptible}"
    service_account  = "${local.node_pool_service_account}"

    labels {
      cluster     = "${var.cluster}"
      environment = "${contains(list("prod"), terraform.workspace) ? "prod" : "nonprod"}"
      node-pool   = "${var.name}"
    }

    workload_metadata_config {
      node_metadata = "SECURE"
    }
  }
}

Debug Output

I don't have the debug output yet, since the failure only happens intermittently.

Panic Output

https://gist.github.com/dhawal55/8d8b6e4248cab6eb88e54e582cf09af5

Expected Behavior

Terraform should have deleted the node pool and not failed with a "Not Found" error.

Actual Behavior

The delete node pool call goes through and the node pool is deleted, but Terraform then tries to delete the node pool again and fails with a "Not found" error. This doesn't occur every time, so there is probably a race condition involved.

Steps to Reproduce

  1. terraform apply
  2. terraform destroy

Important Factoids

References
ghost added the bug label Jun 21, 2019
chrisst (Contributor) commented Jul 3, 2019

I can't be completely sure without debug logs, but it looks like you have a race condition between deleting the cluster and deleting the pool. When the cluster is deleted before the pool has finished deleting, the call to delete the pool fails. If that is the problem, then looking at your config it's probably because there is no direct relationship (via interpolation) between the node pool and the cluster in the Terraform config. You can confirm this by checking the terraform graph to see whether the two resources are dependent.
If they have no dependency, you will need to add one so that the delete calls happen in the right order. Since this is part of a module, you'll need to open a bug against the module or work with its maintainer to get a fix.
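
A minimal sketch of one way such a dependency could look inside the node_pool-green module, assuming the caller keeps passing wait_for_cluster = "${google_container_cluster.primary.name}" as in the config above. This is an illustration, not necessarily the fix that was applied here:

resource "google_container_node_pool" "pool" {
  provider = "google-beta"
  count    = "${local.enabled ? 1 : 0}"

  name     = "${var.name}"
  location = "${var.region}"

  # var.wait_for_cluster is set by the caller to
  # "${google_container_cluster.primary.name}", so interpolating it in an
  # argument of the pool links the pool to the cluster in the dependency
  # graph and orders the destroy correctly (pool first, then cluster).
  cluster  = "${var.wait_for_cluster}"

  initial_node_count = "${local.initial_node_count}"
}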

chrisst self-assigned this Jul 3, 2019
dhawal55 (Author) commented Jul 4, 2019

@chrisst Thank you for your help. I was hoping that wait_for_cluster = "${google_container_cluster.primary.name}" in my module would make it dependent on the google_container_cluster.primary resource. However, I don't see any dependency in the terraform graph. Do I have to refer to one of the exported attributes? Let me try that.

ghost removed the waiting-response label Jul 4, 2019
dhawal55 (Author) commented Jul 6, 2019

I got the dependency working by adding a local-exec provisioner inside google_container_node_pool that refers to the wait_for_cluster variable:

provisioner "local-exec" {
    command = "echo ${var.wait_for_cluster}"
  }
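
For context, a sketch of where that provisioner could sit inside the module's node pool resource, with argument values as in the original node_pool-green config above:

resource "google_container_node_pool" "pool" {
  provider = "google-beta"
  count    = "${local.enabled ? 1 : 0}"

  name     = "${var.name}"
  cluster  = "${var.cluster}"
  location = "${var.region}"

  initial_node_count = "${local.initial_node_count}"

  # Interpolating var.wait_for_cluster anywhere in this resource (here via
  # a no-op echo) is enough for Terraform to destroy the pool before the
  # cluster it waits on.
  provisioner "local-exec" {
    command = "echo ${var.wait_for_cluster}"
  }
}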

dhawal55 closed this as completed Jul 6, 2019
ghost commented Aug 6, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

ghost locked and limited conversation to collaborators Aug 6, 2019