
GKE still cannot be reliably spun up using Terraform 0.12.3 (without defining node pool) #4391

Closed
Leectan opened this issue Sep 3, 2019 · 5 comments

Leectan commented Sep 3, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.12.3

  • provider.google v2.13.0
  • provider.google-beta v2.13.0

Affected Resource(s)

  • google_container_cluster
  • google_container_node_pool

Terraform Configuration Files

module "gke-sbp-cluster" {
// providers = {
// google = "google-beta"
// }
source = "./modules/gke-base-infra"
cluster_name = var.cluster_name
project_id = var.project_id
vpc_network = google_compute_network.vpc_network.name
vpc_subnetwork = google_compute_subnetwork.vpc_subnetwork.name
location = var.location
master_ipv4_cidr_block = var.master_ipv4_cidr_block
use_ip_aliases = true
// logging_service = var.logging_service
// monitoring_service = var.monitoring_service
// daily_maintenance_window_start_time = var.daily_maintenance_window_start_time
enable_private_endpoint = false
// cluster_client_certificate = true
// enable_kubernetes_dashboard = true
enable_private_nodes = true
enable_network_policy = true
// http_load_balancing = false
master_authorized_networks_cidr_blocks = var.master_authorized_networks_cidr_blocks
cluster_secondary_range_name = var.cluster_secondary_range_name
// cluster_autoscaling = var.cluster_autoscaling
// enable_legacy_abac = false
// pod_security_policy_config = true
// intranode_visibility = true

// #node pool configs below
// node_pool_name = "gke-node_pool-1"
// initial_node_count = 4
//// max_node_count = 4
//// min_node_count = 2
// node_auto_repair = true
// node_auto_upgrade = true
// node_disk_size_gb = 100
// node_image_type = "COS"
// node_machine_type = "n1-standard-1"
// oauth_scopes = [
// "https://www.googleapis.com/auth/logging.write",
// "https://www.googleapis.com/auth/monitoring",
// "https://www.googleapis.com/auth/devstorage.read_only"
//
// ]
// node_preemtible = false
// gke_service_account = module.gke_service_account.email
//// node_cluster_auto_scaling = true
//// cpu_maximum = 4
//// cpu_minimum = 2
//// memory_maximum = 8
//// memory_minimum = 4
}

Debug Output

Panic Output

Expected Behavior

I expected the apply to run to completion and create the cluster.

Actual Behavior

Error: Error waiting for creating GKE cluster: All cluster resources were brought up, but the cluster API is reporting that: 4 nodes out of 4 are unhealthy.

Steps to Reproduce

  1. terraform init
  2. terraform apply

Important Factoids

References

@ghost ghost added the bug label Sep 3, 2019
@Leectan Leectan changed the title from "GKE still cannot be reliably spin up using Terraform 0.12.3" to "GKE still cannot be reliably spin up using Terraform 0.12.3 (without defining node pool)" Sep 3, 2019
@rileykarson rileykarson self-assigned this Sep 3, 2019
rileykarson (Collaborator) commented:

It's hard to say why this is failing, especially when using a module. Are you able to replicate this with a google_container_cluster resource directly, or share debug logs?

Leectan commented Sep 3, 2019

I created brand-new resources with minimal requirements; it seems to fail when remove_default_node_pool = true is declared together with a node_pool resource at the same time.

provider "google" {
  version = "2.14.0"
  region = "us-east1"
  zone = "us-east1-b"
}

provider "google-beta" {
  version = "2.14.0"
  region = "us-east1"
  zone = "us-east1-b"
}

resource "google_compute_network" "vpc_network" {
  name = var.vpc_network_name
  auto_create_subnetworks = false
  routing_mode = "REGIONAL"
  delete_default_routes_on_create = true
  project = var.project_id
}

resource "google_compute_subnetwork" "vpc_subnetwork" {
  ip_cidr_range = var.vpc_subnetwork_cidr_range
  project = var.project_id
  name = var.vpc_subnetwork_name
  network = google_compute_network.vpc_network.self_link
  private_ip_google_access = true
  secondary_ip_range {
    ip_cidr_range = var.network_secondary_range
    range_name = var.network_secondary_range_name
  }
  enable_flow_logs = true
}

resource "google_container_cluster" "my-gke-cluster" {
  provider = "google-beta"
  name = "my-gke-cluster"
  location = "us-east1"
  project = var.project_id
  initial_node_count = 2
  cluster_autoscaling {
    enabled = true
    resource_limits {
      resource_type = "cpu"
      maximum = 8
      minimum = 2
    }
    resource_limits {
      resource_type = "memory"
      maximum = 16
      minimum = 4
    }
  }

  remove_default_node_pool = true
}

resource "google_container_node_pool" "node_pool_1" {
  provider = "google-beta"
  name = "gke-node-pools"
  project = var.project_id
  cluster = google_container_cluster.my-gke-cluster.name
  management {
    auto_repair = true
    auto_upgrade = true
  }

  autoscaling {
    max_node_count = 10
    min_node_count = 2
  }
  depends_on = [google_container_cluster.my-gke-cluster]
}

The cluster spins up fine without remove_default_node_pool, but then it doesn't pick up the node pool I specified in the resource block; it just creates the default node pool.

But if remove_default_node_pool = true is specified together with the node_pool resource, the cluster spins up and the default node pool is deleted, but the node pool from the resource block never comes up. I've been struggling with this issue for the last few weeks now...
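For comparison, the pattern the provider documentation recommends for clusters that only use separately managed node pools is to create the smallest possible default node pool, remove it immediately, and give the node pool resource an explicit location and an initial node count. The sketch below is illustrative only; the names, location, and sizes are assumptions rather than values taken from this report:

resource "google_container_cluster" "example" {
  name     = "example-cluster"
  location = "us-east1"
  project  = var.project_id

  # Create the smallest possible default node pool, then delete it so only
  # the separately managed node pool below remains.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "example" {
  name     = "example-node-pool"
  project  = var.project_id
  location = "us-east1" # must match the cluster's location
  cluster  = google_container_cluster.example.name

  # With autoscaling enabled, set initial_node_count rather than node_count
  # to avoid permanent diffs on the node count.
  initial_node_count = 2

  autoscaling {
    min_node_count = 2
    max_node_count = 10
  }

  node_config {
    machine_type = "n1-standard-1"
  }
}

Compared with the configuration above, the notable differences are the explicit location on the node pool (without it the pool may be looked up in the provider's default zone rather than the regional cluster's location) and initial_node_count instead of node_count alongside autoscaling.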

@ghost ghost removed the waiting-response label Sep 3, 2019
rileykarson (Collaborator) commented:

This may be related to #4024; someone in that issue saw errors under similar circumstances.

Can you share debug logs? I've never seen an issue like this, and can't reproduce it.
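For reference, debug logs for a failing run can be captured with Terraform's standard logging environment variables (the log file path below is arbitrary):

TF_LOG=DEBUG TF_LOG_PATH=./terraform-debug.log terraform apply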

rileykarson (Collaborator) commented:

Closing as stale.

ghost commented Mar 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 29, 2020
@github-actions github-actions bot added the service/container and forward/review labels Jan 15, 2025