
Unclear when to use google_container_cluster or google_container_node_pool #475

Closed
paddycarver opened this issue Sep 27, 2017 · 8 comments · Fixed by GoogleCloudPlatform/magic-modules#1329

Comments

@paddycarver
Contributor

The google_container_cluster resource has a node_pool field that can be used to define the cluster's node pools, but there is also a google_container_node_pool resource that can define node pools in a cluster. There's no guidance on when or how to use each, whether they should be used together, or why they're separate in the first place.
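
For concreteness, a minimal sketch of the two shapes this can take (the zones, counts, and resource labels here are just illustrative):

# Option A: node pools defined inline on the cluster resource.
resource "google_container_cluster" "inline" {
  name = "inline-cluster"
  zone = "us-central1-a"

  node_pool {
    name       = "pool-a"
    node_count = 1
  }
}

# Option B: the cluster and its node pools as separate resources.
resource "google_container_cluster" "split" {
  name               = "split-cluster"
  zone               = "us-central1-a"
  initial_node_count = 1
}

resource "google_container_node_pool" "pool_a" {
  name       = "pool-a"
  cluster    = "${google_container_cluster.split.name}"
  zone       = "us-central1-a"
  node_count = 1
}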

@paddycarver
Contributor Author

I think we can probably resolve this by updating the documentation pages for both resources to explain that google_container_cluster should manage the node pools when you have a single, authoritative list of them--this should generally be the common case. google_container_node_pool should be used when you want to distribute authority over node pool configuration in a cluster, e.g., when an infrastructure team manages the cluster and each developer team manages its own node pools, sometimes with different requirements. We should also note that google_container_node_pool won't remove node pools that are added outside of Terraform, and we should show how to use lifecycle.ignore_changes to make google_container_cluster work with google_container_node_pool.
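
A minimal sketch of that split-authority pattern, assuming Terraform 0.11-era syntax (all names and values here are illustrative):

# Managed by the infrastructure team; node pools are ignored so that
# separately managed google_container_node_pool resources don't fight
# with this resource over the node pool list.
resource "google_container_cluster" "shared" {
  name               = "shared-cluster"
  zone               = "us-central1-a"
  initial_node_count = 1

  lifecycle {
    ignore_changes = ["node_pool"]
  }
}

# Managed by an individual team, possibly in a separate configuration.
resource "google_container_node_pool" "team_a" {
  name       = "team-a-pool"
  cluster    = "${google_container_cluster.shared.name}"
  zone       = "us-central1-a"
  node_count = 2
}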

@rochdev

rochdev commented Nov 8, 2017

@paddycarver Just to confirm my understanding, in order to manage node pools using google_container_node_pool the following steps have to be taken:

  1. Create a google_container_cluster, which will create a default node pool, and use lifecycle.ignore_changes to ignore that pool.
  2. Create a null_resource that will delete the default pool.
  3. Create any number of pools using google_container_node_pool.

My reasoning is that by using lifecycle.ignore_changes, no changes can ever be made to that node pool, so it should simply be removed and replaced with a google_container_node_pool.

Are there other ways to get updatable node pools that can also be managed externally?

@matti

matti commented Jan 18, 2018

I think I'm finally successful with:

resource "google_container_cluster" "stateful" {
  lifecycle {
    ignore_changes = ["node_pool"]
  }
  node_pool = {}
}

This will create an extra node pool (I don't understand how a null_resource could be used to delete it, and that sounds awful anyway), but now it works as expected. If this is the correct way to go, this is the example that should be in the docs.

**EDIT: not so sure anymore. I'm giving up on separate google_container_node_pools and just inlining them in my google_container_cluster (it's a massive list) -- I don't understand how this got so complex. There's clearly something wrong with this design/docs.

***EDIT2: well, that prevents me from removing a node pool in the future without recreating the cluster.

@mattdodge

mattdodge commented Jan 23, 2018

I may be in the minority here, but I think that in production you should almost always manage your cluster and your node pools separately, primarily because of @matti's second edit: any change to an inline node pool requires the entire cluster to go down and come back up, so zero-downtime deploys aren't possible. That means you're left with that pesky default node pool, though. Terraform is in a tough spot here; the fault really lies with GCP's inability to launch a cluster without any node pool (despite the fact that you can delete all of the node pools afterwards).

Anyways, I posted it here too, but here's an example of how to use a null_resource to delete the default node pool after the cluster is created.

resource "google_container_cluster" "cluster" {
  name = "my-cluster"
  zone = "us-west1-a"
  initial_node_count = 1
}

resource "google_container_node_pool" "pool" {
  name = "my-cluster-nodes"
  node_count = "3"
  zone = "us-west1-a"
  cluster = "${google_container_cluster.cluster.name}"
  node_config {
    machine_type = "n1-standard-1"
  }
  # Delete the default node pool before spinning this one up
  depends_on = ["null_resource.default_cluster_deleter"]
}

resource "null_resource" "default_cluster_deleter" {
  provisioner "local-exec" {
    command = <<EOF
      gcloud container node-pools \
	--project my-project \
	--quiet \
	delete default-pool \
	--cluster ${google_container_cluster.cluster.name}
EOF
  }
}

@roobert

roobert commented Apr 23, 2018

For anyone else who finds this issue: it looks like there is now a remove_default_node_pool parameter (#1245).

The following config will create a cluster (cluster0) with two attached node pools (nodepool{0,1}) and no default node pool:

"resource" "google_container_cluster" "cluster0" {
  "name" = "cluster0"
  "zone" = "europe-west1-b"
  "remove_default_node_pool" = true
  "additional_zones" = ["europe-west1-c", "europe-west1-d"]
  "node_pool" = {
    "name" = "default-pool"
  }
  "lifecycle" = {
    "ignore_changes" = ["node_pool"]
  }
}

"resource" "google_container_node_pool" "nodepool0" {
  "name" = "nodepool0"
  "cluster" = "cluster0"
  "node_count" = 1
  "zone" = "europe-west1-b"
  "depends_on" = ["google_container_cluster.cluster0"]
  "node_config" = {
    "machine_type" = "f1-micro"
  }
}

"resource" "google_container_node_pool" "nodepool1" {
  "name" = "nodepool1"
  "cluster" = "cluster0"
  "node_count" = 3
  "zone" = "europe-west1-d"
  "depends_on" = ["google_container_cluster.cluster0"]
}

Updating node pool properties and adding or deleting node pools on the cluster seems to behave as expected.

I think this issue is probably still valid as it's not really clear from the docs whether this is the preferred method for managing node pools or not.

@michaelbannister
Contributor

According to the docs, GKE chooses the master VM’s size based on the initial number of nodes, so if you’re going to have a large cluster, you may want that initial number to be bigger than 1, even though you’re going to delete it!
https://kubernetes.io/docs/admin/cluster-large/#size-of-master-and-master-components
If anyone knows this to be outdated, I’d love to hear it :)

@rochdev

rochdev commented Apr 25, 2018

@michaelbannister This only seems to apply when using the kube-up.sh script to manage the masters yourself on GCE. With GKE, however, the masters are managed by Google, in which case it's their responsibility to handle scaling to support your nodes.

@ghost

ghost commented Mar 4, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 4, 2019