
node_pools don't support regional clusters #1300

Closed
Stono opened this issue Apr 6, 2018 · 6 comments


Stono commented Apr 6, 2018

Hey,
We're trying to rebuild our clusters using the new regional clusters added in 1.9.0; however, we use `google_container_node_pool` to add custom pools to our cluster.

However, node pools do not work against regional clusters:

```
Error: Error applying plan:

1 error(s) occurred:

* module.kubernetes.google_container_node_pool.custom-pool: 1 error(s) occurred:

* google_container_node_pool.custom-pool: Error creating NodePool: googleapi: Error 400: v1 API cannot be used to access GKE regional clusters. See https://goo.gl/uHKp3k for more information., badRequest
```

Proposal
I think the `google_container_node_pool` resource should be changed as follows:

  • Add a `region` parameter with the description "The location of the Kubernetes regional master, for use with regional clusters".
  • Change the `zone` parameter description to "The location of the Kubernetes zonal master, for use with zonal clusters".
  • Make the two parameters mutually exclusive, with `zone` using the existing code path and `region` using the newer v1beta1 API, which handles regional clusters (https://cloud.google.com/kubernetes-engine/docs/reference/api-organization#beta).
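If the proposal were adopted, usage might look like the sketch below. The `region` argument on both resources and their exact interaction are assumptions illustrating the proposal, not the shipped schema:

```hcl
# Hypothetical usage: `region` replaces `zone` when the target
# cluster is regional. The two would be mutually exclusive.
resource "google_container_cluster" "regional" {
  name               = "my-regional-cluster"
  region             = "europe-west1"
  initial_node_count = 1
}

resource "google_container_node_pool" "custom-pool" {
  name       = "custom-pool"
  region     = "europe-west1" # mutually exclusive with `zone`
  cluster    = "${google_container_cluster.regional.name}"
  node_count = 3
}
```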

Associated issue: #829
Associated PR for master regional clusters: #1181


Stono commented Apr 6, 2018

@danawillow @ashish-amarnath I tried!

@darrenhaken (Contributor) commented:

I'll take this up @Stono


Stono commented Apr 6, 2018

Thank you oh random stranger @darrenhaken who definitely doesn't work in my office :)

@morgante commented:

I'm also encountering this bug.

nat-henderson pushed a commit that referenced this issue Apr 25, 2018
This PR also switched us to using the beta API in all cases, which had a side effect worth noting; the note is included here for posterity.

=====
The problem is, we add a GPU, and as per the docs, GKE adds a taint to
the node pool saying "don't schedule here unless you tolerate GPUs",
which is pretty sensible.

Terraform doesn't know about that, because it didn't ask for the taint
to be added. So after apply, on refresh, it sees the state of the world
(1 taint) and the state of the config (0 taints) and wants to set the
world equal to the config. This introduces a diff, which makes the test
fail - tests fail if there's a diff after they run.

Taints are a beta feature, though. :) And since the config doesn't
contain any taints, terraform didn't see any beta features in that node
pool ... so it used to send the request to the v1 API. And since the v1
API didn't return anything about taints (since they're a beta feature),
terraform happily checked the state of the world (0 taints I know about)
vs the config (0 taints), and all was well.

This PR makes every node pool refresh request hit the beta API. So now
terraform finds out about the taints (which were always there) and the
test fails (which it always should have done).

The solution is probably to write a little bit of code which suppresses
the report of the diff of any taint with value 'nvidia.com/gpu', but
only if GPUs are enabled. I think that's something that can be done.
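The suppression idea in the last paragraph could be sketched as a small predicate in the provider's language (Go). The function name and signature here are illustrative, not the provider's actual `DiffSuppressFunc` hook, which additionally receives the resource data:

```go
package main

import "fmt"

// suppressGPUTaintDiff reports whether a refresh-time taint diff should
// be hidden from the plan. Illustrative sketch: ignore the GKE-added
// "nvidia.com/gpu" taint, but only when the node pool has GPUs enabled,
// so genuine out-of-band taint changes are still surfaced.
func suppressGPUTaintDiff(taintKey string, gpusEnabled bool) bool {
	return gpusEnabled && taintKey == "nvidia.com/gpu"
}

func main() {
	fmt.Println(suppressGPUTaintDiff("nvidia.com/gpu", true))  // suppress
	fmt.Println(suppressGPUTaintDiff("nvidia.com/gpu", false)) // report
	fmt.Println(suppressGPUTaintDiff("dedicated", true))       // report
}
```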

Stono commented May 9, 2018

This works now, so closing the issue :-) ta

@Stono Stono closed this as completed May 9, 2018
chrisst pushed a commit to chrisst/terraform-provider-google that referenced this issue Nov 9, 2018
…#1320)


ghost commented Nov 18, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 18, 2018