Upgrading GKE Kubernetes version resulted in error 429: quota exceeded #3782

Closed
vsimon opened this issue Jun 4, 2019 · 4 comments · Fixed by GoogleCloudPlatform/magic-modules#1888
vsimon commented Jun 4, 2019

Terraform Version

Terraform v0.12.0
provider.google: version = "~> 2.7"

Affected Resource(s)

  • google_container_cluster
  • google_container_node_pool

Terraform Configuration Files

resource "google_container_cluster" "default" {
  name    = var.name
  project = var.project

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true

  initial_node_count = 1

  min_master_version = var.kubernetes_version
}

resource "google_container_node_pool" "default" {
  name       = "default-pool"
  project    = var.project
  cluster    = google_container_cluster.default.name
  node_count = var.node_count
  version    = var.kubernetes_version

  timeouts {
    update = "20m"
  }

  node_config {
    machine_type = "n1-standard-2"

    metadata = {
      disable-legacy-endpoints = "true"
    }

    oauth_scopes = [
      # gke-default
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/trace.append",
      # pubsub
      "https://www.googleapis.com/auth/pubsub",
    ]
  }
}
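
The variable definitions aren't included in the issue; a minimal sketch of what the referenced variables presumably look like (names taken from the config above and the notes below; types and descriptions are assumptions):

# Hypothetical variables.tf -- not part of the original report.
variable "name" {
  type        = string
  description = "Name of the GKE cluster."
}

variable "project" {
  type        = string
  description = "GCP project ID hosting the cluster."
}

variable "node_count" {
  type        = number
  description = "Number of nodes in the default-pool node pool."
}

variable "kubernetes_version" {
  type        = string
  description = "GKE version for both the master and the node pool."
}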

Expected Behavior

Terraform upgrades the cluster's Kubernetes version without error.

Actual Behavior

Terraform returned googleapi: Error 429: Quota exceeded while waiting on the node pool version upgrade (full error below).

Diff

  # module.cluster.google_container_cluster.default will be updated in-place
  ~ resource "google_container_cluster" "default" {
...
      ~ min_master_version       = "1.11.8-gke.6" -> "1.11.10-gke.4"
...

  # module.cluster.google_container_node_pool.default will be updated in-place
  ~ resource "google_container_node_pool" "default" {
...
      ~ version             = "1.11.8-gke.6" -> "1.11.10-gke.4"
...
Console Output

module.cluster.google_container_cluster.default: Modifying... [id=cluster]
module.cluster.google_container_cluster.default: Still modifying... [id=cluster, 10s elapsed]
...
module.cluster.google_container_cluster.default: Still modifying... [id=cluster, 5m20s elapsed]
module.cluster.google_container_cluster.default: Modifications complete after 5m29s [id=cluster]
module.cluster.google_container_node_pool.default: Modifying... [id=us-central1-c/cluster/default-pool]

Error: Error waiting for updating GKE node pool version: error while retrieving operation: googleapi: Error 429: Quota exceeded for quota metric 'container.googleapis.com/default' and limit 'defaultPerMinutePerProject' of service 'container.googleapis.com' for consumer 'project_number:764086051850'., rateLimitExceeded

  on ../modules/cluster/main.tf line 15, in resource "google_container_node_pool" "default":
  15: resource "google_container_node_pool" "default" {

Steps to Reproduce

  1. terraform apply
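
For concreteness, the upgrade is triggered by bumping the version variable and re-applying. A hypothetical terraform.tfvars matching the diff and notes in this issue might look like:

# Hypothetical terraform.tfvars -- values taken from the diff and notes above.
name               = "cluster"
project            = "my-project"       # placeholder; the real project ID is not shown
node_count         = 3
kubernetes_version = "1.11.10-gke.4"    # bumped from 1.11.8-gke.6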

Important Factoids

Authenticated as a user instead of a service account.
The variables are name, project, node_count, and kubernetes_version; I don't think their specific values affect this bug. node_count was 3 and kubernetes_version was "1.11.10-gke.4".

The nodes were eventually upgraded after this error was displayed, and a second terraform apply reported no changes.

ghost added the bug label Jun 4, 2019
chrisst (Contributor) commented Jun 5, 2019

It looks like you're getting rate limited by the container API. One possible solution is asking for a quota increase. I'll look at tuning the operation polling to be a little less aggressive, but the trade-off would be slowing down Terraform for everyone else, so I'd prefer not to back off too far.

vsimon (Author) commented Jun 5, 2019

OK, thanks for looking into tuning. Yes, it's possible that the default polling rates don't match up with the default quotas for the container API.

chrisst (Contributor) commented Jun 5, 2019

A coworker helped me realize that your particular error can be solved by adding a retry to the call that polls the operation for success. This probably should have been added a while ago :) and it means we won't have to tune the polling, which would have been a much less clean solution.
