3.77.0 google_container_cluster timeout 1m #9691

Closed · hawksight opened this issue Jul 31, 2021 · 7 comments
Labels: bug, forward/review (In review; remove label to forward), service/container

Comments

@hawksight

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.0.3
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v3.77.0
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/template v2.2.0

Affected Resource(s)

  • google_container_cluster

Terraform Configuration Files

Using a module, but for brevity, building a cluster with timeouts looks something like this:

resource "google_container_cluster" "cluster" {
  name           = var.CLUSTER_NAME
  project        = var.META_PROJECT
  
  # ...

  timeouts {
    create = "15m"
    delete = "10m"
    update = "10m"
  }
}
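
For context, the resource above lives inside a module; a hypothetical call might look like the following (the module path matches the error output below, but the variable names and values here are placeholders):

module "cluster" {
  source = "../terraform-modules/gcloud-k8s"

  CLUSTER_NAME = "example-cluster"
  META_PROJECT = "example-project"
}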

Debug Output

Sorry, I don't have any debug or panic output at the moment, but here is the error:

module.cluster.google_container_cluster.cluster: Still creating... [1m30s elapsed]
╷
│ Error: timeout while waiting for state to become 'success' (timeout: 1m0s)
│
│   with module.cluster.google_container_cluster.cluster,
│   on ../terraform-modules/gcloud-k8s/main.tf line 88, in resource "google_container_cluster" "cluster":
│   88: resource "google_container_cluster" "cluster" {
│

The plan ran fine; this error happened at apply time.

Panic Output

N/A

Expected Behaviour

A normal cluster build, as with the 3.76.0 provider.

Actual Behaviour

The error shown above; it seems the configured timeouts were ignored and a one-minute timeout was applied instead.
As a result, the cluster was not created and the Terraform run errored out.

Steps to Reproduce

  1. Ensure you're using Terraform 1.0.3 and provider 3.77.0 (a version pin sketch follows this list)
  2. terraform plan
  3. terraform apply
  4. See the error
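
A minimal sketch of the version pins used to reproduce, assuming the hashicorp/google provider from the public registry (per the Important Factoids below, rolling back to 3.76.0 avoids the error):

terraform {
  required_version = "1.0.3"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.77.0" # the build works as normal when pinned to 3.76.0
    }
  }
}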

Important Factoids

The build works as normal on provider 3.76.0.

References

See message(s) in slack: https://googlecloud-community.slack.com/archives/C1VNJ4EG7/p1627653699001500

@hawksight hawksight added the bug label Jul 31, 2021
@edwardmedia edwardmedia self-assigned this Aug 1, 2021
@edwardmedia
Contributor

@hawksight I don't see any related changes in the release of v3.77.0. Are you able to repro with the resource only (not the module)?

@rileykarson
Collaborator

We saw this failure in our CI as well; it seemed ephemeral. We got back 500 errors from GKE for a couple of hours and handled them less well than we could have, I think at the retry transport layer (but it's worth double-checking the logs). I saw no reason to suspect 3.77.0 specifically - no related changes I am aware of.

@frytg

frytg commented Aug 3, 2021

I've had a similar problem: last week our dev setup for the cluster worked perfectly, but this week, when replicating the exact same config to prod, it failed with a timeout error in TF after 25 seconds.

It was helpful to inspect the GCP logs (filter by "your-cluster-name" resource.type="gke_cluster"), which record both the full request and, a few seconds later, an error:

// ...
status: {
  code: 2
  message: "Internal error."
}
// ...

So I tried posting the JSON request (found in the log field protoPayload.request.cluster) against the API directly to debug what was causing the problem. For me, the two things that seemed to cause headaches were private_ipv6_google_access = "PRIVATE_IPV6_GOOGLE_ACCESS_BIDIRECTIONAL" and "dnsCacheConfig": { "enabled": true } inside addonsConfig.
For the TF config it worked to initially set the IPv6 access to PRIVATE_IPV6_GOOGLE_ACCESS_UNSPECIFIED and the DNS cache to enabled: false, and then activate both on the cluster in a second step (a sketch of that first step follows).
Not ideal, but it works for now.
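
A rough sketch of that first-step config, assuming the provider attribute names private_ipv6_google_access and addons_config.dns_cache_config (name and project are placeholders, and dns_cache_config may require the google-beta provider on some versions):

resource "google_container_cluster" "cluster" {
  name    = "example-cluster"
  project = "example-project"

  # step 1: keep the settings that triggered the GKE internal error disabled
  private_ipv6_google_access = "PRIVATE_IPV6_GOOGLE_ACCESS_UNSPECIFIED"

  addons_config {
    dns_cache_config {
      enabled = false
    }
  }

  # ... remaining cluster configuration ...
}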

It feels like this could be more of a bug on the GCP/GKE side than in this TF provider setup.

(we're using TF Cloud @ v1.0.3)

@edwardmedia
Contributor

@hawksight can you repro? Please post the debug log if you can.

@hawksight
Author

@edwardmedia I tried to recreate this yesterday on 3.77.0 and everything worked in the following:

  • Calling the resource individually
  • Calling my existing module with the same config as before

I'm happy to put this down to a temporary glitch or an issue on the Google API side; perhaps my timing was just unlucky.
It does seem to all work now, for me at least :)

Thank you for the really quick responses.

@edwardmedia
Contributor

@hawksight I am glad it works. Closing the issue then.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 12, 2021
@github-actions github-actions bot added service/container forward/review In review; remove label to forward labels Jan 14, 2025