
google_container_cluster.cluster: unexpected EOF #4305

Closed
rohitrd0987 opened this issue Aug 20, 2019 · 11 comments
@rohitrd0987
Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.13

  • provider.google v1.19.0
  • provider.google-beta v2.3.0

I noticed this stopped working on August 18, 2019.

Affected Resource(s)

Creating a GCP subnet in an existing VPC and a GKE private cluster with a private endpoint.

  • google_container_cluster

Terraform Configuration Files

Terraform code from the repo below (found in the Terraform Registry) was used, with appropriate changes, since my use case does not require NAT, VPC, or Cloud Router creation.
https://github.com/dansible/terraform-google-gke-infra


Debug Output

https://gist.github.com/rohitrd0987/85ebcaf016e128066b7b53c454a3871d

Panic Output

https://gist.github.com/rohitrd0987/2a044c693f5b9a9fdc9526890413c6e9

Expected Behavior

This should have created a subnet in an existing VPC and created a GKE private cluster with a private endpoint.

Actual Behavior

Errors out with a Terraform crash message.

Steps to Reproduce

Use the repo below, change the public endpoint to private, and run terraform apply with the appropriate configuration.
https://github.com/dansible/terraform-google-gke-infra
In my use case I don't need a VPC, SA, or Cloud NAT.

  1. terraform apply

Important Factoids

N/A

@ghost ghost added the bug label Aug 20, 2019
@Chrisjw42

Chrisjw42 commented Aug 21, 2019

Hi, I have also been experiencing this.

After running terraform, the resource in question is successfully created, but terraform errors out:

Error: Error applying plan:
1 error(s) occurred:

  • module.xxyy.google_container_cluster.yyxx: 1 error(s) occurred:

  • google_container_cluster.yyxx: unexpected EOF

Please help!

Please note: my google provider version is 2.7.0, and my Terraform version is 0.11.14.

More detail

This error is thrown upon completion of a created container_cluster; the same error is being thrown across multiple projects, and it happens right after the default node pool deletion finishes.

Actually, upon testing, the error is thrown upon completion whether or not the default node pool is removed.

This is reproducibly happening with newly named clusters, defined as below:

resource "google_container_node_pool" "non_preemptible_pool" {
    name               = "non-preemptible-pool"
    project            = "${module.gcp_project.project_id}"
    cluster            = "${google_container_cluster.compute_cluster.name}"
    location           = "australia-southeast1-a"
    initial_node_count = 1
    timeouts {
        create = "30m"
        update = "20m"
    }

    node_config {
        machine_type = "n1-highmem-4" # 4 vCPUs & 26 GB memory
        # Jupyter and dask-scheduler do not want preemptible nodes, but the dask-workers do
        preemptible  = false

        labels {
            type = "non-preemptible"
        }

        # FIXME: oauth_scopes are deprecated and should be replaced by a proper service_account
        oauth_scopes = [
            "compute-rw",     # required for persistent storage
            "storage-rw",     # required to access GCR & writing to bucket
            "logging-write",
            "monitoring",
            "service-control",
            "service-management"
        ]
    }
    management {
        auto_repair  = true
        auto_upgrade = true
    }
    autoscaling {
        min_node_count = 0
        max_node_count = "${var.max_node_count}"
    }

    # Interpret the boolean as an int, i.e. DO NOT create a cluster if the var is false
    count = "${var.create_cluster}"
    # depends_on = ["kubernetes_cluster_role_binding.tf_role_binding"]
    depends_on = ["google_container_cluster.compute_cluster"]
}

@robertb724

I have been seeing this since Monday as well. It looks like it creates the cluster fine but crashes when it moves on to the node pools:

google_container_cluster.cluster: Still creating... (2m30s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* google_container_cluster.cluster: 1 error(s) occurred:

* google_container_cluster.cluster: unexpected EOF

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.


panic: runtime error: invalid memory address or nil pointer dereference
2019-08-21T13:47:08.845Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1584819]
2019-08-21T13:47:08.845Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: 
2019-08-21T13:47:08.845Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: goroutine 25 [running]:
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: github.com/terraform-providers/terraform-provider-google-beta/google-beta.flattenMaintenancePolicy(...)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-google-beta/google-beta/resource_container_cluster.go:2047
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: github.com/terraform-providers/terraform-provider-google-beta/google-beta.resourceContainerClusterRead(0xc0003ab110, 0x1976800, 0xc0003ba820, 0x17, 0xc000186a10)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-google-beta/google-beta/resource_container_cluster.go:909 +0x22d9
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: github.com/terraform-providers/terraform-provider-google-beta/google-beta.resourceContainerClusterCreate(0xc0003ab110, 0x1976800, 0xc0003ba820, 0x0, 0x0)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-google-beta/google-beta/resource_container_cluster.go:859 +0x18f5
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: github.com/terraform-providers/terraform-provider-google-beta/vendor/github.com/hashicorp/terraform/helper/schema.(*Resource).Apply(0xc00035d1f0, 0xc0000ae910, 0xc000168580, 0x1976800, 0xc0003ba820, 0x40ba01, 0xc000371b80, 0x4c1cfc)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-google-beta/vendor/github.com/hashicorp/terraform/helper/schema/resource.go:225 +0x351
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: github.com/terraform-providers/terraform-provider-google-beta/vendor/github.com/hashicorp/terraform/helper/schema.(*Provider).Apply(0xc00039eee0, 0xc0000ae8c0, 0xc0000ae910, 0xc000168580, 0xc00007aa80, 0x18, 0x7f4458319d80)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-google-beta/vendor/github.com/hashicorp/terraform/helper/schema/provider.go:283 +0x9c
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: github.com/terraform-providers/terraform-provider-google-beta/vendor/github.com/hashicorp/terraform/plugin.(*ResourceProviderServer).Apply(0xc0004156e0, 0xc000168160, 0xc0002cb050, 0x0, 0x0)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-google-beta/vendor/github.com/hashicorp/terraform/plugin/resource_provider.go:527 +0x57
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: reflect.Value.call(0xc0000a2c00, 0xc0000a00d8, 0x13, 0x1cbbf00, 0x4, 0xc000371f18, 0x3, 0x3, 0xc00014e0c0, 0x4131d7, ...)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/goenv/versions/1.11.5/src/reflect/value.go:447 +0x454
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: reflect.Value.Call(0xc0000a2c00, 0xc0000a00d8, 0x13, 0xc000337718, 0x3, 0x3, 0x12a05f200, 0xc000337710, 0xc0003377b8)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/goenv/versions/1.11.5/src/reflect/value.go:308 +0xa4
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: net/rpc.(*service).call(0xc00017e1c0, 0xc0000ae690, 0xc000186188, 0xc0001861b0, 0xc000128000, 0xc0001896a0, 0x176aca0, 0xc000168160, 0x16, 0x176ace0, ...)
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/goenv/versions/1.11.5/src/net/rpc/server.go:384 +0x14e
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4: created by net/rpc.(*Server).ServeCodec
2019-08-21T13:47:08.846Z [DEBUG] plugin.terraform-provider-google-beta_v2.3.0_x4:       /opt/goenv/versions/1.11.5/src/net/rpc/server.go:481 +0x47e
2019/08/21 13:47:08 [TRACE] root: eval: *terraform.EvalWriteState
2019/08/21 13:47:08 [TRACE] root: eval: *terraform.EvalApplyProvisioners
2019/08/21 13:47:08 [TRACE] root: eval: *terraform.EvalIf
2019/08/21 13:47:08 [TRACE] root: eval: *terraform.EvalWriteState
2019/08/21 13:47:08 [TRACE] root: eval: *terraform.EvalWriteDiff
2019/08/21 13:47:08 [TRACE] root: eval: *terraform.EvalApplyPost
2019-08-21T13:47:08.848Z [DEBUG] plugin: plugin process exited: path=/terraform/modules/kubernetes/.terraform/plugins/linux_amd64/terraform-provider-google-beta_v2.3.0_x4
2019/08/21 13:47:08 [ERROR] root: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* google_container_cluster.cluster: unexpected EOF
2019/08/21 13:47:08 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* google_container_cluster.cluster: unexpected EOF
2019/08/21 13:47:08 [TRACE] [walkApply] Exiting eval tree: google_container_cluster.cluster
2019/08/21 13:47:08 [TRACE] dag/walk: upstream errored, not walking "google_container_node_pool.preemptible_pool"
2019/08/21 13:47:08 [TRACE] dag/walk: upstream errored, not walking "provider.google-beta (close)"
2019/08/21 13:47:08 [TRACE] dag/walk: upstream errored, not walking "google_container_node_pool.primary_pool"
2019/08/21 13:47:08 [TRACE] dag/walk: upstream errored, not walking "provider.google (close)"
2019/08/21 13:47:08 [TRACE] dag/walk: upstream errored, not walking "meta.count-boundary (count boundary fixup)"
2019/08/21 13:47:08 [TRACE] dag/walk: upstream errored, not walking "root"
2019/08/21 13:47:08 [TRACE] Preserving existing state lineage "b091a3d7-ed02-76c0-655b-e2110026d930"
2019/08/21 13:47:08 [TRACE] Preserving existing state lineage "b091a3d7-ed02-76c0-655b-e2110026d930"
2019/08/21 13:47:08 [TRACE] Preserving existing state lineage "b091a3d7-ed02-76c0-655b-e2110026d930"
2019/08/21 13:47:09 [DEBUG] plugin: waiting for all plugin processes to complete...
2019-08-21T13:47:09.124Z [WARN ] plugin: error closing client during Kill: err="connection is shut down"
2019-08-21T13:47:09.128Z [DEBUG] plugin.terraform-provider-google_v2.13.0_x4: 2019/08/21 13:47:09 [ERR] plugin: plugin server: accept unix /tmp/plugin035859651: use of closed network connection
2019-08-21T13:47:09.131Z [DEBUG] plugin: plugin process exited: path=/terraform/modules/kubernetes/.terraform/plugins/linux_amd64/terraform-provider-google_v2.13.0_x4



!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

Terraform crashed! This is always indicative of a bug within Terraform.
A crash log has been placed at "crash.log" relative to your current
working directory. It would be immensely helpful if you could please
report the crash with Terraform[1] so that we can fix this.

When reporting bugs, please include your terraform version. That
information is available on the first line of crash.log. You can also
get it by running 'terraform --version' on the command line.

[1]: https://github.com/hashicorp/terraform/issues

!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

@viniciusmucugeubeeqo

viniciusmucugeubeeqo commented Aug 21, 2019

Just had the same error after last changes and trying to apply for a new cluster.

Terraform Version
Terraform v0.11.14

provider.google v1.20.0
provider.google-beta v1.20.0
provider.kubernetes v1.8.1

The code had been running correctly for some time before it started complaining about the creation and deletion of the default node pool.

I needed to remove the following statements from the cluster resource:

lifecycle {
    ignore_changes = [ "node_pool" ]
}
node_pool {
    name = "default-pool"
}

I then changed the following parameters:

remove_default_node_pool = true
initial_node_count = 1

But the same issue keeps occurring.
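For readers following along, the change described above can be sketched as a minimal cluster resource (names and locations here are placeholders, not from the original config; 0.11-style syntax):

```hcl
resource "google_container_cluster" "compute_cluster" {
    name     = "example-cluster"            # placeholder name
    location = "australia-southeast1-a"     # placeholder location

    # Instead of declaring a node_pool block and ignoring its changes,
    # delete the default node pool right after cluster creation and
    # manage pools via separate google_container_node_pool resources.
    remove_default_node_pool = true
    initial_node_count       = 1
}
```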

@robertb724

robertb724 commented Aug 21, 2019

It looks like in 2.11.0 of the google-beta-provider they improved the error-handling for these types of things.

https://github.com/terraform-providers/terraform-provider-google/blob/master/CHANGELOG.md#2110-july-16-2019

@rohitrd0987
Author

Thanks @robertb724, I tried with google-beta provider version 2.11.0 and it worked without any error.

module.k8s.google_container_cluster.cluster: Still creating... (4m31s elapsed)
module.k8s.google_container_cluster.cluster: Still creating... (4m41s elapsed)
module.k8s.google_container_cluster.cluster: Still creating... (4m51s elapsed)
module.k8s.google_container_cluster.cluster: Still creating... (5m1s elapsed)
module.k8s.google_container_cluster.cluster: Still creating... (5m11s elapsed)
module.k8s.google_container_cluster.cluster: Creation complete after 5m15s (ID: test-rohit)

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

@robertb724

@rohitrd0987 yes, I'm still curious about what happened over the weekend to cause these errors to increase. Our setup had been working for months on 2.3. Probably a change on the Google side.

@thiagofernandocosta

It looks like in 2.11.0 of the google-beta-provider they improved the error-handling for these types of things.

https://github.com/terraform-providers/terraform-provider-google/blob/master/CHANGELOG.md#2110-july-16-2019

@robertb724
Saved my life

Many thanks!!!

@rohitrd0987
Author

@robertb724 I had also been working with 2.3.0 for more than 3 months, until I started using the Istio plugin as an add-on config in my tf scripts for GKE.
From the bug fixes in 2.11.0:
"container: google_container_cluster will now wait to act until the cluster can be operated on, respecting timeouts."
My understanding was that this happened by default with 2.3.0 as well, but anyway, thanks for your help.
I'm trying to find the root cause with Google on this; if I find anything I'll post it here.
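Since the changelog entry above mentions respecting timeouts, it may be worth noting that google_container_cluster accepts its own timeouts block, the same way the node pool config earlier in this thread does. A sketch (names and values are illustrative, not from anyone's actual config):

```hcl
resource "google_container_cluster" "cluster" {
    name               = "example-cluster"   # placeholder name
    location           = "us-central1-a"     # placeholder location
    initial_node_count = 1

    # Generous timeouts give the provider room to wait for the cluster
    # to become operable before acting on it.
    timeouts {
        create = "45m"
        update = "45m"
        delete = "45m"
    }
}
```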

@slevenick
Collaborator

Hey there, this appears to be a duplicate of several other issues, and it is fixed in v2.11.0 of the google provider. The preferred solution is to upgrade your version of the google provider to 2.11.0+.

This is due to a nil reference to the maintenance window. A workaround is listed in the following issue if upgrading the provider is not an option: #4010
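For anyone on Terraform 0.11 following the upgrade advice, the minimum version can be pinned directly in the provider blocks (a sketch; adjust the constraints to your own setup):

```hcl
provider "google" {
    version = "~> 2.11"
}

provider "google-beta" {
    version = "~> 2.11"
}
```

After updating the constraint, run terraform init to download the newer provider plugins.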

@ghost

ghost commented Sep 21, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Sep 21, 2019