google_dataflow_flex_template_job - Error: The terraform-provider-google-beta_v5.12.0_x5 plugin crashed! #17046

Closed

Comments


rishitandon1 commented Jan 19, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.5.7
hashicorp/google-beta v5.12.0

Affected Resource(s)

google_dataflow_flex_template_job

Terraform Configuration Files

Below is a sample Terraform configuration:

resource "google_dataflow_flex_template_job" "dataflow_jobs_static" {
  provider                = google-beta
  for_each                = var.flex_templates
  name                    = "test"
  container_spec_gcs_path = "${google_storage_bucket.artifact["dataflow"].url}/${each.value.gcs_object_name}"
  on_delete               = "drain"
  project                 = var.dataeng_project
  region                  = lookup(local.regions, each.value.region, "northamerica-northeast1")
  parameters = each.value.sdk_language == "JAVA" ? merge(
    {
      stagingLocation       = "${google_storage_bucket.tempbucket["temp"].url}/staging"
      tempLocation          = "${google_storage_bucket.tempbucket["temp"].url}/temp"
      serviceAccount        = module.usecase_service_account.email
      network               = var.network
      subnetwork            = var.subnetwork
      usePublicIps          = "false"
      enableStreamingEngine = each.value.enable_streaming_engine
      workerMachineType     = each.value.worker_machine_type
      numWorkers            = each.value.num_workers
      maxNumWorkers         = each.value.max_num_workers
      experiments           = join(",", each.value.additional_experiments)
    }, each.value.custom_parameters) : merge(
    {
      staging_location        = "${google_storage_bucket.tempbucket["temp"].url}/staging"
      temp_location           = "${google_storage_bucket.tempbucket["temp"].url}/temp"
      service_account_email   = module.usecase_service_account.email
      network                 = var.network
      subnetwork              = var.subnetwork
      no_use_public_ips       = "true"
      enable_streaming_engine = each.value.enable_streaming_engine
      machine_type            = each.value.worker_machine_type
      num_workers             = each.value.num_workers
      max_num_workers         = each.value.max_num_workers
      experiments             = join(",", each.value.additional_experiments)
    }, each.value.custom_parameters)

  labels = {
    owner-primary-pein   = var.owner_primary_pein
    owner-secondary-pein = var.owner_secondary_pein
    env                  = var.env
  }
}

Panic Output

Error: Plugin did not respond

│ The plugin encountered an error, and failed to respond to the
│ plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may
│ contain more details.

Stack trace from the terraform-provider-google-beta_v5.12.0_x5 plugin:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x90 pc=0x34747c6]
goroutine 263 [running]:
github.com/hashicorp/terraform-provider-google-beta/google-beta/services/dataflow.resourceDataflowFlexTemplateJobRead(0xc001f46480, {0x3ed4aa0?, 0xc00231e000})
github.com/hashicorp/terraform-provider-google-beta/google-beta/services/dataflow/resource_dataflow_flex_template_job.go:448 +0x686
github.com/hashicorp/terraform-provider-google-beta/google-beta/services/dataflow.resourceDataflowFlexTemplateJobDelete(0xc001f46480, {0x3ed4aa0?, 0xc00231e000})
github.com/hashicorp/terraform-provider-google-beta/google-beta/services/dataflow/resource_dataflow_flex_template_job.go:674 +0x5f1
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).delete(0x466cfd8?, {0x466cfd8?, 0xc001c26ff0?}, 0xd?, {0x3ed4aa0?, 0xc00231e000?})
github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/schema/resource.go:746 +0x178
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).Apply(0xc000aac700, {0x466cfd8, 0xc001c26ff0}, 0xc0008485b0, 0xc001f46400, {0x3ed4aa0, 0xc00231e000})
github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/schema/resource.go:806 +0x605
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*GRPCProviderServer).ApplyResourceChange(0xc00147f3c8, {0x466cfd8?, 0xc001c26ed0?}, 0xc0018be230)
github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/schema/grpc_provider.go:1021 +0xe8d
github.com/hashicorp/terraform-plugin-mux/tf5muxserver.muxServer.ApplyResourceChange({0xc000abc6f0, 0xc000abc750, {0xc001c2f840, 0x2, 0x2}, {0x0, 0x0, 0x0}, {0x0, 0x0, ...}, ...}, ...)
github.com/hashicorp/[email protected]/tf5muxserver/mux_server_ApplyResourceChange.go:27 +0x102
github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).ApplyResourceChange(0xc000725400, {0x466cfd8?, 0xc001c26240?}, 0xc000662000)
github.com/hashicorp/[email protected]/tfprotov5/tf5server/server.go:818 +0x574
github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_ApplyResourceChange_Handler({0x3e3fa60?, 0xc000725400}, {0x466cfd8, 0xc001c26240}, 0xc001f46000, 0x0)
github.com/hashicorp/[email protected]/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:385 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0013425a0, {0x466cfd8, 0xc001c260f0}, {0x4676e78, 0xc000f944e0}, 0xc002510480, 0xc0016d3aa0, 0x60ed500, 0x0)
google.golang.org/[email protected]/server.go:1372 +0xe49
google.golang.org/grpc.(*Server).handleStream(0xc0013425a0, {0x4676e78, 0xc000f944e0}, 0xc002510480)
google.golang.org/[email protected]/server.go:1783 +0x1031
google.golang.org/grpc.(*Server).serveStreams.func2.1()
google.golang.org/[email protected]/server.go:1016 +0x68
created by google.golang.org/grpc.(*Server).serveStreams.func2
google.golang.org/[email protected]/server.go:1027 +0x12e
Error: The terraform-provider-google-beta_v5.12.0_x5 plugin crashed!
This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.

Expected Behavior

Create a dataflow flex template job.

Actual Behavior

The google-beta provider intermittently panics and crashes while creating the flex template job.

Steps to Reproduce

The error occurs during either terraform apply or terraform destroy.

References

Additional information

I tested moving the "enableStreamingEngine" attribute out of the parameters block and onto the main resource, as suggested in the referenced issues, but hit the same crash. I also tested removing the attribute completely, with the same result. The crash happens about 90% of the time, and re-running the job sometimes succeeds.
I also tested multiple previous versions of the google-beta provider and saw the same issue.

This behavior is breaking our CI/CD pipeline and any help or suggestions would be greatly appreciated.

b/321385982

@github-actions github-actions bot added the crash, forward/review, and service/dataflow labels Jan 19, 2024
@edwardmedia edwardmedia self-assigned this Jan 19, 2024
@edwardmedia
Contributor

@rishitandon1 can you repro the issue with a simple config (without dynamic code), and also share the steps to reach this point?

@rishitandon1
Author

> @rishitandon1 can you repro the issue with a simple config (without dynamic code), and also share the steps to reach this point?

Hi Edward, thank you for your response. For a long time I did not observe this error with either static or dynamic code.
We deploy flex template jobs based on the flex templates we have released to Artifactory, and we use the ‘http’ data source to access the released template in the same Terraform config file where we deploy the flex template job.

Sometimes re-running the job fixes the issue and there are times where the pipeline runs seamlessly without any plugin crashes.

The errors are sometimes similar to the issues mentioned in #16713 and #16799, and other times the plugin reports panic: runtime error: invalid memory address or nil pointer dereference.
Please let me know if any further information is required.

Thanks in advance.

@melinath
Collaborator

The line referenced in the panic is https://github.com/hashicorp/terraform-provider-google-beta/blob/v5.12.0/google-beta/services/dataflow/resource_dataflow_flex_template_job.go#L448. It seems like the only way that could happen is if job.Environment is ever nil, but I don't know how that could happen.
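To illustrate the failure mode being discussed, here is a minimal Go sketch of how reading a field through a nil Environment pointer would panic, and how a nil check turns that into an ordinary error. The Job and Environment types, the ServiceAccountEmail field, and the serviceAccountOf helper are simplified, hypothetical stand-ins; this is not the provider's actual code at the referenced line.

```go
package main

import (
	"errors"
	"fmt"
)

// Environment and Job are hypothetical, simplified stand-ins for the
// API response types; they are not the provider's real definitions.
type Environment struct {
	ServiceAccountEmail string
}

type Job struct {
	Name        string
	Environment *Environment // may be nil if the response omits it
}

var errMissingEnvironment = errors.New("job response has no environment")

// serviceAccountOf returns the service account from the job's environment.
// Without the nil check, job.Environment.ServiceAccountEmail would panic
// with "invalid memory address or nil pointer dereference".
func serviceAccountOf(job *Job) (string, error) {
	if job == nil || job.Environment == nil {
		return "", errMissingEnvironment
	}
	return job.Environment.ServiceAccountEmail, nil
}

func main() {
	// Assumption: a response that lacks the environment, as suspected above.
	partial := &Job{Name: "test"}
	if _, err := serviceAccountOf(partial); err != nil {
		fmt.Println("read skipped:", err)
	}
}
```

Returning an error rather than dereferencing lets the caller report a diagnosable failure instead of crashing the plugin process.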

@edwardmedia edwardmedia removed the forward/review label Jan 19, 2024
@edwardmedia edwardmedia removed their assignment Jan 19, 2024
@rishitandon1
Author

> The line referenced in the panic is https://github.com/hashicorp/terraform-provider-google-beta/blob/v5.12.0/google-beta/services/dataflow/resource_dataflow_flex_template_job.go#L448. It seems like the only way that could happen is if job.Environment is ever nil, but I don't know how that could happen.

@melinath @edwardmedia, I looked at the code linked above to see how job.Environment could be nil but couldn't find anything further.
I also ran several iterations with some of the configured parameters removed and saw similar results.
I have tested with both a Java-based and a Python-based Dataflow job, and the plugin can crash on either one.

@rishitandon1
Author

@melinath @edwardmedia, is there anything else I could try or look at to find a workaround or get this resolved?

@melinath
Collaborator

@rishitandon1 if you're able to get TF_LOG=DEBUG level logs, that would include the API requests/responses. My best guess is that when this error happens, the API is responding with something that doesn't include the job.Environment, but it would be good to confirm (and there may be other clues in the API response as to what's happening.) If you can get those logs, could you share (a cleaned version of) the API response that causes this panic?


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 24, 2024