
google_compute_backend_service failing to apply multiple backends #3937

Open
hawksight opened this issue Jun 27, 2019 · 18 comments
Labels
forward/linked · persistent-bug (hard-to-diagnose or long-lived bugs whose resolutions are more like feature work than bug work) · service/compute-l7-load-balancer · size/l

Comments

@hawksight

hawksight commented Jun 27, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Description of Problem

I'm experiencing issues when trying to build a google_compute_backend_service with multiple backends (instance groups) in order to target all the nodes of my GKE cluster.

I have a cluster module & a cluster-lb module, which I execute from an environment Terraform configuration. I output the instance groups at the end of the cluster module, based on a data resource, to ensure I get the URLs to all cluster nodes, e.g.:

output "K8S_INSTANCE_GROUP_URLS" {
  value       = data.google_container_cluster.information.instance_group_urls
  description = "URLs to the instance groups for all nodes"
}

For simplicity's sake, the cluster-lb module takes that list as a variable.

variable "backend_group_list" {
  description = "Map backend indices to list of backend maps."
  type        = list
  default     = []
}

In my module code I am trying to configure the backend sub-block as described here, which (I think) has a specific format of:

backend = [
    { group = <url> },
    { group = <url> }
]

(seems to be what this implies)

or is the backend block specified twice?

backend { group = <url> }
backend { group = <url> }

The topic of documentation is covered in #3498, and I initially added some error logs in this comment.

Terraform Version

Terraform v0.12.2
+ provider.google v2.9.1
+ provider.null v2.1.2
+ provider.random v2.1.2
+ provider.template v2.1.2

Affected Resource(s)

  • google_compute_backend_service

Terraform Configuration Files

cluster-lb module backends

variable "backend_group_list" {
  description = "Map backend indices to list of backend maps."
  type        = list
  default     = []
}

variable "backend_public" {
  description = "Parameters to the public backend"
  type = object({
    enabled         = bool
    health_path     = string
    port_name       = string
    port_number     = number
    timeout_seconds = number
    iap_enabled     = bool
  })

  default = {
    enabled         = true
    health_path     = "/"
    port_name       = "http"
    port_number     = 30100
    timeout_seconds = 30
    iap_enabled     = false
  }
}

variable "backend_private" {
  description = "Parameters to the private backend"
  type = object({
    enabled         = bool
    health_path     = string
    port_name       = string
    port_number     = number
    timeout_seconds = number
    iap_enabled     = bool
  })

  default = {
    enabled         = true
    health_path     = "/"
    port_name       = "http"
    port_number     = 30100
    timeout_seconds = 30
    iap_enabled     = true
  }
}

variable "backend_monitor" {
  description = "Parameters to the monitoring backend"
  type = object({
    enabled         = bool
    health_path     = string
    port_name       = string
    port_number     = number
    timeout_seconds = number
    iap_enabled     = bool
  })

  default = {
    enabled         = true
    health_path     = "/"
    port_name       = "monitor"
    port_number     = 30101
    timeout_seconds = 30
    iap_enabled     = true
  }
}

resource "google_compute_backend_service" "public" {
  project     = var.project
  name        = "${var.name}-backend-public"
  port_name   = var.backend_public["port_name"]
  protocol    = "HTTP"
  timeout_sec = var.backend_public["timeout_seconds"]
  dynamic "backend" {
    for_each = [ for b in var.backend_group_list : b ]
    content {
      group = backend.value
    }
  }

  health_checks = list(google_compute_health_check.public.self_link)
}

resource "google_compute_backend_service" "private" {
  project     = var.project
  name        = "${var.name}-backend-private"
  port_name   = var.backend_private["port_name"]
  protocol    = "HTTP"
  timeout_sec = var.backend_private["timeout_seconds"]
  dynamic "backend" {
    for_each = var.backend_group_list
    content {
      group                        = backend.value
      // adding null values otherwise reapplication fails
      balancing_mode               = null
      capacity_scaler              = null
      description                  = null
      max_connections              = null
      max_connections_per_instance = null
      max_rate                     = null
      max_rate_per_instance        = null
      max_utilization              = null
    }
  }
  health_checks = list(google_compute_health_check.private.self_link)
  
  iap {
    oauth2_client_id     = var.iap_oauth_id
    oauth2_client_secret = var.iap_oauth_secret
  }
}

resource "google_compute_backend_service" "monitor" {
  project     = var.project
  name        = "${var.name}-backend-monitor"
  port_name   = var.backend_monitor["port_name"]
  protocol    = "HTTP"
  timeout_sec = var.backend_monitor["timeout_seconds"]
  dynamic "backend" {
    for_each = var.backend_group_list
    content {
      group = backend.value
    }
  }
  health_checks = list(google_compute_health_check.monitor.self_link)

  iap {
    oauth2_client_id     = var.iap_oauth_id
    oauth2_client_secret = var.iap_oauth_secret
  }
}

Debug Output

I've posted the encrypted version (using the hashicorp key from Keybase) in this gist:
https://gist.github.com/hawksight/bde83268020c8701fc9ac35c1b6d3fb8

Used the following to encrypt:

keybase pgp encrypt -i ~/Logs/1561630982-terraform.log -o ~/Logs/1561630982-terraform.log.crypt hashicorp

I wasn't confident there wouldn't be any sensitive details in the debug log, hence the encryption.
Let me know if I need to share it another way.

Panic Output

None

Expected Behavior

I have three backends which I am manually specifying with different names. They are all backends to the same set of GKE nodes. Our clusters use multi-zone node pools and usually have two node pools. In GKE, this means you have an instance group for each zone, for each node pool. In the example I am showing here, I have a setup with two node pools in a single zone, so two instance groups, equating to two backends to specify.

In the plan I expect to see two backend blocks, as I am using the `dynamic` block feature from 0.12 to generate a block for each group URL / self-link passed in.

On apply, I expect the backend service to be created with both instance groups as its targets, not to fail with the error provided.

Actual Behavior

The plan worked, although it only specifies one backend in the output.
It only knows the groups after apply, which I find unhelpful.
Even when the cluster is prebuilt, the plan still doesn't see that I have more than one instance group to add. This is probably something to do with the way Terraform plans things, but I'm unsure of the specifics.

Here's an example plan output:

  # module.cluster-lb.google_compute_backend_service.monitor will be created
  + resource "google_compute_backend_service" "monitor" {
      + connection_draining_timeout_sec = 300
      + creation_timestamp              = (known after apply)
      + fingerprint                     = (known after apply)
      + health_checks                   = (known after apply)
      + id                              = (known after apply)
      + load_balancing_scheme           = "EXTERNAL"
      + name                            = "vpc-du-lb-backend-monitor"
      + port_name                       = "http"
      + project                         = "MASKED"
      + protocol                        = "HTTP"
      + self_link                       = (known after apply)
      + session_affinity                = (known after apply)
      + timeout_sec                     = 30

      + backend {
          + balancing_mode  = "UTILIZATION"
          + capacity_scaler = 1
          + group           = (known after apply)
          + max_utilization = 0.8
        }

      + cdn_policy {
          + signed_url_cache_max_age_sec = (known after apply)

          + cache_key_policy {
              + include_host           = (known after apply)
              + include_protocol       = (known after apply)
              + include_query_string   = (known after apply)
              + query_string_blacklist = (known after apply)
              + query_string_whitelist = (known after apply)
            }
        }

      + iap {
          + oauth2_client_id            = "MASKED"
          + oauth2_client_secret        = (sensitive value)
          + oauth2_client_secret_sha256 = (sensitive value)
        }
    }

  # module.cluster-lb.google_compute_backend_service.private will be created
  + resource "google_compute_backend_service" "private" {
      + connection_draining_timeout_sec = 300
      + creation_timestamp              = (known after apply)
      + fingerprint                     = (known after apply)
      + health_checks                   = (known after apply)
      + id                              = (known after apply)
      + load_balancing_scheme           = "EXTERNAL"
      + name                            = "vpc-du-lb-backend-private"
      + port_name                       = "http"
      + project                         = "MASKED"
      + protocol                        = "HTTP"
      + self_link                       = (known after apply)
      + session_affinity                = (known after apply)
      + timeout_sec                     = 30

      + backend {
          + balancing_mode               = (known after apply)
          + capacity_scaler              = (known after apply)
          + description                  = (known after apply)
          + group                        = (known after apply)
          + max_connections              = (known after apply)
          + max_connections_per_instance = (known after apply)
          + max_rate                     = (known after apply)
          + max_rate_per_instance        = (known after apply)
          + max_utilization              = (known after apply)
        }

      + cdn_policy {
          + signed_url_cache_max_age_sec = (known after apply)

          + cache_key_policy {
              + include_host           = (known after apply)
              + include_protocol       = (known after apply)
              + include_query_string   = (known after apply)
              + query_string_blacklist = (known after apply)
              + query_string_whitelist = (known after apply)
            }
        }

      + iap {
          + oauth2_client_id            = "MASKED"
          + oauth2_client_secret        = (sensitive value)
          + oauth2_client_secret_sha256 = (sensitive value)
        }
    }

  # module.cluster-lb.google_compute_backend_service.public will be created
  + resource "google_compute_backend_service" "public" {
      + connection_draining_timeout_sec = 300
      + creation_timestamp              = (known after apply)
      + fingerprint                     = (known after apply)
      + health_checks                   = (known after apply)
      + id                              = (known after apply)
      + load_balancing_scheme           = "EXTERNAL"
      + name                            = "vpc-du-lb-backend-public"
      + port_name                       = "http"
      + project                         = "MASKED"
      + protocol                        = "HTTP"
      + self_link                       = (known after apply)
      + session_affinity                = (known after apply)
      + timeout_sec                     = 30

      + backend {
          + balancing_mode  = "UTILIZATION"
          + capacity_scaler = 1
          + group           = (known after apply)
          + max_utilization = 0.8
        }

      + cdn_policy {
          + signed_url_cache_max_age_sec = (known after apply)

          + cache_key_policy {
              + include_host           = (known after apply)
              + include_protocol       = (known after apply)
              + include_query_string   = (known after apply)
              + query_string_blacklist = (known after apply)
              + query_string_whitelist = (known after apply)
            }
        }
    }

I get the following errors when trying to apply:

Error: Provider produced inconsistent final plan

When expanding the plan for
module.cluster-lb.google_compute_backend_service.public to include new values
learned so far during apply, provider "google" produced an invalid new value
for .backend: block set length changed from 1 to 2.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent final plan

When expanding the plan for
module.cluster-lb.google_compute_backend_service.monitor to include new values
learned so far during apply, provider "google" produced an invalid new value
for .backend: block set length changed from 1 to 2.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent final plan

When expanding the plan for
module.cluster-lb.google_compute_backend_service.private to include new values
learned so far during apply, provider "google" produced an invalid new value
for .backend: block set length changed from 1 to 2.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

Steps to Reproduce

  1. Create a backend_service and try to pass multiple groups to it, generating them dynamically with a dynamic block or another loop method. Use my code above as an example, or see the sketch after this list.

  2. Try to plan and see if multiple backends are specified.

  3. Apply and see if you get errors.
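
For a concrete starting point, here is a minimal repro sketch (the cluster name, zone, project, and health check are hypothetical; the group URLs come from a data source, so they are unknown at plan time, which appears to be what triggers the error):

# Hypothetical minimal repro
data "google_container_cluster" "repro" {
  name    = "my-cluster"     # assumed name
  zone    = "europe-west2-a" # assumed zone
  project = "my-project"     # assumed project
}

resource "google_compute_health_check" "repro" {
  name = "repro-hc"
  http_health_check {
    port = 30100
  }
}

resource "google_compute_backend_service" "repro" {
  name        = "repro-backend"
  protocol    = "HTTP"
  timeout_sec = 30

  # One backend block is generated per instance group URL.
  dynamic "backend" {
    for_each = data.google_container_cluster.repro.instance_group_urls
    content {
      group = backend.value
    }
  }

  health_checks = [google_compute_health_check.repro.self_link]
}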

Important Factoids

I've recently been upgrading to 0.12, so I really don't know if my dynamic block is the right solution, or if I can use a for_each instead, or some combination. I've found it quite hard to distinguish from the limited examples when each variation / combination of for, for_each, and dynamic should be used.
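
For anyone similarly unsure, a rough disambiguation (illustrative only, not from the original report): a for expression transforms a collection into a new value, a dynamic block repeats a nested block inside a single resource, and resource-level for_each (available from Terraform 0.12.6) repeats a whole resource. For example:

# for expression: produces a value (here, a transformed list).
locals {
  group_names = [for url in var.backend_group_list : basename(url)]
}

# dynamic block: repeats a nested block inside one resource
# (this is the construct used throughout this issue):
#   dynamic "backend" { for_each = ... content { ... } }

# resource-level for_each (0.12.6+): repeats a whole resource.
resource "google_compute_health_check" "per_group" {
  for_each = toset(var.backend_group_list) # one health check per URL (hypothetical)
  name     = "hc-${basename(each.value)}"
  http_health_check {
    port = 30100
  }
}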

My code works perfectly when there is only one instance group in the list, but I only tried that to prove the code was valid Terraform. My real-world use case always has many instance groups to add.

Notice that on my private backend service, I have explicitly set all the other block options to null. This is because when I did successfully build with one instance group, the subsequent apply failed because the attributes were not set. So on re-apply those parameters no longer seem to be optional, hence the null values. Thanks to the author of this comment for the example.

I also tried turning my input list into the format:

[ { group = URL}, {group = URL } ...]
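
In 0.12 syntax, that conversion would be a for expression, e.g. (a sketch):

locals {
  # Wrap each instance group URL in an object matching the backend block shape.
  backend_list = [for url in var.backend_group_list : { group = url }]
}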

References

b/374162106

@ghost ghost added the bug label Jun 27, 2019
@rileykarson rileykarson self-assigned this Jun 27, 2019
@jaceq
Contributor

jaceq commented Sep 20, 2019

I have the same issue:

Error: Provider produced inconsistent final plan

When expanding the plan for
module.https.google_compute_backend_service.https to include new
values learned so far during apply, provider "google" produced an invalid new
value for .backend: block set length changed from 1 to 3.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

I also use a dynamic block; the funny thing is that if I apply again, it works... nevertheless, it's quite annoying.

@rileykarson
Collaborator

rileykarson commented Oct 14, 2019

I believe this is a variant of #4328, although I'm not sure.

I haven't gotten the chance to dig in more, and likely won't for a while, so I'm unassigning myself so that someone else can pick it up if they're available.

@rileykarson rileykarson removed their assignment Oct 14, 2019
@vigohe

vigohe commented Oct 18, 2019

Similar issue using a dynamic block over the backend block:

resource "google_compute_backend_service" "default" {
  project     = google_project.this.project_id
  name        = "${var.slug}-backend"
  port_name   = "istio-http"
  protocol    = "HTTP"
  timeout_sec = 30
  dynamic "backend" {
    for_each = module.gke-cluster.cluster.instance_group_urls
    content {
      group = backend.value
      balancing_mode               = null
      capacity_scaler              = null
      description                  = null
      max_connections              = null
      max_connections_per_instance = null
      max_rate                     = null
      max_rate_per_instance        = null
      max_utilization              = null
    }
  }
  health_checks = [google_compute_health_check.this.self_link]
  enable_cdn = false
  depends_on = [module.gke-cluster.node_pools]
}

error after apply:

Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_backend_service.default to include
new values learned so far during apply, provider "google" produced an invalid
new value for .backend: planned set element
cty.ObjectVal(map[string]cty.Value{"balancing_mode":cty.UnknownVal(cty.String),
"capacity_scaler":cty.UnknownVal(cty.Number),
"description":cty.UnknownVal(cty.String), "group":cty.UnknownVal(cty.String),
"max_connections":cty.UnknownVal(cty.Number),
"max_connections_per_endpoint":cty.NullVal(cty.Number),
"max_connections_per_instance":cty.UnknownVal(cty.Number),
"max_rate":cty.UnknownVal(cty.Number),
"max_rate_per_endpoint":cty.NullVal(cty.Number),
"max_rate_per_instance":cty.UnknownVal(cty.Number),
"max_utilization":cty.UnknownVal(cty.Number)}) does not correlate with any
element in actual.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

If I apply just one more time, it does work.

@hawksight
Author

hawksight commented Oct 24, 2019

@rileykarson - I had a read through of that issue, but perhaps someone could explain it in layman's terms?

My rough interpretation is that, going forward in the provider, you can't use self_link and name interchangeably because one's a URL and the other a string?
I'm also unsure if that has any bearing here.

I've revisited this and I am in a similar place to @vigohe, except I cannot apply my plan more than once. When I re-plan with TF I get the following output:

      - backend {
          - balancing_mode               = "UTILIZATION" -> null
          - capacity_scaler              = 1 -> null
          - group                        = "REDACT_URL" -> null
          - max_connections              = 0 -> null
          - max_connections_per_endpoint = 0 -> null
          - max_connections_per_instance = 0 -> null
          - max_rate                     = 0 -> null
          - max_rate_per_endpoint        = 0 -> null
          - max_rate_per_instance        = 0 -> null
          - max_utilization              = 0.8 -> null
        }
      - backend {
          - balancing_mode               = "UTILIZATION" -> null
          - capacity_scaler              = 1 -> null
          - group                        = "REDACT_URL" -> null
          - max_connections              = 0 -> null
          - max_connections_per_endpoint = 0 -> null
          - max_connections_per_instance = 0 -> null
          - max_rate                     = 0 -> null
          - max_rate_per_endpoint        = 0 -> null
          - max_rate_per_instance        = 0 -> null
          - max_utilization              = 0.8 -> null
        }
      + backend {
          + balancing_mode               = (known after apply)
          + capacity_scaler              = (known after apply)
          + description                  = (known after apply)
          + group                        = (known after apply)
          + max_connections              = (known after apply)
          + max_connections_per_instance = (known after apply)
          + max_rate                     = (known after apply)
          + max_rate_per_instance        = (known after apply)
          + max_utilization              = (known after apply)
        }

(where REDACT_URL stands in for the real URLs, which pointed to the correct instance groups)

So I checked in the UI, and the backend_service(s) had been created once by TF. However, I can't do anything with them because of this error (same as @vigohe):

Error: Provider produced inconsistent final plan

When expanding the plan for
module.cluster-lb.google_compute_backend_service.private[0] to include new
values learned so far during apply, provider "google" produced an invalid new
value for .backend: planned set element
cty.ObjectVal(map[string]cty.Value{"balancing_mode":cty.UnknownVal(cty.String),
"capacity_scaler":cty.UnknownVal(cty.Number),
"description":cty.UnknownVal(cty.String), "group":cty.UnknownVal(cty.String),
"max_connections":cty.UnknownVal(cty.Number),
"max_connections_per_endpoint":cty.NullVal(cty.Number),
"max_connections_per_instance":cty.UnknownVal(cty.Number),
"max_rate":cty.UnknownVal(cty.Number),
"max_rate_per_endpoint":cty.NullVal(cty.Number),
"max_rate_per_instance":cty.UnknownVal(cty.Number),
"max_utilization":cty.UnknownVal(cty.Number)}) does not correlate with any
element in actual.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

^ The same error for all 3 similar backend_services.

For reference, these are my code snippets:

# From my cluster module
output "K8S_INSTANCE_GROUP_URLS" {
  value       = data.google_container_cluster.information.instance_group_urls
  description = "URLs to the instance groups for all nodes"
}

# Variable input in the lb module
variable "backend_group_list" {
  description = "Map backend indices to list of backend maps."
  type        = list
  default     = []
}

# main.tf that calls the lb module
...
backend_group_list = module.cluster.K8S_INSTANCE_GROUP_URLS
...

# The backend_service dynamic block
...
  dynamic "backend" {
    for_each = var.backend_group_list
    content {
      group = backend.value
      // adding null values otherwise reapplication fails
      balancing_mode               = null
      capacity_scaler              = null
      description                  = null
      max_connections              = null
      max_connections_per_instance = null
      max_rate                     = null
      max_rate_per_instance        = null
      max_utilization              = null
    }
  }
...

Going back to the linked issue: the output K8S_INSTANCE_GROUP_URLS is already known in the current apply, as the cluster is built, and TF can obviously see the existing backend_service it just created, since the plan shows it wanting to remove the backends and recreate them.

Does TF not render dynamic blocks at plan time, even if the input is already available to iterate over?

My TF commands for reference:

tf plan -refresh -out plan -var-file=inputs.tfvars -target='module.cluster-lb'
tf apply plan

And some up-to-date versions when testing today:

Terraform v0.12.12
+ provider.google v2.10.0
+ provider.null v2.1.2
+ provider.random v2.1.2
+ provider.template v2.1.2

Note: I upgraded providers & got the same issue, e.g. provider.google v2.18.0.

@rileykarson
Collaborator

The linked issue doesn't seem to be related; we encounter a similar error when values change at apply time. The proposal in that issue is that we set the value from the user's config in state instead of the API-returned value. They're equivalent, but Terraform Core can't tell. Terraform 0.11 didn't care when they mismatched, but Terraform 0.12 errors out.

This looks related to dynamic blocks instead, and is probably a hashicorp/terraform issue, although it's hard to say.

@hawksight
Author

👍 for the speedy reply.
I have uploaded debug logs here in case they point out something. (Encrypted with the hashicorp key.)

It looks like it's getting the right response back from Google, in that all the nulls have been replaced with the default values for the instance groups. But that's about as much as I can glean from there.

@rileykarson
Collaborator

@paddycarver can you take a look? You've got the key + probably more context on dynamic than I do.

@hawksight
Author

hawksight commented Oct 24, 2019

So I've worked out that, at least in my case, the issue seems not to be at build time, as it will build fresh as desired even if the plan doesn't show the dynamic block output. (I destroyed all the module.cluster-lb resources and built fresh.)

The issue is that once the backend_service is built, the plan will always try to recreate the backends with all values (known after apply). The trouble is that when you apply anything else after the services are built, the previously pasted error is produced (on all subsequent applies).

Whilst I rarely make changes to a backend_service once created, it's not something I can delete and recreate in production. It also means I can't run the rest of my code base with a single tf plan / tf apply, as I will get errors. So I'd still be stuck using -target on the CLI.

@ogreface
Contributor

ogreface commented Dec 20, 2019

Same issue. The solution by @vigohe works for me. I have to apply twice, but it works. Changes/deletes/adds after that all work as expected.

resource "google_compute_region_backend_service" "service_vip" {
  provider = "google-beta"
  name     = "${var.lb_prefix_name}-vip-bs"

  health_checks         = var.lb_health_checks
  project               = var.project
  region                = var.region
  protocol              = "TCP"
  load_balancing_scheme = "INTERNAL"

  dynamic "backend" {
    for_each = var.lb_service_groups
    content {
      group = backend.key
      failover = lookup(backend.value, "failover", false)
    }
  }
}

In this case, var.lb_service_groups is a map of the form { instance.self_link: {} }.
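
(For reference, a sketch of how such a map might be built; the instance group resources are hypothetical:)

lb_service_groups = {
  # Keys are instance group self_links; values are per-backend option maps.
  (google_compute_instance_group.primary.self_link)  = {}
  (google_compute_instance_group.failover.self_link) = { failover = true }
}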

@wyardley

We're seeing this issue with our integration tests; because idempotency is one thing we're testing for, we'd prefer not to simply reapply.
Interestingly, I don't think we'd been seeing this with the 2.17 provider, but we are getting it consistently with 2.20.1. Will try to investigate further.

@hawksight
Author

I bit the bullet and moved one of my projects to use my 0.12-updated code. Unfortunately, I don't seem to be able to apply again in my case, as suggested in other comments.

I use the following commands:

tf plan -refresh -out plan -var-file=inputs.tfvars
tf apply plan

I checked it wasn't due to plan usage and got the same errors applying interactively:

tf apply -var-file=inputs.tfvars

Versions are:

Terraform v0.12.20
+ provider.google v2.20.2
+ provider.null v2.1.2
+ provider.random v2.2.1

Note that my input to the dynamic block var.backend_group_list is a list and not a map like @ogreface's seems to be.

GROUPS = [
  "https://www.googleapis.com/compute/v1/projects/<redacted>..",
  "https://www.googleapis.com/compute/v1/projects/<redacted>..",
  "https://www.googleapis.com/compute/v1/projects/<redacted>..",
]

I am also not using the google-beta provider, only the normal google provider. I may give beta a go.

@ogreface - do you generate var.lb_service_groups by pulling information from data providers?

@hawksight
Author

Actually, I think I have resolved my issue after some poking around.

tl;dr

The issue was passing the results of a data lookup (on the k8s cluster) as an output of one module (gcloud-k8s) and trying to use those as the input to another module (gcloud-lb-custom).

longer read

I had a setup as such:

environment
|--main.tf
|--inputs.tfvars
terraform-modules
|-- gcloud-k8s
     |-- main.tf
     |-- outputs.tf
|-- gcloud-lb-custom
     |-- main.tf
     |-- variables.tf

What I was doing

In each environment, I'd call my module (gcloud-k8s) to build a cluster. At the end of said module, I had a data lookup on the cluster which depended on all node pool creations. This would become the output K8S_INSTANCE_GROUP_URLS.

Then I'd build the load balancer through my next module (gcloud-lb-custom), which would take the input variable backend_group_list. Obviously, when calling that module, I'd fill that input with the other module's output:

backend_group_list = module.cluster.K8S_INSTANCE_GROUP_URLS

This has been erroring ever since upgrading to 0.12. It used to work in 0.11. Hence raising this issue.

What I changed to see if my loop was correct

I basically took the output from tf output and hardcoded it in variables.tf for the load balancer module (gcloud-lb-custom). When I ran tf plan, everything planned correctly. When I removed an instance group, the plan reconfigured the backends correctly, going from 3 backends to 2 in this instance.

This made me think the issue was something to do with me passing input to one module from the output of another.
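
(The hardcoded variable looked something like this sketch, with the real URLs pasted in place of the redacted ones:)

variable "backend_group_list" {
  description = "Instance group URLs, pasted from `tf output` for testing."
  type        = list(string)
  default = [
    "https://www.googleapis.com/compute/v1/projects/<redacted>..",
    "https://www.googleapis.com/compute/v1/projects/<redacted>..",
    "https://www.googleapis.com/compute/v1/projects/<redacted>..",
  ]
}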

What I'm now doing

I've moved that data lookup into the lb module (gcloud-lb-custom) and that lookup is configured via two other outputs from the cluster module (gcloud-k8s):

module "cluster-lb" {
  source = "../terraform-modules/gcloud-lb-custom"
  cluster      = module.cluster.K8S_NAME
  cluster_zone = module.cluster.K8S_ZONE
  ...
}
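
(For completeness: the cluster module presumably exposes those two values as outputs, along the lines of this sketch; the resource and variable names are assumed:)

# gcloud-k8s outputs.tf (sketch)
output "K8S_NAME" {
  value = google_container_cluster.cluster.name # assumed resource name
}

output "K8S_ZONE" {
  value = var.zone # assuming the zone is a module input
}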

Inside the lb module (gcloud-lb-custom):

data "google_container_cluster" "hack" {
  name       = var.cluster
  zone       = var.cluster_zone
  project    = var.project
}

And further down in the module, I use that lookup to pass in the list of instance_group_urls, so my dynamic backend looks like:

  dynamic "backend" {
    for_each = data.google_container_cluster.hack.instance_group_urls
    content {
      group = backend.value
      // setting explicit values, otherwise reapplication fails
      balancing_mode               = "UTILIZATION"
      capacity_scaler              = 1
      description                  = null
      max_connections              = 0
      max_connections_per_instance = 0
      max_rate                     = 0
      max_rate_per_instance        = 0
      max_utilization              = 0.8
    }
  }

It seems to work fairly well so far.

I did also upgrade the google provider to the latest:

Terraform v0.12.20
+ provider.google v3.7.0
+ provider.google-beta v3.7.0
+ provider.null v2.1.2
+ provider.random v2.2.1

TIL

Probably not the first or last time I'll be bitten by passing things from one module to another. Arguably it's cleaner to fetch the URLs inside the load balancer module, but I would have thought the output would be stored in state and used during the plan (I'm probably misunderstanding the internal workings of terraform plan).

As a side effect, I have yet to see that error message again, but I will be doing lots of testing around this.
If anyone else has this issue, hopefully some of the examples above will help you find a solution.

@ogreface
Contributor

ogreface commented Feb 4, 2020

@ogreface - do you generate var.lb_service_groups by pulling information from data providers?

Not data providers technically, but they are references to other blocks in the same module. Glad you have a solution though!

@cynful
Contributor

cynful commented Feb 20, 2020

Same issue. Errors on the first apply, passes on the second apply.

I've tried the workaround described by @hawksight; however, based on the way I'm dynamically assigning the backend blocks, I get a splat error instead. This may or may not be related to how the dependency graph gets walked.

 Error: Splat of null value
       
         on ../modules/gce_backend_services/main.tf line 15, in resource "google_compute_backend_service" "default":
         15:         data.google_container_cluster.cluster.node_pool.*.instance_group_urls,
           |----------------
           | data.google_container_cluster.cluster.node_pool is null
       
       Splat expressions (with the * symbol) cannot be applied to null sequences.
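
One way to guard against the null sequence (a sketch, not something tried in this thread) is to branch before splatting:

locals {
  # Fall back to an empty list while node_pool is still null, then flatten
  # the per-pool URL lists into a single list of instance group URLs.
  instance_group_urls = flatten(
    data.google_container_cluster.cluster.node_pool == null
    ? []
    : data.google_container_cluster.cluster.node_pool[*].instance_group_urls
  )
}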

So next, I tried to pass the attribute to the module as a variable.

resource "google_compute_backend_service" "default" {
  dynamic "backend" {
    for_each = flatten(
      matchkeys(
        var.node_pool[*].instance_group_urls,
        var.node_pool[*].name,
        split(",", var.backend_services[count.index]["groups"]),
      ),
    )
...
# calling it as module
module "test-gce-module-backend-services" {
  source = "../modules/gce_backend_services"
  name   = "test-gce-module-backend-services"

  # this module output is google_container_cluster.cluster.node_pool
  node_pool = module.test-gke-module-cluster.node_pool

  backend_services = [
    {
      groups = "default-pool"
    },
    {
     groups = "second-pool"

This seems to work, but now gives me the same error as @vigohe.
This error also passes after a second apply:

 Error: Provider produced inconsistent final plan
       
       When expanding the plan for
       module.test-gce-module-backend-services.google_compute_backend_service.default[1]
       to include new values learned so far during apply, provider
       "registry.terraform.io/-/google" produced an invalid new value for .backend:
       planned set element
       cty.ObjectVal(map[string]cty.Value{"balancing_mode":cty.UnknownVal(cty.String),
       "capacity_scaler":cty.NumberIntVal(1), "description":cty.StringVal(""),
       "group":cty.UnknownVal(cty.String), "max_connections":cty.NullVal(cty.Number),
       "max_connections_per_endpoint":cty.NullVal(cty.Number),
       "max_connections_per_instance":cty.NullVal(cty.Number),
       "max_rate":cty.NullVal(cty.Number),
       "max_rate_per_endpoint":cty.NullVal(cty.Number),
       "max_rate_per_instance":cty.UnknownVal(cty.Number),
       "max_utilization":cty.MustParseNumberVal("0.8")}) does not correlate with any
       element in actual.
       
       This is a bug in the provider, which should be reported in the provider's own
       issue tracker.

Main concern:
This definitely appears to be an idempotency issue, and it's breaking our integration tests (kitchen-terraform).
My next potential move is to upgrade the provider to 3.x.x (nothing in the changelog appears to address this problem), but I really hope this issue gets solved in 2.20.x.

$ terraform version
Terraform v0.12.20
+ provider.google v2.20.1
+ provider.google-beta v2.20.1
+ provider.null v2.1.2

@paddycarver paddycarver removed their assignment Feb 20, 2020
@rileykarson rileykarson added the persistent-bug label and removed the bug label Feb 21, 2020
@danawillow danawillow added this to the Sprint 7 milestone Feb 24, 2020
@wyardley

Not 100% sure if it's the cause of this or not, but one thing we noticed is that for resource creation, the dependency order is "backend => urlmap"; for deletion it's "urlmap => backend". For modification, it will try to use the same order as creation.

modular-magician added a commit to modular-magician/terraform-provider-google that referenced this issue Sep 1, 2020
modular-magician added a commit that referenced this issue Sep 1, 2020
nhsieh added a commit to pivotal/docs-platform-automation that referenced this issue Jun 30, 2021
@github-actions github-actions bot added the forward/review and service/compute-l7-load-balancer labels Oct 25, 2023
rizwanreza pushed a commit to pivotal/docs-platform-automation that referenced this issue Oct 4, 2024
@melinath melinath removed the forward/review label Oct 17, 2024
@pawelJas

pawelJas commented Oct 18, 2024

It seems this is a duplicate of #4543, which is currently being worked on.
@melinath could we close this as a duplicate, please?

@melinath
Collaborator

@c2thorn could you take a look to verify whether these are duplicates?

@c2thorn
Collaborator

c2thorn commented Oct 18, 2024

After discussing offline: it is likely that #4543 gets resolved at the same time as this, but we'll keep this bug open since it stems from a different user action.

@c2thorn c2thorn removed their assignment Oct 18, 2024