
Previously null gcfs_config forcing replacement with false #2085

Closed
JerkyTreats opened this issue Sep 11, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@JerkyTreats

TL;DR

Previously we had no gcfs_config (which I assume was null); now Terraform plans are forcing node_pool replacement with gcfs_config enabled = false.

The docs mention this is optional, so null should be equivalent to false, right?

Image streaming for the node_pool is confirmed Disabled in the Google Cloud console.

          + gcfs_config { # forces replacement
              + enabled = false # forces replacement
            }

I'm assuming this is due to some change in 33.0.0?

Expected behavior

No destructive node pool replacement where gcfs_config was previously null and is now explicitly false

Observed behavior

Destructive node pool replacement.

Terraform Configuration

module "gke" {
  source     = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version    = "33.0.0"
}
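
Our node_pools input doesn't set enable_gcfs at all. A minimal illustrative shape (the pool name and machine type below are placeholders, not our actual values) would be:

node_pools = [
  {
    name         = "default-node-pool"
    machine_type = "e2-standard-4"
    # enable_gcfs deliberately not set, i.e. the previously-null case
  },
]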

Terraform Version

1.9.5

Additional information

No response

@JerkyTreats JerkyTreats added the bug Something isn't working label Sep 11, 2024
@apeabody
Contributor

Hi @JerkyTreats - Which version of the Google Terraform Provider are you using?

Context: Looks like the gcfs_config replacement should have been resolved in v6.2.0: hashicorp/terraform-provider-google#19365. Otherwise I suspect we will also see permadrift in the reverse direction.

@JerkyTreats
Author

Ah, we are running provider 5.44.

Google provider 6.2.0 should fix the issue.

Closing this, thanks for linking to the upstream issue.

@JerkyTreats
Author

@apeabody - I updated to provider v6.2.0 with module 33.0.0, and the force replacement is gone, hurray.

So I applied the change and went on my way... but checked again today, and tf plan shows:

          + gcfs_config {
              + enabled = false
            }

It appears that regardless of apply, plan continually wants to make this update.

Not sure if this is on the hashicorp/terraform-provider-google side or the terraform-google-modules/terraform-google-kubernetes-engine side.

Can you confirm if this is something reproducible on your end?

@apeabody
Contributor

apeabody commented Sep 13, 2024

Hi @JerkyTreats - So the force replacement was fixed by the updated provider, but I suspect your remaining issue might be resolved by (upcoming) #2093?

@JerkyTreats
Author

I see, yes, that probably fixes it.

I had expected a workaround would be to add an explicit enable_gcfs = false in the node_pool declaration, but it continues to trigger the update-in-place even with that added.
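
For reference, the attempted workaround was just adding that key to the pool map, roughly like this (the pool name is a placeholder and other keys are omitted):

node_pools = [
  {
    name        = "default-node-pool"
    enable_gcfs = false # explicit, yet the update-in-place still shows up
  },
]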

#2093 should definitely fix it though.

@wyardley
Contributor

wyardley commented Sep 13, 2024

@apeabody I'm seeing new behavior with 33.0.2:

with

  enable_gcfs               = false

(which I think is the correct way, based on the module docs, and what I already had working), I'm getting:

│ Error: Incorrect attribute value type
│ 
│   on .terraform/modules/gke/modules/private-cluster/cluster.tf line 398, in resource "google_container_cluster" "primary":
│  398:           enabled = gcfs_config.value
│ 
│ Inappropriate value for attribute "enabled": bool required.

Maybe this is a result of #2093?

If I remove that line temporarily, I get back to the plan wanting to do this again:

      ~ node_config {
          + gcfs_config {
              + enabled = false
            }
        }

Let me know if you want me to file a new issue or need more information.

@wyardley
Contributor

@apeabody I think I have a potential fix incoming

diff --git a/autogen/main/cluster.tf.tmpl b/autogen/main/cluster.tf.tmpl
index 901de66e..0f39c57c 100644
--- a/autogen/main/cluster.tf.tmpl
+++ b/autogen/main/cluster.tf.tmpl
@@ -516,7 +516,7 @@ resource "google_container_cluster" "primary" {
       min_cpu_platform            = lookup(var.node_pools[0], "min_cpu_platform", "")
       enable_confidential_storage = lookup(var.node_pools[0], "enable_confidential_storage", false)
       dynamic "gcfs_config" {
-        for_each = lookup(var.node_pools[0], "enable_gcfs", false) ? [true] : [false]
+        for_each = lookup(var.node_pools[0], "enable_gcfs", null) != null ? [var.node_pools[0].enable_gcfs] : [false]
         content {
           enabled = gcfs_config.value
         }
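
Spelling out the before/after behavior of that for_each (a rough sketch; the pool maps below are placeholders rather than real config): with the old expression, every input yields a one-element list, so the dynamic block always renders a gcfs_config block; the proposed expression passes the configured value through, but likewise still yields [false] when enable_gcfs is unset. The old expression can be checked in terraform console:

lookup({ name = "pool-1" }, "enable_gcfs", false) ? [true] : [false]                       # => [false] (enable_gcfs unset)
lookup({ name = "pool-1", enable_gcfs = "true" }, "enable_gcfs", false) ? [true] : [false] # => [true]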

@apeabody
Contributor

Thanks @wyardley - I just opened this PR (#2095), but happy to use your PR instead. We should also add enable_gcfs = false to an example node pool to provide test coverage for the dynamic block.

@wyardley
Contributor

Yeah, mine would have done the wrong thing in the null case, I think. Yours looks good, and seems to fix the problem in a quick test.

@wyardley
Contributor

FWIW, I'm still getting a permadiff on this after updating to 33.0.3 with TPG 6.0.2, either with it explicitly set to false or not. Not sure if you'd like to reopen this or have me open a new issue? There's no force replacement (at least for me, maybe because of GoogleCloudPlatform/magic-modules#11553), but I'm still seeing a permadiff.

I'd already had enable_gcfs explicitly set to false as mentioned, but confirmed that removing it still has it trying to set node_config.gcfs_config.enabled to false.

I did see some potential problems with GoogleCloudPlatform/magic-modules#11553 in the case where the google_container_cluster default node-pool is used, but I don't think they should be at play here.

@apeabody
Contributor

apeabody commented Sep 13, 2024

Thanks @wyardley - When you get a chance, please open a new issue (tag me) specific to the current permadiff, in particular with a snippet of your config that reproduces it and the proposed plan. Cheers!
