
Ignore role_entity order in storage_bucket_acl #1525

Closed
tmshn wants to merge 2 commits

Conversation

@tmshn (Contributor) commented May 23, 2018

When you set role_entity in storage_bucket_acl, it will be sorted in the following order: project-owners, project-editors, project-viewers, and then the rest.

Therefore, a diff persists after apply if you don't set role_entity in the right order.

resource "google_storage_bucket_acl" "foo" {
  bucket = "${google_storage_bucket.foo.name}"

  role_entity = [
    "OWNER:user-${google_service_account.bar.email}",
    "READER:project-viewers-${data.google_project.this.number}",
    "OWNER:project-editors-${data.google_project.this.number}",
    "OWNER:user-${google_service_account.baz.email}",
    "OWNER:project-owners-${data.google_project.this.number}",
  ]
}

With this configuration, the following diff appears every time, even after apply.

  ~ google_storage_bucket_acl.foo
      role_entity.0:  "OWNER:[email protected]" => "OWNER:project-owners-0123456789"
      role_entity.1:  "READER:project-viewers-0123456789" => "OWNER:project-editors-0123456789"
      role_entity.2:  "OWNER:project-editors-0123456789" => "READER:project-viewers-0123456789"
      role_entity.3:  "OWNER:[email protected]" => "OWNER:[email protected]"
      role_entity.4:  "OWNER:project-owners-0123456789" => "OWNER:[email protected]"

However, the order doesn't matter as far as access control is concerned.

To suppress this annoying diff, I changed the type of the role_entity argument from schema.TypeList to schema.TypeSet.
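
A minimal sketch of that change, assuming the helper/schema API of the time (illustrative only, not the exact diff in this PR):

"role_entity": &schema.Schema{
    Type:     schema.TypeSet, // was schema.TypeList
    Optional: true,
    Elem:     &schema.Schema{Type: schema.TypeString},
    // Each element is keyed by a hash of its string value, so ordering is ignored.
    Set:      schema.HashString,
},

With a set, Terraform's diff no longer depends on element positions, which is what suppresses the reordering diff shown above.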

Note: issue #892 looks related, but may be a different problem.

@danawillow added the v2 label May 25, 2018
@danawillow (Contributor)

I'm going to mark this as v2 for now, since technically this is a breaking change for anybody that might have been relying on indexes of this field. Feel free to tell me that there's no possible way anybody was doing that though; if so I'll take another look.

@tmshn (Contributor, Author) commented May 25, 2018

I'm not sure about this, but I think I didn't change how role_entity is set in the state.

https://github.com/tmshn/terraform-provider-google/blob/gcsacl-entities-unordered/google/resource_storage_bucket_acl.go#L179-L184

So the order of google_storage_bucket_acl.foo.role_entity seems preserved for users.

What do you think?

@danawillow (Contributor)

Changing it from a list to a set changes how it's set in the state: instead of being saved as role_entity.0, role_entity.1, etc., it'll use a hash of each element as the index (so something more like role_entity.3289345793). Does that make sense?
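
For illustration, the state keys would change roughly like this (the hash values here are hypothetical, not taken from a real state file):

# as schema.TypeList: position-based keys
role_entity.0 = "OWNER:project-owners-0123456789"
role_entity.1 = "OWNER:project-editors-0123456789"

# as schema.TypeSet: hash-based keys
role_entity.3289345793 = "OWNER:project-owners-0123456789"
role_entity.1640853295 = "OWNER:project-editors-0123456789"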

@tmshn (Contributor, Author) commented May 25, 2018

Oh I got it. I found the code and understand; https://github.com/hashicorp/terraform/blob/v0.11.7/helper/schema/field_writer_map.go#L283

Thanks for the clear explanation 👍

@tmshn (Contributor, Author) commented May 25, 2018

Well then, would it be impossible to do something like this?

resource "some_resource" "foo" {
  count   = "${length(google_storage_bucket_acl.foo.role_entity)}"
  somearg = "${element(google_storage_bucket_acl.foo.role_entity, count.index)}"
}

@danawillow (Contributor)

I think that would work, but I need to check with @paddycarver. Paddy, would something like that start not working with a change to sets, or is it only if people are explicitly using the index (like field = "${google_storage_bucket_acl.foo.role_entity.0}")?

@rosbo requested review from paddycarver, rosbo and danawillow and removed request for paddycarver and rosbo on May 28, 2018 at 21:57
@paddycarver (Contributor)

So I tried this out using the following config:

data "google_compute_instance_group" "test" {
  name = "test-instance-group"
  zone = "us-central1-a"
}

data "template_file" "init" {
  count    = "${length(data.google_compute_instance_group.test.instances)}"
  template = "$${instance}"

  vars {
    instance = "${element(data.google_compute_instance_group.test.instances, count.index)}"
  }
}

And after applying (with an instance group with 4 instances) sure enough, 4 versions of the template were in state, and they listed all the instances.

Changing the count.index to 1 also worked, showing the same instance in four different templates, though knowing which it would be is difficult. However, using ${data.google_compute_instance_group.test.instances.1} broke things, returning 4, weirdly.

Also, after asking the core team, it seems that element working here is not something that happened by design; it's likely the result of the set being converted to a list. That means the ordering is theoretically unstable, even though I was unable to reproduce that.

So "not working" depends on how you define it. I think it will give some output, and even arguably correct output in some cases, though incorrect output in some cases. The important question I don't know the answer to is "will it give the same answer it would have as a list?"

Another question to consider here is how likely people are to be referencing the role_entity field inside the resource, and whether that risk is worth us delaying to 2.0.0 for.

Personally, I'm a bit gun-shy after 1.13.0, so I'm hesitant to break things on people, but I could be convinced with an argument that there's really no use case here that we're breaking, meaning it's a breaking change only in technicality, but pragmatically it's not.

@morgante

IMO the benefit of fixing this (very annoying bug) outweighs the potential breaking change.

Indexing into the ACL is unlikely to be a common use case. At most, I can imagine people relying on the overall resource being provisioned; the actual role entities are only useful for GCS ACLs and are unlikely to be referenced elsewhere.

I'd vote for including this fix sooner rather than later.

@paddycarver (Contributor)

I agree that it's hard to envision situations that the fix would break in, but I think it's also a cosmetic fix, as the only real side-effect of the bug is extra API requests and it showing up in plan. It doesn't really affect the operation of your infrastructure. (It affects things that I, personally, manage with Terraform, so I can attest to it being both annoying and relatively harmless, compared to something that, say, falsely forces a destroy/create cycle on every apply.)

@morgante

> I agree that it's hard to envision situations that the fix would break in, but I think it's also a cosmetic fix, as the only real side-effect of the bug is extra API requests and it showing up in plan.

I disagree that introducing a permadiff is merely a cosmetic problem. A lot of my infrastructure (and some Google internal tools) are built around the idea of ensuring that Terraform doesn't see any drift in resources. As such, this permadiff makes it impossible to validate that a plan has been fully applied with no differences — instead our jobs just continuously run, valiantly attempting to converge the infrastructure.

Even though it's ultimately unlikely to break any infrastructure, I think a terraform plan which can never be resolved is a relatively big issue.

Also, for what it's worth, this does introduce some problems with attempting to update other properties of the bucket ACLs which seem to have an undocumented race condition. Since this resource attempts to update every time, I keep running into that race condition even though I'm not actually changing anything. (Still working to figure out if it's a bug in GCS or a bug in Terraform.)

paddycarver added a commit that referenced this pull request Jun 22, 2018
Add a CustomDiff function to storage bucket ACLs that will ignore a diff
if the config and state have the same role_entities, even if they're in
a different order.

Fixes #1525.
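
A rough sketch of that approach, assuming the helper/schema CustomizeDiff API of the time (the function name and details are illustrative, not the code that was actually merged):

func ignoreRoleEntityOrder(d *schema.ResourceDiff, meta interface{}) error {
    o, n := d.GetChange("role_entity")
    oldList, newList := o.([]interface{}), n.([]interface{})
    if len(oldList) != len(newList) {
        return nil // element count changed: a real diff, keep it
    }
    counts := make(map[string]int, len(oldList))
    for _, v := range oldList {
        counts[v.(string)]++
    }
    for _, v := range newList {
        counts[v.(string)]--
    }
    for _, c := range counts {
        if c != 0 {
            return nil // different elements: a real diff, keep it
        }
    }
    // Same role/entity pairs, only the order differs: suppress the diff.
    return d.Clear("role_entity")
}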
@paddycarver (Contributor)

> I disagree that introducing a permadiff is merely a cosmetic problem. A lot of my infrastructure (and some Google internal tools) are built around the idea of ensuring that Terraform doesn't see any drift in resources. As such, this permadiff makes it impossible to validate that a plan has been fully applied with no differences — instead our jobs just continuously run, valiantly attempting to converge the infrastructure.

I think that's fair, and a point I hadn't considered. Thanks for raising it, I'll try to keep it in mind in the future. I think expecting Terraform to converge on a stable state is a reasonable expectation.

> Even though it's ultimately unlikely to break any infrastructure, I think a terraform plan which can never be resolved is a relatively big issue.

I'd say I agree, though we probably disagree about what it's relative to. For example, relative to a release that causes an error in a user's previously valid config, or worse, a release that causes a previously valid config to silently behave in a different way, I'd say a permadiff is the lesser of those three evils. It's something that should be resolved, for sure, but I don't think it necessarily outweighs the other issues at play, especially in the aftermath of a particularly disruptive release.

> Also, for what it's worth, this does introduce some problems with attempting to update other properties of the bucket ACLs which seem to have an undocumented race condition. Since this resource attempts to update every time, I keep running into that race condition even though I'm not actually changing anything. (Still working to figure out if it's a bug in GCS or a bug in Terraform.)

That is bothersome! I think, personally, I'd prefer to resolve the race condition, regardless of what we do here, though I can appreciate this exacerbates the problem.

Fortunately, @ndmckinley suggested a way around this that avoided the breaking change and still resolved the permadiff situation today! I put together a PR for it and opened #1692, first to make this failure show up in our tests, and second to fix the problem.

@tmshn (Contributor, Author) commented Jun 22, 2018

Good discussion. I agree with both of you:

  • indexing ACL role_entities is unlikely to happen, so this seems practically not a breaking change
  • a permanent diff is a harmful bug that can't be ignored, but it's still relatively less severe

Anyway, @paddycarver opened a new solution in #1692, so we can close this or put it on hold until v2.

Is there any schedule for v2?

@paddycarver (Contributor) commented Jun 22, 2018

> indexing ACL role_entities is unlikely to happen, so this seems practically not a breaking change

I agree in theory, but we would also break anyone:

  • using count and element (potentially; there seems to be no guarantee the order will be stable then, resulting in a permadiff, which is exactly what we're trying to avoid), or using count with [#] or .# notation (as elements are now keyed off a hash, not a position)
  • using interpolation functions that operate on lists, potentially. How likely is it that someone's using concat to combine two groups of role_entities into one? I have no idea.

I think it's unlikely to break most configs, but I don't feel confident about how many configs fall into the minority that would be broken. I think it could be zero, but I also wouldn't be terribly surprised if it were non-zero.

> Is there any schedule for v2?

We don't have any date for v2 yet, and are trying to coordinate it with other things, which makes it hard to predict with any accuracy. We are definitely in the planning stages for it, however.

@ghost commented Nov 17, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost locked and limited conversation to collaborators Nov 17, 2018