Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dirty collection attribute refresh #2065

Closed
wants to merge 76 commits into from

Conversation

VenelinMartinov
Copy link
Contributor

@VenelinMartinov VenelinMartinov commented Jun 6, 2024

This fixes an issue with empty collection attributes refreshing dirty under PlanResourceChange. Note that this is an issue present in TF as well.
It comes from the fact that some (most?) providers don't distinguish between null and empty collection values, for example both AWS and GCP labels have this issue.

We've opted for a targeted fix in Refresh - this maintains our TF-conformance on Create/Update.
The fix is that on refresh we go over the read values and compare with the previous state. Any properties which:

  1. Are a TF attribute
  2. Are a collection
  3. Are null or empty
  4. Are different to the previous state

get "normalized" to what was in the state. This avoids us displaying excessive dirty refreshes for these values.

Note that the actual fix is from #2073 and due to @t0yv0
A lot of the commit history here is from #2072 which separated the tests from here to facilitate easier reviews. I should have merged here before deleting 🤷

Fixes #2047
Fixes #1967

@VenelinMartinov VenelinMartinov changed the title Vvm/dirty map refresh repro Dirty map refresh repro Jun 6, 2024
@VenelinMartinov VenelinMartinov changed the base branch from master to vvm/pulumiTest_integration_tests June 6, 2024 12:05
@VenelinMartinov
Copy link
Contributor Author

Found a regression caused by this change: pulumi/pulumi-gcp#2079 (comment)

Without the change the Firewall resource refreshes dirty: gcp:compute:Firewall ts-firewall updated (0.48s) [diff: +sourceRanges,sourceServiceAccounts,targetServiceAccounts,targetTags]

But that's what happens in TF too:

  # google_compute_firewall.py_firewall has changed
  ~ resource "google_compute_firewall" "py_firewall" {
        id                      = "projects/pulumi-development/global/firewalls/py-firewall"
        name                    = "py-firewall"
      + source_ranges           = []
      + source_service_accounts = []
      + target_service_accounts = []
      + target_tags             = []
        # (10 unchanged attributes hidden)

        # (2 unchanged blocks hidden)
    }

With the change it still refreshes dirty but in a different parameter.

@t0yv0
Copy link
Member

t0yv0 commented Jun 12, 2024

The change intentionally deviates from TF behavior so that by itself is not a problem.

With the change it still refreshes dirty but in a different parameter.

Curious any more details here?

@VenelinMartinov
Copy link
Contributor Author

VenelinMartinov commented Jun 12, 2024

pulumi/pulumi-gcp#2079 (comment) has the Read GRPC call. It's in the allows property, the member with "icmp" - ports changes from [] to nil in the Read.

ports there is not set on the inputs but is returned in the Create state - I think we override that.

Yeah:

{
    "method": "/pulumirpc.ResourceProvider/Create",
    "request": {
        "urn": "urn:pulumi:p-it-venelins-m-webserver-69654475::webserver::gcp:compute/firewall:Firewall::ts-firewall",
        "properties": {
            "__defaults": [
                "name",
                "priority"
            ],
            "allows": [
                {
                    "__defaults": [],
                    "protocol": "icmp"
                },
                {
                    "__defaults": [],
                    "ports": [
                        "22",
                        "80"
                    ],
                    "protocol": "tcp"
                }
            ],
            "name": "ts-firewall-d80a454",
            "network": "04da6b54-80e4-46f7-96ec-b56ff0331ba9",
            "priority": 1000,
            "project": "pulumi-development",
            "sourceTags": [
                "foo"
            ]
        },
        "preview": true
    },
    "response": {
        "properties": {
            "__meta": "{\"_new_extra_shim\":{},\"e2bfb730-ecaa-11e6-8f88-34363bc7c4c0\":{\"create\":1200000000000,\"delete\":1200000000000,\"update\":1200000000000}}",
            "allows": [
                {
                    "ports": [
                        "22",
                        "80"
                    ],
                    "protocol": "tcp"
                },
                {
                    "ports": [],
                    "protocol": "icmp"
                }
            ],
            "creationTimestamp": "04da6b54-80e4-46f7-96ec-b56ff0331ba9",
            "denies": [],
            "description": null,
            "destinationRanges": "04da6b54-80e4-46f7-96ec-b56ff0331ba9",
            "direction": "04da6b54-80e4-46f7-96ec-b56ff0331ba9",
            "disabled": null,
            "enableLogging": "04da6b54-80e4-46f7-96ec-b56ff0331ba9",
            "id": "04da6b54-80e4-46f7-96ec-b56ff0331ba9",
            "logConfig": null,
            "name": "ts-firewall-d80a454",
            "network": "04da6b54-80e4-46f7-96ec-b56ff0331ba9",
            "priority": 1000,
            "project": "pulumi-development",
            "selfLink": "04da6b54-80e4-46f7-96ec-b56ff0331ba9",
            "sourceRanges": null,
            "sourceServiceAccounts": null,
            "sourceTags": [
                "foo"
            ],
            "targetServiceAccounts": null,
            "targetTags": null
        }
    },
    "metadata": {
        "kind": "resource",
        "mode": "client",
        "name": "gcp"
    }
}

@VenelinMartinov
Copy link
Contributor Author

I guess given enough provider examples any behaviour here will be wrong somewhere - I'll try to find the exact opposite behaviour of this, you must have hit it in AWS

@t0yv0
Copy link
Member

t0yv0 commented Jun 12, 2024

I'd like to debug the GCP example. Setting this up shortly.

@VenelinMartinov
Copy link
Contributor Author

VenelinMartinov commented Jun 12, 2024

Ah, I ran it with --diff:

    ~ gcp:compute/firewall:Firewall: (update)
        [id=projects/pulumi-development/global/firewalls/py-firewall-639a397]
        [urn=urn:pulumi:dev::webserver-py::gcp:compute/firewall:Firewall::py-firewall]
        [provider=urn:pulumi:dev::webserver-py::pulumi:providers:gcp::default_7_25_0::28550d5a-7d52-492a-b894-a7f38e268c2b]
        --outputs:--
      ~ allows               : [
          ~ [0]: {
                  ~ ports   : [
                      + [0]: "22"
                      + [1]: "80"
                    ]
                  ~ protocol: "icmp" => "tcp"
                }
          ~ [1]: {
                  - ports   : [
                  -     [0]: "22"
                  -     [1]: "80"
                    ]
                  ~ protocol: "tcp" => "icmp"
                }
        ]
Resources:
    ~ 1 to update
    3 unchanged

This is likely a reordering issue, not a problem with the values.

@t0yv0
Copy link
Member

t0yv0 commented Jun 12, 2024

I've had a look and I know what's wrong with the GCP example.

The correction kicks in with this:

  REPLACING allow.$.ports FROM cty.ListValEmpty(cty.String) TO cty.NullVal(cty.List(cty.String))

But what's unexpected here is that the state value is actually cty.ListValEmpty(cty.String) so this should not apply.

How does this happen? Turns out "github.com/hashicorp/go-cty/cty" is buggy around sets:

cty.Path apply expects only ints or strings but cty.Transform injects entire set element values into IndexStep. Min repro:

func TestSimpleSet(t *testing.T) {
	vtop := cty.SetVal([]cty.Value{
		cty.ObjectVal(map[string]cty.Value{
			"foo": cty.StringVal("bar"),
		}),
	})

	cty.Transform(vtop, func(p cty.Path, v cty.Value) (cty.Value, error) {
		vv, err := p.Apply(vtop)
		if err != nil || vv.Equals(v).False() {
			t.Logf("err %v", err)
			t.Logf("path %v", p)
		}
		return v, nil
	})
}

As a result, looking for a matching state element under the path fails to find one, and assumes it is missing or explicit null.

This is not good enough:

	matchingOldStateValue := func(t cty.Type, p cty.Path) cty.Value {
		if v, err := p.Apply(oldState); err == nil {
			return v
		}
		return cty.NullVal(t)
	}

@t0yv0
Copy link
Member

t0yv0 commented Jun 12, 2024

zclconf/go-cty#180 is where it's bottoming out. As a workaround we can skip paths that contain set elements and just bail for this transformation. We can bail on error from p.Apply. Or we can rewrite p.Apply into a "better" version.

Comment on lines 47 to 48
switch int(blk.Nesting) {
case 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we casting to an int here?

Suggested change
switch int(blk.Nesting) {
case 1:
switch blk.Nesting {
case NestingModeSingle:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model is internal and not accessible, but looks like we have a copy of the model we can use.

"github.com/pulumi/pulumi/sdk/v3/go/common/util/contract"
)

func findSchemaContext(topLevel *schema.Resource, path walk.SchemaPath) schemaContext {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to expose nesting level info to shim.*. Nothing jumps out to me.

@t0yv0 t0yv0 requested review from lunaris and lukehoban June 13, 2024 14:08
@t0yv0
Copy link
Member

t0yv0 commented Jun 13, 2024

We're weighing now whether to include this change or not. This introduces an intentional deviation in Read results for Pulumi bridged providers vs TF baseline to compensate for old normalizeNullValues code in TF that intentionally confuses empty and null collections leading to dirty refresh in Pulumi. With this change the Read result is modified to match the state and avoid having a diff. Under the new CLI https://github.com/pulumi/pulumi/releases/tag/v3.120.0 some (or all?) these dirty refresh scenarios may refresh cleanly.

Should we still include?

@t0yv0
Copy link
Member

t0yv0 commented Jun 13, 2024

Reasons to include: the change is fairly surgical and well motivated, hedge against any subsequent changes in the CLI

Reasons not to include: any change carries some unforeseen risk, having an intentional difference from TF behavior creates more maintenance

Information that could help decide:

@t0yv0
Copy link
Member

t0yv0 commented Jun 13, 2024

@mjeffryes sounds like you had some input here.

@mjeffryes
Copy link
Member

Sure.

  • Change pulumi refresh to report diff relative to desired state instead of relative to only output changes  pulumi#16146 has landed and appears to address all our test failure for "dirty refresh" and should mitigate that problem for users. It does change the outputs in state, so that means that resources that depend on that output might try to update after a "clean" refresh (assuming the null/empty distinction matters to the dependent resource).

  • The normalization change we've prepared would completely hide when upstream returns a nil instead of empty or vice-versa, so it would suppress updates to dependent resources even if the difference between null and empty was somehow significant.

  • It's honestly not clear how often this difference in behavior is actually significant, but the behavior without normalization is both more transparent to the user and more consistent with the upstream TF provider. Hence, I'm inclined to try rolling out PRC without normalization and see if users encounter any real situations where they wish we would have normalized to suppress the update.

@VenelinMartinov
Copy link
Contributor Author

VenelinMartinov commented Jun 14, 2024

Running RPC downstream tests without this PR here: #1962 (comment)

Will report back if any tests need this change.

EDIT: Quite a few failures in AWS related to tags: https://github.com/pulumi/pulumi-aws/actions/runs/9515746758/job/26232504775?pr=4078

Also failures in cloudflare related to tags: https://github.com/pulumi/pulumi-cloudflare/actions/runs/9515335359/job/26229764573?pr=813

EDIT2: The AWS failures were because it hasn't yet picked up the new CLI version. After testing in pulumi/pulumi-aws#4081, it looks good.

@VenelinMartinov
Copy link
Contributor Author

VenelinMartinov commented Jun 14, 2024

Also note that under the new refresh behaviour the tests fixed in this PR no longer fail without it: #2100. If we were to merge it we need to add test coverage for the fix.

EDIT: I'll try to pull the downstream test failures into tests here.

@t0yv0
Copy link
Member

t0yv0 commented Jun 14, 2024

Yes sounds like these things got fixed in the CLI so perhaps this PR is not necessary anymore. Is that too simplistic of a take?

@VenelinMartinov
Copy link
Contributor Author

Closing this as the issues this meant to fix are fixed by the new CLI refresh.

PlanResourceChange passes downstream tests without this, so let's revisit if we see any issues related to collections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PlanResourceChange dirty refresh on empty maps PlanResourceChange undesirable updates
4 participants