Initial experimental support for deferred actions when count or for_each are unknown #34651

apparentlymart · 2024-02-10T02:34:52Z

This PR represents the first real functional milestone toward solving #30937, in a very early experimental capacity.

There are still various details not quite right here yet, but this pile of commits has already got big enough so I think it's time to land it and then we can continue working on the details in subsequent smaller PRs.

The end result of all of this is that:

In any module that's opted in to the unknown_instances language experiment, it's valid for count or for_each in a module or resource block to be unknown. (That was actually mostly true before, but there was a bug making it not work for module calls, and turning it on before just caused the plan to be broken rather than doing anything useful.)
For any resource block that has an unknown count or for_each, or that is declared beneath a module call with unknown count or for_each, Terraform asks the provider to confirm that what we know of the configuration is plannable, but then just makes a note that all of the instances of this resource must have their actions deferred to a future round.
For any resource that refers to a resource that's already been deferred for another reason, all of the planning actions run as normal but then the result isn't actually added to the plan, and instead Terraform again just makes a note that it's been deferred to a future round.
When a module used as part of a component in a stack produces any deferred actions, any downstream components have all of their actions forced deferred, to ensure that we preserve the dependency ordering between components. However, the planning actions do otherwise still run as normal, so that operators can get feedback if we can prove that anything about the configuration is invalid with the information available.

My main priority in this first round was to keep the new codepaths as isolated as possible from the existing code, to minimize the risk of impacting the behavior for anyone not participating in the experiment. Therefore there are some rough edges and compromises called out in comments throughout. Since none of this new code should be reachable in stable releases, I'd ask that reviewers pardon my dust and trust that we'll continue improving this in future PRs.

With that said though: if you do see something that seems like it could break non-experimental usage, please call that out! (The fact that I needed to make very few changes to existing tests gives me confidence, but it can't hurt to double-check.)

The new test coverage here is also relatively light, and I apologize for that. My next step here is going to be to try this with a more realistic stack configuration from our set of examples from stacks private preview, and so that will be my primary testing vehicle immediately, and then I'll add more automated tests gradually in future PRs.

I tried to make up for the fact that this is a huge change by breaking it into digestible commits, so I suggest that reviewers take it on a commit-by-commit basis rather than trying to review the entire diff at once.

alisdair

Seems good! None of my comments are of significance and shouldn't block merge. My focus in review was on a possible leak of the experimental behaviour, and I found none.

alisdair · 2024-02-12T15:36:41Z

internal/instances/expander.go

@@ -419,7 +419,7 @@ func (e *Expander) GetResourceInstanceRepetitionData(addr addrs.AbsResourceInsta
 }

 // GetModuleCallInstanceKeys determines the child instance keys for one specific


This comment's method name doesn't match the method below.

Thanks! I'll fix this one before merge just because it's not likely to cascade down onto other commits.

(For your other feedback, I'm going to keep it in mind for later work but leave it unchanged for now just to avoid churning this big pile of commits too much, since it's all experimental code we're explicitly intending to revisit anyway.)

(For consistency I should really have named this method GetResourceInstanceKeys, but since changing that would impact the caller from another commit I'm just noting that in a comment here so I can refer back to it while planning later work.)

alisdair · 2024-02-12T17:47:40Z

internal/plans/deferring/deferred.go

+// This must be called zero or one times before any other use of the receiver.
+// Changing this setting after a [Deferred] has already been used, or
+// concurrently with any other method call, will cause inconsistent and
+// undefined behavior.


Given these constraints, did you consider preventing mutation of this value, and instead requiring it to be set at construction time? Perhaps through a second constructor?

I did specifically consider adding an "opts" argument to NewDeferred, but it seemed weird to complicate that signature for what is currently just a single boolean flag.

I suppose a separate constructor would be a viable approach too, but honestly I decided not to sweat it too much for now. It is true though that this design does make it possible for someone to accidentally misuse this at runtime after initial construction, and so might be worth revisiting this in a later iteration if it seems likely that we're going to continue following this approach once we've got feedback from the experiment.

alisdair · 2024-02-12T18:05:44Z

internal/plans/deferring/deferred.go

+// It's invalid to call this method for an address that was already reported
+// as deferred using [Deferred.ReportResourceInstanceDeferred], and so this
+// method will panic in that case. Callers should always test whether a resource
+// instance action should be deferred _before_ reporting that it has been.


I was a little confused by this paragraph of commentary (and the corresponding one below before the panic). I think the contract we're enforcing here is that callers are required to call these two methods in a specific order for each resource instance, at most once each:

First call ShouldDeferResourceInstanceChanges

If it returns true, then you must call ReportResourceInstanceDeferred

The contract is to crash upon calling these in any other order, or any more often. That's a severe restriction, which may be entirely reasonable, but it was a bit hard for me to follow and I wonder if it's a possible maintenance hazard.

Assuming that I've understood what's happening, which is quite possibly not the case, I have two ideas:

Bundle these two methods into one, so that it's impossible to call them in the wrong order. Something like:

func (d *Deferred) ShouldDeferResourceInstanceChanges(addr addrs.AbsResourceInstance) bool { ret := d.shouldDeferResourceInstanceChanges(addr) if ret { d.reportResourceInstanceDeferred(addr) } return ret }

Or, update the comments throughout to clarify the expectation that these methods are always to be called in this specific order, as a pair, and only once per addr, because we require that the graph walk only evaluates deferrals once per instance.

I agree that this is a little confusing, and I think a big part of what makes it confusing is that many different resource instances are potentially in play, and this constraint only applies for one specific resource instance address. In practice then, the sequence of calls is:

Some upstream graph node earlier called ReportResourceInstanceDeferred to report that it has deferred a change, regardless of why that happened.

The current graph node then calls ShouldDeferResourceInstanceChanges with its own address, which causes this function to check whether that any of address's upstream dependencies were deferred. (In the case of this example, yes.)

The current graph node then responds to that result by calling ReportResourceInstanceDeferred with its own resource instance address. (This would then be the step 1 of a downstream incarnation of this sequence, if any)

Bundling the check in with the report is an interesting idea. I think we will still need a separate exported ReportResourceInstanceDeferred because in the end this will not be the only reason for a resource instance graph node to need to report that it made a deferral decision: it would do so also if a provider responded to PlanResourceChange by saying "I cannot plan this yet", once we get to that functionality in a later PR. But I admittedly cannot think of any reason why one would call SourceDeferResourceInstanceChanges, get a true response, and not immediately call ReportResourceInstanceDeferred, so it does seem plausible to build it so that asking the question and performing the side-effect of reporting are an atomic action from the caller's perspective.

(I do typically try to avoid writing functions that are both accessors and mutators at the same time as far as that's possible to do so, but I do agree that in this case the separation seems more likely to encourage bugs than combining them would.)

alisdair · 2024-02-12T18:30:05Z

internal/terraform/context_apply2_test.go

+				// with "b:", even though the rest of the name is unknown.
+				"name": cty.UnknownVal(cty.String).Refine().
+					StringPrefixFull("b:").
+					NotNull().


This is a neat idea for a test case! I had forgotten about the helpful overlap between refinements and deferred actions so seeing it called out here is great.

Honestly I didn't actually explicitly intend to capture this here... it just emerged naturally from how I constructed the test fixture! But I did quite like to see it once I saw it, so I kept it. 😀

This method must hold a read lock while it does its work, or else it will race with calls to register the expansion mode for other module calls.

We allow experiments only in alpha builds, and so we propagate the flag for whether that's allowed in from "package main". We previously had that plumbed in only as far as the rpcapi startup. This plumbs the flag all the way into package stackeval so that we can in turn propagate it to Terraform's module config loader, which is ultimately the one responsible for ensuring that language experiments can be enabled only when the flag is set. Therefore it will now be possible to opt in to language experiments in modules that are used in stack components.

This is kinda awkward because this address type represents both resources whose own instances are not expanded yet and resources belonging to whole modules whose instance keys haven't been expanded yet, and those two cases have different address types. However, in return for this awkward API only for the rare case where we need to isolate the module instance address, the rest of the system gets to not worry very much about this distinction: it can share most code between the two cases, since they both need similar evaluation treatment anyway.

The internal API here was a little unclear, so this is just some retroactive additional documentation resulting from me having "relearned" how this API works after some time away from it, in the hope that future maintainers will not need to repeat that exercise.

When we defer planning actions that have direct side-effects -- that is, resource instance actions -- we need to remember that we did so because we then need to force deferring planning for any other resources downstream to preserve the required order of operations. That means we need to keep track of what we've already deferred so we can check whether a newly-visited resource instance must also be deferred. We also need to track deferrals for the purpose of reporting them as part of the plan presented to the end-user. This change _starts_ to implement that, but for now the public API is limited only to generating a boolean signal for whether there are any deferrals at all, because that's the minimum possible reporting that allows for correct behavior (so the caller can treat the resulting plan slight differently in the workflow). In future we'll also want to be able to return placeholder values for not-yet-expanded resource instances, but that's not fully built out here yet since for an initial implementation we can cheat and use cty.DynamicVal as a universal "we know nothing at all" placeholder. This API is likely to grow and change in forthcoming commits as we continue wiring these new behaviors into the Terraform modules runtime.

As with the various other context-tracking containers we use during graph walks, we'll need a deferring.Deferred object shared across all graph nodes' EvalContexts and with the expression evaluator so that we can report deferrals and then make decisions based on deferrals previously registered upstream. As of this commit we're not yet doing anything with this new object, but in future commits we'll use it to report what we've deferred and to "artifically" defer anything that has a side-effect and depends on something else that's already been deferred.

Usually language experiments have single-module scope, but this particular one can potentially force modules other than the one that opt in to do some behavior differently for correct results. To minimize the risk of the new experiment code impacting those who are not participating in the experiment, we'll want to branch into a new DynamicExpand codepath only when at least one module is participating in the experiment, but graph nodes alone have a more tightly-scoped view that prevents them from answering that question directly, and so we'll temporarily ask the ConfigTransformer to help in detecting that up top and propagating the result into all of the resource nodes. This commit doesn't yet make use of the new temporary field; that'll follow in a subsequent commit that introduces the new partial-expansion-aware alternative DynamicExpand codepath.

Previously we were incorrectly checking for the experiment being enabled on the global EvalContext, but that doesn't make sense because experiments are explicitly module-specific. Now we'll ask the question separately for each module so we can use the module-scoped EvalContext.

An earlier commit changed the EvalContext model to support evaluation in both fully-expanded and partial-expanded modules, but the provider-contributed functions feature landed semi-concurrently with it and so inadvertently introduced a codepath that only worked in the fully-expanded case. This will now handle both situations, since all we really need is the addrs.Module, which we can obtain in both modes.

For now this is just an up-top weird exception to minimize the impact on the code that's runs when not participating in the unknown_instances experiment. In future it would be nice to rework this GetResource method to rely on the ResourceInstanceKeys result even in the known case, since that will simplify this function and remove a some duplicate code, but our priority for now is minimizing the possibility that the code for this experiment can impact users who are not participating in the experiment.

When at least one module in the configuration is opted in to the unknown_instances language experiment, we'll now divert into an experimental adaptation of nodeExpandPlannableResource.DynamicExpand which knows how to deal with partial-expanded modules and resources. Whenever a partial-expanded resource prefix is detected, we'll make a special graph node for it which goes through some of the motions of planning but does so only really for the side-effect of checking the configuration for dynamic errors. We do also save the result as a placeholder value for the object for completeness, although it's not really used for much yet because the partial-expansion tracker saves this primarily for reporting information about the deferred changes as extra context in the plan, and we're not implementing that yet. This isn't yet quite right because we're not preserving enough information from plan phase to apply phase to correctly handle references to the resources that had deferred actions. However, since this is currently just an experiment anyway, this is a useful interim milestone that shows at least that things are functional enough to converge on the correct result in the end. We'll correct the evaluation misses (as documented in the TestContext2Apply_deferredActionsResourceForEach test) in subsequent commits, which will require tracking a little more information in the saved plan so that we can re-populate all of the deferral metadata when we start the apply phase.

When the modules runtime is being used inside the stacks runtime, it's possible that a component could refer to another component that has deferred changes in its plan. In that case, we do still want to plan the downstream component (to give earlier feedback if there are obvious problems with it) but we need to force all planned actions to be treated as deferred so that we preserve the correct dependency ordering across all objects described in a stack. This commit only deals with the modules runtime handling that case. We'll make the stacks runtime use it in a subsequent commit.

This is the bare minimum functionality to ensure that we defer all actions in any component that depends on a component that already had deferred actions. We will also eventually need to propagate out a signal to the caller for whether the stack plan as a whole is complete or incomplete, but we'll save that for later commits, since the stack orchestration in Terraform Cloud will do the right thing regardless, aside from the cosmetic concern that it won't yet know to show a message to the user saying that there are deferred changes.

github-actions · 2024-02-12T20:13:23Z

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

github-actions · 2024-03-14T01:59:27Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

apparentlymart added enhancement core stacks Relating to the Stacks features labels Feb 10, 2024

apparentlymart requested a review from a team February 10, 2024 02:34

apparentlymart self-assigned this Feb 10, 2024

apparentlymart force-pushed the f-deferred-actions-resources branch from 30798ce to 4193749 Compare February 10, 2024 02:52

vercel bot deployed to Preview February 10, 2024 02:56 View deployment

alisdair approved these changes Feb 12, 2024

View reviewed changes

apparentlymart added 13 commits February 12, 2024 12:05

instances: Fix data race in Expander.ExpandAbsModuleCall

15cb549

This method must hold a read lock while it does its work, or else it will race with calls to register the expansion mode for other module calls.

apparentlymart force-pushed the f-deferred-actions-resources branch from 4193749 to 268c6fb Compare February 12, 2024 20:05

vercel bot deployed to Preview February 12, 2024 20:10 View deployment

apparentlymart merged commit e6665c4 into main Feb 12, 2024
6 checks passed

apparentlymart deleted the f-deferred-actions-resources branch February 12, 2024 20:13

apparentlymart mentioned this pull request Feb 29, 2024

terraform: BuiltinEvalContext function eval in partial-expanded module #34650

Closed

github-actions bot locked as resolved and limited conversation to collaborators Mar 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial experimental support for deferred actions when count or for_each are unknown #34651

Initial experimental support for deferred actions when count or for_each are unknown #34651

apparentlymart commented Feb 10, 2024

alisdair left a comment

alisdair Feb 12, 2024

apparentlymart Feb 12, 2024

apparentlymart Feb 12, 2024

alisdair Feb 12, 2024

apparentlymart Feb 12, 2024

alisdair Feb 12, 2024

apparentlymart Feb 12, 2024

alisdair Feb 12, 2024

apparentlymart Feb 12, 2024

github-actions bot commented Feb 12, 2024

github-actions bot commented Mar 14, 2024

		@@ -419,7 +419,7 @@ func (e *Expander) GetResourceInstanceRepetitionData(addr addrs.AbsResourceInsta
		}

		// GetModuleCallInstanceKeys determines the child instance keys for one specific

Initial experimental support for deferred actions when count or for_each are unknown #34651

Initial experimental support for deferred actions when count or for_each are unknown #34651

Conversation

apparentlymart commented Feb 10, 2024

alisdair left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Feb 12, 2024

github-actions bot commented Mar 14, 2024