-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial experimental support for deferred actions when count or for_each are unknown #34651
Conversation
30798ce
to
4193749
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good! None of my comments are of significance and shouldn't block merge. My focus in review was on a possible leak of the experimental behaviour, and I found none.
internal/instances/expander.go
Outdated
@@ -419,7 +419,7 @@ func (e *Expander) GetResourceInstanceRepetitionData(addr addrs.AbsResourceInsta | |||
} | |||
|
|||
// GetModuleCallInstanceKeys determines the child instance keys for one specific |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment's method name doesn't match the method below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll fix this one before merge just because it's not likely to cascade down onto other commits.
(For your other feedback, I'm going to keep it in mind for later work but leave it unchanged for now just to avoid churning this big pile of commits too much, since it's all experimental code we're explicitly intending to revisit anyway.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For consistency I should really have named this method GetResourceInstanceKeys
, but since changing that would impact the caller from another commit I'm just noting that in a comment here so I can refer back to it while planning later work.)
// This must be called zero or one times before any other use of the receiver. | ||
// Changing this setting after a [Deferred] has already been used, or | ||
// concurrently with any other method call, will cause inconsistent and | ||
// undefined behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given these constraints, did you consider preventing mutation of this value, and instead requiring it to be set at construction time? Perhaps through a second constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did specifically consider adding an "opts" argument to NewDeferred
, but it seemed weird to complicate that signature for what is currently just a single boolean flag.
I suppose a separate constructor would be a viable approach too, but honestly I decided not to sweat it too much for now. It is true though that this design does make it possible for someone to accidentally misuse this at runtime after initial construction, and so might be worth revisiting this in a later iteration if it seems likely that we're going to continue following this approach once we've got feedback from the experiment.
// It's invalid to call this method for an address that was already reported | ||
// as deferred using [Deferred.ReportResourceInstanceDeferred], and so this | ||
// method will panic in that case. Callers should always test whether a resource | ||
// instance action should be deferred _before_ reporting that it has been. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little confused by this paragraph of commentary (and the corresponding one below before the panic). I think the contract we're enforcing here is that callers are required to call these two methods in a specific order for each resource instance, at most once each:
- First call
ShouldDeferResourceInstanceChanges
- If it returns true, then you must call
ReportResourceInstanceDeferred
The contract is to crash upon calling these in any other order, or any more often. That's a severe restriction, which may be entirely reasonable, but it was a bit hard for me to follow and I wonder if it's a possible maintenance hazard.
Assuming that I've understood what's happening, which is quite possibly not the case, I have two ideas:
-
Bundle these two methods into one, so that it's impossible to call them in the wrong order. Something like:
func (d *Deferred) ShouldDeferResourceInstanceChanges(addr addrs.AbsResourceInstance) bool { ret := d.shouldDeferResourceInstanceChanges(addr) if ret { d.reportResourceInstanceDeferred(addr) } return ret }
-
Or, update the comments throughout to clarify the expectation that these methods are always to be called in this specific order, as a pair, and only once per addr, because we require that the graph walk only evaluates deferrals once per instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is a little confusing, and I think a big part of what makes it confusing is that many different resource instances are potentially in play, and this constraint only applies for one specific resource instance address. In practice then, the sequence of calls is:
- Some upstream graph node earlier called
ReportResourceInstanceDeferred
to report that it has deferred a change, regardless of why that happened. - The current graph node then calls
ShouldDeferResourceInstanceChanges
with its own address, which causes this function to check whether that any of address's upstream dependencies were deferred. (In the case of this example, yes.) - The current graph node then responds to that result by calling
ReportResourceInstanceDeferred
with its own resource instance address. (This would then be the step 1 of a downstream incarnation of this sequence, if any)
Bundling the check in with the report is an interesting idea. I think we will still need a separate exported ReportResourceInstanceDeferred
because in the end this will not be the only reason for a resource instance graph node to need to report that it made a deferral decision: it would do so also if a provider responded to PlanResourceChange
by saying "I cannot plan this yet", once we get to that functionality in a later PR. But I admittedly cannot think of any reason why one would call SourceDeferResourceInstanceChanges
, get a true
response, and not immediately call ReportResourceInstanceDeferred
, so it does seem plausible to build it so that asking the question and performing the side-effect of reporting are an atomic action from the caller's perspective.
(I do typically try to avoid writing functions that are both accessors and mutators at the same time as far as that's possible to do so, but I do agree that in this case the separation seems more likely to encourage bugs than combining them would.)
// with "b:", even though the rest of the name is unknown. | ||
"name": cty.UnknownVal(cty.String).Refine(). | ||
StringPrefixFull("b:"). | ||
NotNull(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a neat idea for a test case! I had forgotten about the helpful overlap between refinements and deferred actions so seeing it called out here is great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I didn't actually explicitly intend to capture this here... it just emerged naturally from how I constructed the test fixture! But I did quite like to see it once I saw it, so I kept it. 😀
This method must hold a read lock while it does its work, or else it will race with calls to register the expansion mode for other module calls.
We allow experiments only in alpha builds, and so we propagate the flag for whether that's allowed in from "package main". We previously had that plumbed in only as far as the rpcapi startup. This plumbs the flag all the way into package stackeval so that we can in turn propagate it to Terraform's module config loader, which is ultimately the one responsible for ensuring that language experiments can be enabled only when the flag is set. Therefore it will now be possible to opt in to language experiments in modules that are used in stack components.
This is kinda awkward because this address type represents both resources whose own instances are not expanded yet and resources belonging to whole modules whose instance keys haven't been expanded yet, and those two cases have different address types. However, in return for this awkward API only for the rare case where we need to isolate the module instance address, the rest of the system gets to not worry very much about this distinction: it can share most code between the two cases, since they both need similar evaluation treatment anyway.
The internal API here was a little unclear, so this is just some retroactive additional documentation resulting from me having "relearned" how this API works after some time away from it, in the hope that future maintainers will not need to repeat that exercise.
When we defer planning actions that have direct side-effects -- that is, resource instance actions -- we need to remember that we did so because we then need to force deferring planning for any other resources downstream to preserve the required order of operations. That means we need to keep track of what we've already deferred so we can check whether a newly-visited resource instance must also be deferred. We also need to track deferrals for the purpose of reporting them as part of the plan presented to the end-user. This change _starts_ to implement that, but for now the public API is limited only to generating a boolean signal for whether there are any deferrals at all, because that's the minimum possible reporting that allows for correct behavior (so the caller can treat the resulting plan slight differently in the workflow). In future we'll also want to be able to return placeholder values for not-yet-expanded resource instances, but that's not fully built out here yet since for an initial implementation we can cheat and use cty.DynamicVal as a universal "we know nothing at all" placeholder. This API is likely to grow and change in forthcoming commits as we continue wiring these new behaviors into the Terraform modules runtime.
As with the various other context-tracking containers we use during graph walks, we'll need a deferring.Deferred object shared across all graph nodes' EvalContexts and with the expression evaluator so that we can report deferrals and then make decisions based on deferrals previously registered upstream. As of this commit we're not yet doing anything with this new object, but in future commits we'll use it to report what we've deferred and to "artifically" defer anything that has a side-effect and depends on something else that's already been deferred.
Usually language experiments have single-module scope, but this particular one can potentially force modules other than the one that opt in to do some behavior differently for correct results. To minimize the risk of the new experiment code impacting those who are not participating in the experiment, we'll want to branch into a new DynamicExpand codepath only when at least one module is participating in the experiment, but graph nodes alone have a more tightly-scoped view that prevents them from answering that question directly, and so we'll temporarily ask the ConfigTransformer to help in detecting that up top and propagating the result into all of the resource nodes. This commit doesn't yet make use of the new temporary field; that'll follow in a subsequent commit that introduces the new partial-expansion-aware alternative DynamicExpand codepath.
Previously we were incorrectly checking for the experiment being enabled on the global EvalContext, but that doesn't make sense because experiments are explicitly module-specific. Now we'll ask the question separately for each module so we can use the module-scoped EvalContext.
An earlier commit changed the EvalContext model to support evaluation in both fully-expanded and partial-expanded modules, but the provider-contributed functions feature landed semi-concurrently with it and so inadvertently introduced a codepath that only worked in the fully-expanded case. This will now handle both situations, since all we really need is the addrs.Module, which we can obtain in both modes.
For now this is just an up-top weird exception to minimize the impact on the code that's runs when not participating in the unknown_instances experiment. In future it would be nice to rework this GetResource method to rely on the ResourceInstanceKeys result even in the known case, since that will simplify this function and remove a some duplicate code, but our priority for now is minimizing the possibility that the code for this experiment can impact users who are not participating in the experiment.
When at least one module in the configuration is opted in to the unknown_instances language experiment, we'll now divert into an experimental adaptation of nodeExpandPlannableResource.DynamicExpand which knows how to deal with partial-expanded modules and resources. Whenever a partial-expanded resource prefix is detected, we'll make a special graph node for it which goes through some of the motions of planning but does so only really for the side-effect of checking the configuration for dynamic errors. We do also save the result as a placeholder value for the object for completeness, although it's not really used for much yet because the partial-expansion tracker saves this primarily for reporting information about the deferred changes as extra context in the plan, and we're not implementing that yet. This isn't yet quite right because we're not preserving enough information from plan phase to apply phase to correctly handle references to the resources that had deferred actions. However, since this is currently just an experiment anyway, this is a useful interim milestone that shows at least that things are functional enough to converge on the correct result in the end. We'll correct the evaluation misses (as documented in the TestContext2Apply_deferredActionsResourceForEach test) in subsequent commits, which will require tracking a little more information in the saved plan so that we can re-populate all of the deferral metadata when we start the apply phase.
When the modules runtime is being used inside the stacks runtime, it's possible that a component could refer to another component that has deferred changes in its plan. In that case, we do still want to plan the downstream component (to give earlier feedback if there are obvious problems with it) but we need to force all planned actions to be treated as deferred so that we preserve the correct dependency ordering across all objects described in a stack. This commit only deals with the modules runtime handling that case. We'll make the stacks runtime use it in a subsequent commit.
This is the bare minimum functionality to ensure that we defer all actions in any component that depends on a component that already had deferred actions. We will also eventually need to propagate out a signal to the caller for whether the stack plan as a whole is complete or incomplete, but we'll save that for later commits, since the stack orchestration in Terraform Cloud will do the right thing regardless, aside from the cosmetic concern that it won't yet know to show a message to the user saying that there are deferred changes.
4193749
to
268c6fb
Compare
Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch. |
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. |
This PR represents the first real functional milestone toward solving #30937, in a very early experimental capacity.
There are still various details not quite right here yet, but this pile of commits has already got big enough so I think it's time to land it and then we can continue working on the details in subsequent smaller PRs.
The end result of all of this is that:
unknown_instances
language experiment, it's valid forcount
orfor_each
in amodule
orresource
block to be unknown. (That was actually mostly true before, but there was a bug making it not work for module calls, and turning it on before just caused the plan to be broken rather than doing anything useful.)resource
block that has an unknowncount
orfor_each
, or that is declared beneath a module call with unknowncount
orfor_each
, Terraform asks the provider to confirm that what we know of the configuration is plannable, but then just makes a note that all of the instances of this resource must have their actions deferred to a future round.My main priority in this first round was to keep the new codepaths as isolated as possible from the existing code, to minimize the risk of impacting the behavior for anyone not participating in the experiment. Therefore there are some rough edges and compromises called out in comments throughout. Since none of this new code should be reachable in stable releases, I'd ask that reviewers pardon my dust and trust that we'll continue improving this in future PRs.
With that said though: if you do see something that seems like it could break non-experimental usage, please call that out! (The fact that I needed to make very few changes to existing tests gives me confidence, but it can't hurt to double-check.)
The new test coverage here is also relatively light, and I apologize for that. My next step here is going to be to try this with a more realistic stack configuration from our set of examples from stacks private preview, and so that will be my primary testing vehicle immediately, and then I'll add more automated tests gradually in future PRs.
I tried to make up for the fact that this is a huge change by breaking it into digestible commits, so I suggest that reviewers take it on a commit-by-commit basis rather than trying to review the entire diff at once.