
[WIP] Separate plan and apply for output values #21762

Closed
wants to merge 2 commits

Conversation

apparentlymart
Contributor

No description provided.

@apparentlymart apparentlymart force-pushed the b-destroy-cycle branch 2 times, most recently from f73c825 to 4214ef8 Compare June 20, 2019 22:07
This is a variant of Remove which adds new edges to ensure that everything
that was depending on the removed node for ordering will retain the same
relative ordering even after the node is removed.
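
A minimal sketch of that edge-preserving removal, written against a toy adjacency-map graph rather than Terraform's real dag package (the type and method names here are invented for illustration, not the actual RemoveNodesTransformer code):

package main

import "fmt"

// graph is a toy directed graph: edges[a][b] means "a depends on b".
type graph struct {
    edges map[string]map[string]bool
}

// removePreservingOrder removes node n, but first connects every node that
// depended on n to every node that n depended on, so the relative ordering
// of the remaining nodes is unchanged.
func (g *graph) removePreservingOrder(n string) {
    for from, tos := range g.edges {
        if tos[n] {
            for to := range g.edges[n] {
                g.edges[from][to] = true // bridge the gap left by n
            }
            delete(g.edges[from], n)
        }
    }
    delete(g.edges, n)
}

func main() {
    g := &graph{edges: map[string]map[string]bool{
        "module.b":  {"output.id": true},
        "output.id": {"aws_instance.foo": true},
    }}
    g.removePreservingOrder("output.id")
    // module.b now depends directly on aws_instance.foo.
    fmt.Println(g.edges)
}
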
@apparentlymart
Contributor Author

This PR started off as an attempt to fix #21662, but it turned out not to be a 0.12 regression as such: the new ability to depend on an entire module at once just makes it easier to run into a limitation that already existed in prior versions, namely that named values (local values, output values, input variables) represented as graph nodes don't play well in the destroy graph.

The test TestContext2Apply_destroyProvisionerWithMultipleLocals is a good illustration of why. It uses the following configuration:

locals {
  value  = "local"
  foo_id = aws_instance.foo.id

  // baz is not in the state during destroy, but this is a valid config that
  // should not fail.
  baz_id = aws_instance.baz.id
}

resource "aws_instance" "baz" {}

resource "aws_instance" "foo" {
  provisioner "shell" {
    id      = self.id
    command = local.value
    when    = "destroy"
  }
}

resource "aws_instance" "bar" {
  provisioner "shell" {
    id      = self.id
    command = local.foo_id
    when    = "destroy"
  }
}

With the code from this branch (which already includes changes intended to work towards fixing #21662, and thus behaves slightly differently from current master), this configuration produces a graph containing a cycle between local.foo_id, aws_instance.foo, and aws_instance.bar.

The reason for this cycle is that the local.foo_id node is trying to do two jobs at once. The required sequence of operations to destroy with this configuration would be:

  • Evaluate local.foo_id while aws_instance.foo still exists, to set it to the id of the existing aws_instance.foo object.
  • Run aws_instance.bar's destroy-time provisioner, which uses local.foo_id expecting that aws_instance.foo still exists.
  • Destroy aws_instance.bar.
  • Run aws_instance.foo's destroy-time provisioner.
  • Destroy aws_instance.foo.
  • Evaluate local.foo_id again, reflecting the fact that aws_instance.foo no longer exists and thus local.foo_id is now undefined.

The fact that this process both starts and ends with evaluating local.foo_id is the reason for the cycle: the only mechanism we have for evaluating a local value is to include a graph node for it, and currently we can produce only one graph node per local value, so that single node ends up trying to represent both use cases.
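
To make the cycle concrete, here is a toy reproduction of those ordering constraints (the node names and the depth-first helper are invented for illustration; this is not Terraform's graph code). Because there is only one local.foo_id node, it must sit both before aws_instance.bar's destroy and after aws_instance.foo's destroy, so no valid ordering exists:

package main

import "fmt"

// deps[a] lists the nodes that must be handled before a can run.
var deps = map[string][]string{
    "aws_instance.bar (destroy)": {"local.foo_id"},               // provisioner reads local.foo_id
    "aws_instance.foo (destroy)": {"aws_instance.bar (destroy)"}, // foo must outlive bar's provisioner
    "local.foo_id":               {"aws_instance.foo (destroy)"}, // final value needs foo already gone
}

// visit walks the dependencies depth-first and reports the first cycle it finds.
func visit(n string, path []string, onPath map[string]bool) {
    if onPath[n] {
        fmt.Println("cycle:", append(path, n))
        return
    }
    onPath[n] = true
    for _, d := range deps[n] {
        visit(d, append(path, n), onPath)
    }
    onPath[n] = false
}

func main() {
    visit("local.foo_id", nil, map[string]bool{})
    // Prints: cycle: [local.foo_id aws_instance.foo (destroy) aws_instance.bar (destroy) local.foo_id]
}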


Having traced all that through, this looks a lot like the problem in #11716, which discusses a different variant of the same issue involving output values. That one didn't create a cycle because create_before_destroy was set on the resources, so it just produced an incorrect result instead (the output node evaluated too early).

That brings me back to the idea I was considering in that other issue: it's feeling to me like named values shouldn't be nodes in the main graph at all, but rather should be re-evaluated immediately after any change to something they refer to.

In #11716 (when I was focused only on outputs) I was thinking about a brute-force approach of just re-evaluating all the outputs after any resource change as a simple initial fix. However, once we understand the problem to actually include all named values (local values, output values, and input variables) that is no longer sufficient: these objects can have references to each other, so they do need to be in a graph, even if that isn't the main one.

I think my next step for this investigation, then, would be to prototype something like the following:

  • Alongside the main graph, also construct a references-only graph where all of the nodes are of this type:

    type GraphNodeReferenceableObject struct {
        // Module and Addr together represent the absolute address of this
        // object, which can be referenced by other objects that have a
        // RefsModule equal to this Module and have Addr in their Refs
        // slice.
        Module addrs.ModuleInstance
        Addr   addrs.Referenceable
    
        // RefsModule is the module where the addresses in Refs are resolved
        RefsModule addrs.ModuleInstance
        // Refs are the addresses this object refers to in its config expressions
        Refs       []addrs.Referenceable
    }

    This graph is also required to be a DAG and contains only the sort of edges produced by terraform.ReferenceTransformer; it intentionally excludes all of the other various edges the normal graph would include, like provider relationships, destroy-time edges, etc.

    Unlike a typical ReferenceTransformer result, it only needs to consider references from named values, because references between resources and other objects are already represented in the main graph.

  • As a final step in the construction of the main graph, remove any nodes that represent named values while preserving the connectivity of the surrounding nodes, using the RemoveNodesTransformer added in this PR.

  • After evaluating any node in the main graph (or a dynamic subgraph thereof) that represents a resource instance, update all of the affected named values before proceeding to the next node:

    • Locate in the references-only graph the nodes for the resource and resource instance that was just modified. These will be the start nodes for a secondary graph walk.
    • Walk the references-only graph, evaluating those start nodes and anything that depends on them in topological order. (This could potentially just be a conventional single-threaded graph walk rather than our usual parallel walk, because we know everything we visit here will just be relatively-fast local compute, as opposed to network requests.)

In effect, this separates the operations that have side effects from the values derived from them so that the same object can have multiple side-effects applied to it during the main walk and yet we can keep the derived values in sync after each step.
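
A rough sketch of how that could fit together, using plain string addresses instead of the addrs types above (all of the names below are invented for illustration; the real walk would evaluate against actual state and expressions rather than a callback):

package main

import "fmt"

// refGraph is the references-only graph for named values: dependents[a]
// lists the named values whose expressions refer to a. Resource addresses
// appear only as roots; resource-to-resource ordering stays in the main graph.
type refGraph struct {
    dependents map[string][]string
    evaluate   func(addr string) // re-evaluates one named value from current state
}

// update re-evaluates every named value that is (transitively) derived from
// the changed address, making sure each one is refreshed only after the
// named values it refers to have been refreshed (a small Kahn-style walk).
func (g *refGraph) update(changed string) {
    // 1. Collect everything downstream of the change.
    affected := map[string]bool{}
    var collect func(addr string)
    collect = func(addr string) {
        for _, dep := range g.dependents[addr] {
            if !affected[dep] {
                affected[dep] = true
                collect(dep)
            }
        }
    }
    collect(changed)

    // 2. Count, for each affected node, how many other affected nodes it waits on.
    waiting := map[string]int{}
    for a := range affected {
        for _, dep := range g.dependents[a] {
            if affected[dep] {
                waiting[dep]++
            }
        }
    }

    // 3. Evaluate nodes as soon as all of their refreshed inputs are ready.
    var queue []string
    for a := range affected {
        if waiting[a] == 0 {
            queue = append(queue, a)
        }
    }
    for len(queue) > 0 {
        a := queue[0]
        queue = queue[1:]
        g.evaluate(a)
        for _, dep := range g.dependents[a] {
            if affected[dep] {
                waiting[dep]--
                if waiting[dep] == 0 {
                    queue = append(queue, dep)
                }
            }
        }
    }
}

func main() {
    g := &refGraph{
        dependents: map[string][]string{
            "aws_instance.foo": {"local.foo_id"},
            "local.foo_id":     {"output.foo_id"},
        },
        evaluate: func(addr string) { fmt.Println("re-evaluate", addr) },
    }

    // Called by the main walk right after a side-effecting step, e.g. just
    // after aws_instance.foo is destroyed:
    g.update("aws_instance.foo")
}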

I have exhausted all the time I have for this right now since I'm about to be on vacation for a while; the above is mostly just an aid to reloading all this context again when I'm next able to spend time on this.

@mark-00

mark-00 commented Jun 27, 2019

We have a lot of difficulty with cycle errors when using "terraform destroy". Using the two-step approach of making a destroy plan and then applying it works for us.

I'm a bit worried about the second dependency graph you are about to build. I don't fully understand it, but allow me to explain a pattern we often use, so you can check whether it still works in the new setup. We use it to express that a module depends on some other resources.

When we call a module, we use the role_depends_on attribute to provide a list of resources:

module "gkeshared01_euwest4_dev" {
  source = "./modules/uzl_google_gke_private"
  ...
  role_depends_on = [ google_project_iam_member.iam_sa_robot_onhost_opsdev, google_project_iam_member.iam_sa_gke1_robot_onhost_opsdev_2,   google_project_iam_member.iam_sa_robot_onhost_moaprdev,google_project_iam_member.iam_sa_gke1_robot_onhost_moaprdev_2]
}

In the module we create a null_resource that references all of these resources:

resource "null_resource" "prerequisites" {
   triggers =  {
     role_depends_on = jsonencode(var.role_depends_on)
   }
}

and we use this in the depends_on attribute of other resources in that module:

resource "google_kms_crypto_key" "key_gke" {
  ....
  depends_on = [ null_resource.prerequisites ]
}

@ghost

ghost commented Oct 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 29, 2020
@pselle pselle deleted the b-destroy-cycle branch March 4, 2021 20:22