[FR] {r,d}/tfe_workspace_run #534

Closed
drewmullen opened this issue Jun 22, 2022 · 13 comments

@drewmullen
Contributor

drewmullen commented Jun 22, 2022

Use-cases

We have situations where we create multiple workspaces based on the same repo to deploy the same configuration in many AWS regions. When a workspace is created it automatically kicks off an apply, which sometimes fails due to a race condition with adding the variable set to the workspace.

Attempted Solutions

n/a

Proposal

Create a new resource and a data source that can trigger an apply on a workspace. The resource is useful for situations where you only want to apply once; the data source fires an apply every time the root module is run.

resource "tfe_workspace_trigger_apply" "initial_apply" {
  workspace_id = tfe_workspace.mine.id
  refresh      = true
  auto_apply   = true

  # proposed new argument
  # whether resource completion should wait for the apply to be complete (successful) or should just fire the job and continue on
  wait_for_completion = bool

  depends_on = [
    tfe_variable_set_workspace_attachment.test
  ]
}

data "tfe_workspace_trigger_apply" "reoccuring_apply" {
  ...

  depends_on = [
    tfe_variable_set_workspace_attachment.test
  ]
}

Links:

@brandonc
Collaborator

@drewmullen Hey, Drew. Thanks for the recent interesting contribution and FR! I'm thinking through the use case that you described, specifically:

when the workspace is created it automatically kicks off an apply

Can you give me a little more detail about how this works? When an auto-apply, VCS-connected workspace is created, my understanding was that you have to trigger the very first run and that it doesn't happen automatically. Is this run triggered in another way? Perhaps with workspace run triggers?

@drewmullen
Contributor Author

drewmullen commented Jun 22, 2022

Hi @brandonc !

So in my case I have a set of workspaces that each connect to a VCS repo that's pre-registered in my organization. When a workspace connects, the apply is launched. I assume it's auto-launching because I have file_triggers_enabled = true, but I'm not 100% sure. I don't think I'm using any of the newer trigger features; I still need to investigate those.

Here's an example of the workspace state:

$ terraform state show 'module.multi_region_deployment.tfe_workspace.main["eastcoast"]'

# module.multi_region_deployment.tfe_workspace.main["eastcoast"]:
resource "tfe_workspace" "main" {
    allow_destroy_plan            = true
    auto_apply                    = true
    execution_mode                = "remote"
    file_triggers_enabled         = true
    global_remote_state           = false
    id                            = <>
    name                          = "eastcoast"
    operations                    = true
    organization                  = <>
    queue_all_runs                = true
    remote_state_consumer_ids     = []
    speculative_enabled           = true
    structured_run_output_enabled = true
    tag_names                     = []
    terraform_version             = "1.2.3"
    trigger_prefixes              = []

    vcs_repo {
        branch             = <>
        identifier         = <>
        ingress_submodules = false
        oauth_token_id     = <>
    }
}

@alexbasista

alexbasista commented Jun 22, 2022

I believe the run is automatically triggered upon workspace creation because queue_all_runs is set to true:

https://registry.terraform.io/providers/hashicorp/tfe/latest/docs/resources/workspace#queue_all_runs
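
If that is indeed the trigger, a minimal sketch of the workaround would be to set it explicitly on the workspace (other arguments elided; the values shown are illustrative only):

# Disable the run that TFC/E queues automatically on workspace creation
# so that variable sets can be attached before anything runs.
resource "tfe_workspace" "main" {
  name           = "eastcoast"
  organization   = var.organization
  auto_apply     = true
  queue_all_runs = false
}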

@drewmullen
Contributor Author

Thanks @alexbasista, I appreciate you pointing that out! If this proposal is approved, I would probably flip that to false on new workspaces and let this new resource perform the first "manual" apply.

Any thoughts from the developers on whether this kind of PR would be accepted?

@brandonc
Collaborator

I have a superficial objection to this specific proposal because it highlights the resource as a side effect (triggering a run on a workspace) and not really a resource under management. It's not clear when or how the resource as you defined it interacts with CRUD operations. On the other hand, a run is a first class resource in TFE and I'd like some time to evaluate how to structure it and speak to some colleagues about the possibility.

Some questions that spring to mind about a tfe_run resource:

  1. Does it encapsulate configuration versions as well or does it depend on the workspace having config already?
  2. Is waiting on a run status like 'applied' appropriate when creating a tfe_run? (Is this important to your workflow?)
  3. If so, how long do we wait and how do you recover from this timeout as an error?

The problem of automatically applying a workspace that is missing a varset does feel like a legitimate gap, though. We don't have the API support to specify varsets when creating a workspace, which would easily solve this. Until then, I've been trying to devise an escape hatch that solves this issue. The only thing I've come up with involves external, which has very few guarantees in Terraform Enterprise.
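
For context, the sort of escape hatch being alluded to might look roughly like the following, using the external data source to shell out to a hypothetical script that queues a run through the TFC/E API once the varset is attached (the script path and its behavior are assumptions for illustration, not an existing tool):

# Hypothetical escape hatch: run a local script that queues a run via
# the Terraform Cloud/Enterprise API. Per the external data source
# contract, the script receives the query as JSON on stdin and must
# print a JSON object to stdout.
data "external" "queue_initial_run" {
  program = ["bash", "${path.module}/scripts/queue-run.sh"]

  query = {
    workspace_id = tfe_workspace.mine.id
  }

  depends_on = [
    tfe_variable_set_workspace_attachment.test
  ]
}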

@drewmullen
Contributor Author

Thank you for your consideration and input!

Some questions that spring to mind about a tfe_run resource:

  1. Does it encapsulate configuration versions as well or does it depend on the workspace having config already?
  2. Is waiting on a run status like 'applied' appropriate when creating a tfe_run? (Is this important to your workflow?)
  3. If so, how long do we wait and how do you recover from this timeout as an error?
  1. In my mind this is being used with a workspace that references a VCS repo, so the configuration is already there and we're piping together workspaces that reference the config.

  2. I propose this is an argument that dictates the behavior of the resource's 'success' condition. I would like to add an argument that is not in the API, wait_for_completion (it should probably be better named); false would fire the apply asynchronously, true would wait for the job to finish:

  # proposed new argument
  # whether resource completion should wait for the apply to be complete (successful) or should just fire the job and continue on
  wait_for_completion = bool

  3. This is a very fair question. In some resources in the AWS provider we allow setting custom timeouts; we could consider that approach here, with a max cap to protect the service from excessive polling. A rough sketch follows below.
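
A custom-timeouts approach on the proposed resource might look roughly like this; the resource itself and its timeouts block are hypothetical, following the convention other providers use, and the values are only illustrative:

resource "tfe_workspace_trigger_apply" "initial_apply" {
  workspace_id        = tfe_workspace.mine.id
  wait_for_completion = true

  # Hypothetical: cap how long the provider polls the run before
  # returning a timeout error, similar to custom timeouts in the
  # AWS provider.
  timeouts {
    create = "30m"
  }
}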

The problem of automatically applying a workspace that is missing a varset does feel like a legitimate gap, though. We don't have the API support to specify varsets when creating a workspace, which would easily solve this.

100% agree. I was heavily influenced by the {r/ds}/aws_lambda_invocation as part of the proposal for this, which is effectively a workaround for anything missing from providers and akin to CloudFormation custom resources... a necessary evil 😭

Another question I have is whether destroy should be allowed via this resource. At the moment I was not planning to allow passing run_type = {apply,destroy}, but it could be added. I'm just not certain what the use case would be.

@brandonc
Collaborator

brandonc commented Jun 23, 2022

aws_lambda_invocation is a very interesting data source to consider. If I may contrast it with what we're proposing:

aws_lambda_invocation represents an invocation as a read operation (the result of the function) which conceptually is a lot cleaner than what we are proposing, which is queuing a run every time a read operation happens. Queuing a run has more known side effects (like blocking other runs) and there is no clear data output to be read. So far, I've been more focused on the run as a resource, because the create action maps more cleanly to what is taking place in Terraform Cloud/Enterprise.

Other thoughts:

Another question I have is whether destroy should be allowed via this resource. At the moment I was not planning to allow passing run_type = {apply,destroy}, but it could be added. I'm just not certain what the use case would be.

Runs can be discarded/canceled/force canceled and these might be good candidates for destroy.

If a tfe_run resource did depend on existing config, that would be the most flexible design because it would allow for the existence of a tfe_configuration_version later on.
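
To make that concrete, a hypothetical shape might look like the following; none of these resource or argument names exist today, and this is purely an illustration of the design being discussed:

# Hypothetical: a run resource that uses whatever configuration the
# workspace already has, with room to reference an explicit
# configuration version resource later on.
resource "tfe_run" "initial" {
  workspace_id = tfe_workspace.mine.id

  # configuration_version_id = tfe_configuration_version.example.id
}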

Tomorrow I'm going to finish testing your workspace varset attachment PR and I'll get back to this after that. Thanks again.

@drewmullen
Contributor Author

drewmullen commented Jun 23, 2022

aws_lambda_invocation is a very interesting data source to consider. If I may contrast it with what we're proposing:

Just to clarify, there is both a data source and a resource for lambda invocation.

I'm personally more interested in the resource form of tfe_run, which is the use case I've focused on above (the initial run of a new workspace configuration) and would only ever run once. I can imagine scenarios where a data source could be useful, so I included it in this proposal as well, but honestly, with workspace run triggers the use cases are minimal for a data source that runs on "every read", as you said.

@jacobtaunton

I was checking issues to see if I should report this race condition. We also attach varsets and leave queue_all_runs at its default of true. We'd have to override queue_all_runs to false if we don't want to see workspaces fail on the first run, and we do not want our users to see errors. My opinion is that the default queue_all_runs = true behavior should wait for the variables and varsets to be added before the run kicks off.

@drewmullen
Contributor Author

Wanted to follow up on this to see if there's any input from the team, @brandonc?

Also, thank you for adopting that PR for the latest release! Love the new functionality. Here is the module that was the impetus for that issue and this one too.

@rhughes1

I know that some people are leveraging the multispace provider, which will trigger a run on the workspace. They declare a depends_on block referencing the tfe_workspace resource and the tfe_variable resource.
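
A sketch of that pattern, with argument names as I recall them from the multispace provider (they may differ, so check its docs):

# Queue a run in the target workspace only after the workspace and its
# variables exist, by depending on them explicitly.
resource "multispace_run" "initial" {
  organization = "my-org"
  workspace    = tfe_workspace.mine.name

  depends_on = [
    tfe_workspace.mine,
    tfe_variable.example,
  ]
}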

@brandonc
Collaborator

Having had some time to consider, I do believe that a more narrowly focused logical provider like multispace would be a better solution for this.

@jacobtaunton

Much of this was fixed when the tfe provider was updated to include attaching var_sets and policy_sets, things outside of the workspace resource that we needed attached prior to the first run. Before that we had to use depends_on with local-exec Python scripts, and this caused other complications.
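
For reference, the attachment pattern described here looks roughly like the following; the resource names are the attachment resources as I recall them from the released provider, so check the provider docs for the exact names and arguments:

# Attach a variable set and a policy set to the workspace before its
# first run is queued.
resource "tfe_workspace_variable_set" "this" {
  workspace_id    = tfe_workspace.mine.id
  variable_set_id = tfe_variable_set.common.id
}

resource "tfe_workspace_policy_set" "this" {
  workspace_id  = tfe_workspace.mine.id
  policy_set_id = tfe_policy_set.common.id
}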
