Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds ignore/skip in any resource #17274

Closed
koalalorenzo opened this issue Feb 5, 2018 · 8 comments
Closed

Adds ignore/skip in any resource #17274

koalalorenzo opened this issue Feb 5, 2018 · 8 comments

Comments

@koalalorenzo
Copy link

koalalorenzo commented Feb 5, 2018

I have been using quite a while terraform and I love it, but sometimes I fight against its limits.
Count is amazing way to create multiple resources and have conditions... but there are cases when you have a Count, but you still need to ignore the resource.

We are creating 100+ vms... but some of them don't need a azure data_disk and some others do, some others needs more than one. To easily skip some them, or create more, we are forced to:

  1. Have complex variables/maps
  2. Have multiple definition for azure_data_disk

The option number 2 is not practical as the amount of resources would be too much, The option number 1, is more practical but the variables must be generated from a script in order to do this properly (yes, CSV -> tfvars)... but it is not human readable; and the only way to use it is to update the CSV and let the script do the magic of creating this structure.

To create data_disks we found useful to create variables/maps of the ones that we have to create... but it is.

We could avoid this madness probably if there was a:

count = `184`
skip_this_one = `TERNARY OPERATOR` // this will be considered after counting, so that if index = 120, the resource will not be created, but 121 will be there!

Basically the ternary operator can skip some of the 184 resources.
Why has something like this not been implemented? I guess this is a more philosophical reason that I am curious to know.

BTW, #2831 is probably similar but this is specific case.

@Ycallaer
Copy link

Ycallaer commented Feb 5, 2018

+1, similar issue where we have to created resources from an excel where we need to skip certain resources if someone leaves the cell blank

@rorychatterton
Copy link

It might be worth unpacking the use case in a bit more detail.

In the example you have provided - you are creating 184 virtual machines under the same virtual machine resource using count.

Within Terraform, it is an anti-pattern to use count across resources that are defining different virtual machine roles/application/components. For example, instead of:

"azurerm_virtual_machine" "my_vms" {
    count = 184
}

This logic becomes especially complex if you are using Terraform to apply your custom_data for bootstrapping, or referencing immutable images.

You would generally split up by:

# Doesn't need disk
"azurerm_virtual_machine" "web_server" {
    count = 12
}

# Needs additional Data Disk
"azurerm_virtual_machine" "app_server" {
    count = 12 
}

# Doesn't need disks
"azurerm_virtual_machine" "cache_server" {
    count = 12
}

You can then use a module to wrap up the logic to create/not-create a managed disk.

This becomes particularly important as you scale out the infrastructure as each of your applications should really be residing in separate state files (or workspaces if you are using Terraform Enterprise).

My fear is that introducing a feature like this will encourage bad architectural patterns, such as mixing resource roles (hard to manage growth, and a nightmare to manage day 2 operations), or lumping your entire datacentre definition in a single terraform file (really bad if your state becomes corrupt, or somebody becomes trigger happy on a terraform deletion action!)

Are you able to provide some more information as to why we can't achieve the same results with a mix of modules & splitting up roles?

@koalalorenzo
Copy link
Author

koalalorenzo commented Feb 6, 2018

Thank you, this is the answer that I wanted.

We fixed this already with a script, but it is so nasty and unclear that we don't like it. Even if we were using modules and splitting up roles, I could say that the code would have been more complicated, just for few things. Honestly we are considering to switch to Ansible where at least conditions and loops are easier to manage... but we will lack all the features that Terraform has. 😭

I am a person that loves clean code and structures; what we have done to achieve this not clean: modules, script, looping nasty data structures, splitting up roles: a lot of unhuman, unreadable lines of code; only way to interact with TF is by other scripts; a lot of more line of code just for things to "skip/ignore".

Case: We have around ~100 beefy machines, you can group them under different sizes, different storage, the kind/amount of disks attached, but you will end up in always different kind of groups and after starting writing the terraform setup for each "group" of similar machine, we thought that there must be a simpler way as doing very similar tf files for 3-4 machines each time was painful, put them for different repositories and manually enable CICD for them... no. That is not possible.

So somebody came with the idea of iterating over the actual amount of resources we need (ex: 100 machines, 80 disks, 42 IP etc etc etc). So we have a set of arrays generated by a script. Each element of these arrays have the VM original INDEX ID and other stuff. It works.

  1. This is lazy work, we know that this should be defined in multiple setups: but we are aware of that and we are ready for the consequences! This is for one single project.
  2. If we had skip or ignore, we could get the same, with less unhuman code, no script, no madness.

Now, if you are worried that somebody will use bad patterns I can say that we used a bad pattern intentionally anyway. But we need to get shit done and the current Terraform is not helping when talking about big infrastructure, loops and conditions. I guess that this feature that we are looking for was requested more than once; Just googling you can find a shitload of people.

Solutions ahead of us:

  1. Switch to Ansible/other tools (but NEVER Microsoft ARM templates 🤮 )
  2. Keep doing this nasty stuff.

I think solution number 1 is easier, as would "philosophically" make more sense. @rorychatt what do you say?

@rorychatterton
Copy link

Even if we were using modules and splitting up roles, I could say that the code would have been more complicated, just for few things.

Not necessarily. Less LoC isn't necessarily less complicated, especially if you are attempting to encapsulate additional information about the particular VMs being managed.

Is:

Count Role Environment Playbook firewall_rules
5 WebServer Dev nginx SG-123456:443, SG-123456:80, 10.4.4.3:6443

a huge upgrade over:

module "WebServer" {
    source = "some_terraform_module"

    count = 5
    environment = "dev"
    playbook = "nginx"

    firewall_rules [
    { port =  443, destination = SG-123456 } ,
    { port =  443, destination = SG-123456 } ,
    { port =  443, destination = SG-123456 } 
    ]
}

For single dimensional data, tables are totally cool - but we aren't dealing with single dimensional data in provisioning. Bending a table to manage Security Groups/DNS/database connectivity/users/playbooks/etc is complex, and is going to be ultra fragile as Terraform does lazy evaluation of logic and won't pickup configuration errors.

Honestly we are considering to switch to Ansible where at least conditions and loops are easier to manage... but we will lack all the features that Terraform has.

Both are great tools, and to be honest, I don't see it as an Ansible vs Terraform debate. With my customers, I tend to describe:

  • Terraform as an orchestration engine that also does configuration management. Terraform has a huge variety of deployment resources available ootb at its disposal, and when looking at the cloud orchestration market, has leaps and bounds over the competition in pure resource coverage. That said, outside of running trivial scripts on the vm, it expects any configuration management to happen outside of the tool, and in a way, discourages it as the Hashicorp design sits around declarative, immutable infrastructure.
  • Ansible as a configuration management tool, that also does orchestration. Ansible has a fantastic set of playbooks at its disposal to deploy a huge number of tasks. For managing the desired state of a long running virtual machine it is great. It also has a fairly decent coverage of cloud resources, but can't put a dent on Terraform.

Depending on your complexity, you may want both. When used in tandem, a typical deployment will do something like:

resource "aws_instance" "my_server" {
    # Define VM Attributes
    ami = "ami-XXXX"
    instance_type = "t2.small"
    ...

    # Define local playbook to run
    provisioner "local-exec" {
        command = "<<< CODE TO BOOTSTRAP ANSIBLE PLAYBOOK >>> "
    }
}

Allowing each tool to focus on what it's good at.

That said - if you are only doing virtual machine management, and not deploying much in the way of PaaS services, you might be better just simplifying to one tool. That said, do it for the right reasons.

I am a person that loves clean code and structures; what we have done to achieve this not clean: modules, script, looping nasty data structures, splitting up roles: a lot of unhuman, unreadable lines of code; only way to interact with TF is by other scripts; a lot of more line of code just for things to "skip/ignore".

This is where we can agree to disagree. There is alot that can be done to simplify the definition of modules, and encourage DRY code through modules, while not nerfing your ability to manage day two operations around virtual machines. I don't want this to turn into an essay for the ages, but I am happy to talk through your use cases over PM later if wanted.

Case: We have around ~100 beefy machines, you can group them under different sizes, different storage, the kind/amount of disks attached, but you will end up in always different kind of groups and after starting writing the terraform setup for each "group" of similar machine, we thought that there must be a simpler way as doing very similar tf files for 3-4 machines each time was painful, put them for different repositories

Yep - This is exactly what modules are for. As far as separate repositories: honestly the safety of knowing that a simple F&ckup on one of my definitions isn't going to screw up my entire 180 vms, all to save 30 seconds of time to cd /repo/dir is enough for me! If that isn't enough, there is also the complexity of:

  • Matching Indexes back to playbooks, then the versioning and state of those playbooks, in a declarative manner
  • Removing machines from the middle of the index, and then watching half the virtual machines get destroyed as the index resets itself
  • Coupling vms in the middle of that stack to other services (e.g. I want to add VM 45 and 67 as CNAMES to my registrar would be a nightmare. I have no idea how you would even start)
  • Managing the same applications across multiple environments, when each environment has different numbers/sizes of virtual machines, in a repeatable manner.

Irrespective of whether running Ansible, Terraform, salt, etc - putting all your eggs in one basket is going to end in disaster, and the time spent troubleshooting will far outweigh the additional overhead of separating out repositories.

and manually enable CICD for them... no. That is not possible.

We automate on-boarding for Jenkins, Bitbucket and Terraform to do exactly this, alongside pre-configured templates for different types of deployments.

So somebody came with the idea of iterating over the actual amount of resources we need (ex: 100 machines, 80 disks, 42 IP etc etc etc). So we have a set of arrays generated by a script. Each element of these arrays have the VM original INDEX ID and other stuff. It works.

See earlier comments

Now, if you are worried that somebody will use bad patterns I can say that we used a bad pattern intentionally anyway. But we need to get shit done and the current Terraform is not helping when talking about big infrastructure, loops and conditions.

Terraform is used by thousands of companies, across hundreds of thousands of machines. The biggest implementation I have seen has over 8,000 applications and 25,000 machines under management.

I'm a big believe in get shit done. I'm a big believer in saving future me (or whoever is managing my infrastructure) time and heartache for the sake for a bit of readability early on.

To get back to the original question:

I don't work for Hashicorp, so I don't represent them. However - I'm not convinced that adding a skip-like function will be net positive for the Terraform community. It encourages bad practices in architecting infrastructure as code.

We are better off educating the community, through Reference Architecture models and similar, on how to manage 100's & 1000's of machines in a way that provides the desired flexibility but doesn't bite them in the arse.

@koalalorenzo
Copy link
Author

koalalorenzo commented Feb 6, 2018

Thank you for your reply! After reading, I do agree with what you are saying. It makes sense, except for the "bad practice" side: it is a lazy way to justify the lack of skip and other cool features that terraform could have. Let me explain:

I prefer to give people the opportunity to make mistakes and learn from them, rather than forcing them to use the tool in one way or another. Philosophically speaking you can achieve great things and discover new cool way by allowing mistakes. Would you ever create a new knife if you are aware that it can cut things but also kill pepole? I prefer to instruct people to not use it for killing, but I would still provide the new knife blade.

So now we don't have skip, and because of that we are painfully creating a lot of things. We have no other way to do it. And EVEN with Modules it is massively messy, a lot of code duplicated and variables here and there... just because our knife could kill people. Is that making sense to you? Try to understand our frustrated point of view of end users from a big company with massive request for a single project.

as an example, using Modules is not really practical due also to this: #5480

@koalalorenzo koalalorenzo changed the title Adds ignore/skip resource Adds ignore/skip in any resource Feb 6, 2018
@throwaway8787
Copy link

@rorychatterton - I have a vpc module. It uses all available AZ's. There are scenarios where I may want to skip an AZ because an instance type isn't available.

Your suggestions is that:

  1. I duplicate my vpc module (potentially a mess depending on how often I need to do this)
  2. I abstract my subnets into sub modules (exponentially impossible number when 6 az's are in us-east-1)
  3. Some idea I'm missing?

@teamterraform
Copy link
Contributor

Hi all! Thanks for the great discussion here.

Terraform 0.12 has introduced a new feature that includes a means to filter the elements of a collection:

locals {
  smaller_list = [for x in var.larger_list : x if x.foo == "baz"]
}

This general building block can in turn be used to produce a list to use with count, and thus meet the use case described in this issue.

However, it's still good to be cautious of the fact that count correlates by indices in the list, and so adding and removing items from the list can be more disruptive to existing instances than we'd like. #17179 describes the feature we intend to release to address that limitation: a new argument for_each that can be used instead of count to produce instances that are correlated with keys in a map, rather than indices in a list.

Implementation of #17179 is in progress right now and will be included in a near-future Terraform CLI release. Once that is complete, the for_each feature combined with for expressions for maps will address the use-case this issue is describing:

resource "aws_instance" "example" {
  for_each = { for x in var.list : x.name => x if x.foo = "baz" }
}

Because we already have #17179 representing the for_each feature, we're going to close this one out to consolidate over there. Keep an eye on that other issue to see when the for_each feature is released. Thanks!

@ghost
Copy link

ghost commented Aug 17, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Aug 17, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants