Newly refactored module #774

Closed
js-timbirkett opened this issue Mar 10, 2020 · 21 comments

Comments

@js-timbirkett

Hello 👋

Firstly, this module is great, and it has a lot to deal with: standing up an EKS cluster and its nodes / workers is not a trivial task.

There have been issues regarding orchestrating the creation of the control plane and the workers / nodes [#519], issues questioning complexity or coupling within the module [#635], questions regarding LC / LT worker_groups and whether they should be merged or simplified [#563], and issues around numeric indexes and modifying arrays in general.

I had a go at creating a new, refactored version of this module and my attempt is here:
https://github.com/devopsmakers/terraform-aws-eks

It breaks things out into sub-modules that can be used both by the parent module (to preserve the simple "single module gets you a working cluster" experience) and directly by the user of the module for more complex situations (e.g. a custom CNI).

It also allows a map of maps for worker groups, and the code has been simplified a bit (reducing the death-by-lookup() in the current worker-related code).
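
For illustration, a map-of-maps input could look roughly like this (the local name and attribute keys are assumptions for the sketch, not the refactored module's actual interface):

locals {
  # Illustrative only: each worker group is keyed by name rather than by a
  # numeric list index, so adding or removing a group doesn't shift the others.
  worker_groups = {
    general = {
      instance_type        = "m5.large"
      asg_min_size         = 1
      asg_desired_capacity = 2
      asg_max_size         = 5
    }
    spot = {
      instance_type        = "m5.xlarge"
      spot_price           = "0.10"
      asg_desired_capacity = 1
      asg_max_size         = 10
    }
  }
}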

I made a few tough decisions that might stimulate some debate and possibly some outrage. These are highlighted in the repo's README.

I'm not saying it's perfect (yet), but I think it would make a good base for a fairly major refactor of this module and I'd be glad to assist in any way I can. Feel free to use any ideas or the codebase itself to further improve this module.

Thanks!

@barryib
Member

barryib commented Mar 10, 2020

Hello @js-timbirkett thanks for opening this discussion and for your work. It's a good start.

Can you please open a PR so we can iterate and give feedback?

For the design decisions:

  • To me, we can move worker groups into a separate submodule, as we already did for managed node groups.
  • As for dropping LC support, I don't have a strong opinion on this. I personally don't use LC at all, but I don't know if there are a lot of people out there who still rely on it. See Is the complexity of this module getting too high? #635 (comment)

@max-rocket-internet @dpiddockcmp @antonbabenko please advise (mostly for the LC removal)

@antonbabenko
Member

+1 for dropping LC. I think we all agreed in some previous issues that dropping support for LC is the way to go.

I wonder, are there any downsides to using LT at all compared to LC?

@max-rocket-internet
Contributor

I wonder, are there any downsides to using LT at all compared to LC?

No downsides to LT that I know of; LC is just older and has fewer features.

@max-rocket-internet
Contributor

I had a go at creating a new, refactored version of this module

Sounds great, but we need a PR to review and a path forward for current users.

I look forward to seeing some PRs 😃

@js-timbirkett
Author

Hey 👋

The refactored module is still very much in a beta state. When I've finished shaving the Terraform yak and double-checking outputs, inputs, etc., I'll open a big ole PR for further discussion and improvements.

It'll definitely be a major release and not something that's easily migratable to without a lot of Terraform state moving or importing (could probably write up a guide on that).

@chancez

chancez commented Mar 18, 2020

@js-timbirkett I'll be giving your worker group module a try. I'm vendoring it, though, since I need to change ignore_changes for the ASG so that it doesn't ignore desired capacity: we're using TF, not the cluster autoscaler, to manage workers.
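
For context, the change being described is roughly the following (a sketch only, not the module's actual resource definition; several required ASG arguments are simplified or omitted):

resource "aws_autoscaling_group" "workers" {
  name_prefix         = "eks-workers-"
  min_size            = 1
  max_size            = 5
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.workers.id
    version = "$Latest"
  }

  lifecycle {
    # With desired_capacity listed here, Terraform leaves scaling decisions to
    # the cluster autoscaler; removing it from ignore_changes lets Terraform
    # reconcile desired capacity itself.
    ignore_changes = [desired_capacity]
  }
}

Because ignore_changes has to be a static list (it can't be driven by a variable), flipping this behaviour means editing the module source, hence the vendoring.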

@js-timbirkett
Author

Oh nice @chancez - Thanks for the 🐛 fix!

I'll be doing some final editing of my version of the module early next week when I have some time off work. Then I'll open what will be a rather large PR against this module and see how it goes...

@barryib
Member

barryib commented Apr 22, 2020

@js-timbirkett any update here?

@barryib
Member

barryib commented May 8, 2020

@js-timbirkett are you still working on this?

@ptphuc

ptphuc commented Jul 23, 2020

I'm really hoping these changes can happen. Any updates on this, @js-timbirkett? Many thanks for your contribution so far.

@barryib
Member

barryib commented Jul 23, 2020

I was wondering if we shouldn't wait for Terraform 0.13 with its loop support for modules.

With that, we can split the submodule to deploy one worker group and let users loop over the submodule to create their worker groups.
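
A rough sketch of that pattern, assuming a hypothetical ./modules/worker-group submodule and illustrative input names:

module "worker_groups" {
  source   = "./modules/worker-group" # hypothetical submodule path
  for_each = local.worker_groups      # map of maps, keyed by group name

  cluster_name         = module.eks.cluster_id
  name                 = each.key
  instance_type        = each.value.instance_type
  asg_desired_capacity = each.value.asg_desired_capacity
  asg_max_size         = each.value.asg_max_size
}

for_each (and count) on module blocks requires Terraform >= 0.13, hence the suggestion to wait.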

@max-rocket-internet @dpiddockcmp @antonbabenko please advise.

@antonbabenko
Member

antonbabenko commented Jul 24, 2020 via email

@dpiddockcmp
Contributor

0.13 doesn't appear to require any massive breaking changes to upgrade; it looks like most 0.12 code should run fine under it, although I haven't experimented very heavily. I wonder if that will speed or slow adoption?

Either way, we should wait a good period of time before adding 0.13-only features that probably have bugs and edge cases. 6 months?

There's also the MNG submodule to consider. It already uses the pattern of an internal for_each, and it would be good for submodules to be consistent in behaviour. So it would need migrating, with breaking changes for users, if we did go down the for_each-over-modules path.

My vote, right now, would be to stick with for_each inside the submodules if we want to implement these changes soon.
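
For contrast, a minimal sketch of the 0.12-compatible pattern being voted for here, with the for_each inside the submodule (resource and variable names are illustrative):

variable "worker_groups" {
  description = "Map of worker group definitions, keyed by group name"
  type        = map(any)
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_autoscaling_group" "this" {
  for_each = var.worker_groups

  name_prefix         = "${each.key}-"
  min_size            = each.value["asg_min_size"]
  max_size            = each.value["asg_max_size"]
  desired_capacity    = each.value["asg_desired_capacity"]
  vpc_zone_identifier = var.subnet_ids
  # launch template, tags, IAM instance profile, etc. omitted for brevity
}

for_each on resources only needs Terraform >= 0.12.6, so this works without waiting for 0.13.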

@max-rocket-internet
Contributor

wait for Terraform 0.13 with its loop support for modules.

Sounds good to me 🚀

@stale

stale bot commented Oct 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


@stale stale bot added the stale label Jan 23, 2021
@barryib barryib removed the stale label Jan 29, 2021

@stale stale bot added the stale label Apr 29, 2021

@stale stale bot added the stale label Aug 2, 2021
@stale

stale bot commented Sep 2, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Sep 2, 2021
@willquill

If you're like me, you may use this module to create a cluster with no nodes, delete the aws-node DaemonSet to disable AWS VPC networking for pods, apply a different CNI such as Calico (as described here), and then come back to the module to add worker nodes. For that workflow, you can absolutely configure conditional worker_group creation via a boolean variable.

I put example code for this in Issue #861 in my post here: #861 (comment)

And I actually took it a step further, creating a repository called terraform-eks to act as a personal module with the terraform-aws-eks module nested inside of it. Now I can tag specific commits of my personal repository with the AMI version used in the worker nodes and invoke an entire cluster and all of its dependencies with only 15 lines of code:

module "eks" {
  source                   = "git::ssh://[email protected]:1337/my-project/terraform-eks.git?ref=v1.21.0"
  environment              = var.environment
  env                      = var.env
  tags                     = local.tags
  source_ami_region        = "us-east-2"
  vpc_id                   = local.vpc_id
  eks_subnets              = data.aws_subnet_ids.eks_subnets.ids
  eks_tags                 = var.eks_tags
  eks_worker_instance_type = "m5.4xlarge"
  asg_max_size             = 3
  asg_min_size             = 1
  asg_desired_capacity     = 1
  create_workers           = true
}

With this method, you first apply when create_workers is false, then go do the CNI stuff, and finally come back and set create_workers to true.
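
For reference, a minimal sketch of how a create_workers flag can gate the worker group inside a wrapper module like the one above (not the exact code from #861; variable names follow the example and other required inputs are omitted):

variable "create_workers" {
  type    = bool
  default = false
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name = var.cluster_name
  vpc_id       = var.vpc_id
  subnets      = var.eks_subnets

  # The first apply, with create_workers = false, builds the control plane
  # only; after swapping the CNI, set it to true and apply again to add workers.
  worker_groups = var.create_workers ? [
    {
      instance_type        = var.eks_worker_instance_type
      asg_min_size         = var.asg_min_size
      asg_max_size         = var.asg_max_size
      asg_desired_capacity = var.asg_desired_capacity
    }
  ] : []
}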

Doing something like this is a game changer in terms of avoiding code repetition if you have multiple EKS clusters. The main.tf in my use of the official, public module is 104 lines; with four clusters, that's 416 lines of code. With my method, it can be 104 lines for your personal version of the public module plus 15 + 15 + 15 + 15, for a total of 164 lines of code (give or take, as you still have to account for variables, locals, and data calls).

@antonbabenko antonbabenko unpinned this issue Apr 7, 2022
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 17, 2022