feat: Create launch template for Managed Node Groups #1138

Merged · 19 commits · Apr 19, 2021
12 changes: 12 additions & 0 deletions modules/node_groups/README.md
@@ -23,6 +23,7 @@ The role ARN specified in `var.default_iam_role_arn` will be used by default. In
| capacity\_type | Type of instance capacity to provision. Options are `ON_DEMAND` and `SPOT` | string | Provider default behavior |
| desired\_capacity | Desired number of workers | number | `var.workers_group_defaults[asg_desired_capacity]` |
| disk\_size | Workers' disk size | number | Provider default behavior |
| disk\_type | Workers' disk type. Requires `create_launch_template` to be `true` | string | `gp3` |
| iam\_role\_arn | IAM role ARN for workers | string | `var.default_iam_role_arn` |
| instance\_types | Node group's instance type(s). Multiple types can be specified when `capacity_type="SPOT"`. | list | `[var.workers_group_defaults[instance_type]]` |
| k8s\_labels | Kubernetes labels | map(string) | No labels applied |
@@ -35,6 +36,12 @@ The role ARN specified in `var.default_iam_role_arn` will be used by default. In
| source\_security\_group\_ids | Source security groups for remote access to workers | list(string) | If key\_name is specified: THE REMOTE ACCESS WILL BE OPENED TO THE WORLD |
| subnets | Subnets to contain workers | list(string) | `var.workers_group_defaults[subnets]` |
| version | Kubernetes version | string | Provider default behavior |
| create\_launch\_template | Create and use a default launch template | bool | `false` |
| kubelet\_extra\_args | This string is passed directly to kubelet if set. Useful for adding labels or taints. Requires `create_launch_template` to be `true` | string | "" |
| enable\_monitoring | Enables/disables detailed monitoring. Requires `create_launch_template` to be `true` | bool | `true` |
| eni\_delete | Delete the Elastic Network Interface (ENI) on termination (if set to `false` you will have to delete it manually before destroying). Requires `create_launch_template` to be `true` | bool | `true` |
| public\_ip | Associate a public IP address with a worker. Requires `create_launch_template` to be `true` | bool | `false` |
| pre\_userdata | Userdata to prepend to the default userdata. Requires `create_launch_template` to be `true` | string | "" |
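
For illustration, a node group could opt into the generated launch template like this (a minimal sketch; the group name and values are placeholders, and `node_groups` is passed to the root module):

```hcl
node_groups = {
  example = {
    desired_capacity = 1
    max_capacity     = 3
    min_capacity     = 1
    instance_types   = ["t3.large"]

    create_launch_template = true
    disk_size              = 50
    disk_type              = "gp3"
    kubelet_extra_args     = "--node-labels=workload=general"
    enable_monitoring      = true
    public_ip              = false
    pre_userdata           = "yum install -y amazon-ssm-agent"
  }
}
```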

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements
@@ -50,6 +57,7 @@ The role ARN specified in `var.default_iam_role_arn` will be used by default. In
| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 3.22.0 |
| <a name="provider_cloudinit"></a> [cloudinit](#provider\_cloudinit) | n/a |
| <a name="provider_random"></a> [random](#provider\_random) | >= 2.1 |

## Modules
@@ -61,7 +69,9 @@ No modules.
| Name | Type |
|------|------|
| [aws_eks_node_group.workers](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group) | resource |
| [aws_launch_template.workers](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template) | resource |
| [random_pet.node_groups](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/pet) | resource |
| [cloudinit_config.workers_userdata](https://registry.terraform.io/providers/hashicorp/cloudinit/latest/docs/data-sources/config) | data source |

## Inputs

@@ -74,6 +84,8 @@ No modules.
| <a name="input_node_groups"></a> [node\_groups](#input\_node\_groups) | Map of maps of `eks_node_groups` to create. See "`node_groups` and `node_groups_defaults` keys" section in README.md for more details | `any` | `{}` | no |
| <a name="input_node_groups_defaults"></a> [node\_groups\_defaults](#input\_node\_groups\_defaults) | map of maps of node groups to create. See "`node_groups` and `node_groups_defaults` keys" section in README.md for more details | `any` | n/a | yes |
| <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to add to all resources | `map(string)` | n/a | yes |
| <a name="input_worker_additional_security_group_ids"></a> [worker\_additional\_security\_group\_ids](#input\_worker\_additional\_security\_group\_ids) | A list of additional security group ids to attach to worker instances | `list(string)` | `[]` | no |
| <a name="input_worker_security_group_id"></a> [worker\_security\_group\_id](#input\_worker\_security\_group\_id) | If provided, all workers will be attached to this security group. If not given, a security group will be created with necessary ingress/egress to work with the EKS cluster. | `string` | `""` | no |
| <a name="input_workers_group_defaults"></a> [workers\_group\_defaults](#input\_workers\_group\_defaults) | Workers group defaults from parent | `any` | n/a | yes |

## Outputs
111 changes: 111 additions & 0 deletions modules/node_groups/launchtemplate.tf
@@ -0,0 +1,111 @@
data "cloudinit_config" "workers_userdata" {
for_each = { for k, v in local.node_groups_expanded : k => v if v["create_launch_template"] }
gzip = false
base64_encode = true
boundary = "//"

part {
content_type = "text/x-shellscript"
content = templatefile("${path.module}/templates/userdata.sh.tpl",
{
pre_userdata = each.value["pre_userdata"]
kubelet_extra_args = each.value["kubelet_extra_args"]
}
)

}
}
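
The `rendered` value above is a base64-encoded MIME multipart document (using the `//` boundary) that EKS merges with its own bootstrap userdata when no custom AMI is involved. A hypothetical debugging aid for inspecting it for one group key, not part of the module:

```hcl
# "example" is a placeholder node group key.
output "workers_userdata_example" {
  value = base64decode(data.cloudinit_config.workers_userdata["example"].rendered)
}
```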

# This is based on the LT that EKS would create if no custom one is specified (aws ec2 describe-launch-template-versions --launch-template-id xxx)
# there are several more options one could set, but you probably don't need to modify them
# you can take the default and add your custom AMI and/or custom tags
#
# Trivia: AWS transparently creates a copy of your LaunchTemplate and actually uses that copy for the node group. If you DON'T use a custom AMI,
# then the default user-data for bootstrapping a cluster is merged into the copy.
resource "aws_launch_template" "workers" {
  for_each               = { for k, v in local.node_groups_expanded : k => v if v["create_launch_template"] }
  name_prefix            = lookup(each.value, "name", join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id]))
  description            = lookup(each.value, "name", join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id]))
  update_default_version = true

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size           = lookup(each.value, "disk_size", null)
      volume_type           = lookup(each.value, "disk_type", null)
      delete_on_termination = true
    }
  }

  instance_type = each.value["set_instance_types_on_lt"] ? element(each.value.instance_types, 0) : null

  monitoring {
    enabled = lookup(each.value, "enable_monitoring", null)
  }

  network_interfaces {
    associate_public_ip_address = lookup(each.value, "public_ip", null)
    delete_on_termination       = lookup(each.value, "eni_delete", null)
    security_groups = flatten([
      var.worker_security_group_id,
      var.worker_additional_security_group_ids,
      lookup(
        each.value,
        "additional_security_group_ids",
        null,
      ),
    ])
  }

  # if you want to use a custom AMI
  # image_id = var.ami_id
Member: Why can't we allow this?
Contributor Author: I think because, if using a custom image, this does not apply: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/launch_templates_with_managed_node_groups/launchtemplate.tf#L18. I think this leads to different behavior between the EKS AMI and other AMIs, but I have not tested with a custom AMI.
Contributor Author: I think we should stay simple, because we still allow users to use a custom launch template if needed. Or maybe this can be added in another PR, as I'm not really sure how to handle cloud-init with a custom AMI.
Contributor Author: This is not implemented for now.
Contributor: As you partially copied the LT from the examples that got added in my #997, I might be able to help here:

I am using an LT with a custom AMI and it works just fine. However, there are indeed subtle but important differences between using an LT with or without a custom AMI. In the old PR, someone else described it quite well, see #997 (comment).

He also mentions a then-required fix to the MIME boundary when using cloudinit.

Contributor Author: @philicious do you mean that this part should be added manually when using a custom AMI:

set -ex
B64_CLUSTER_CA=LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeE1ESXdNakUyTXpJeU0xb1hEVE14TURFek1URTJNekl5TTFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTkhXCk5vZjgxekorcGIxdEswMXRWVExSNEd0NDBDbkw5TU5vV0hSWGc3WndNVFkzcHVQMm05TlkvSXJ2bEZ2dDNNUVcKejUrb0FRdU8rcHA2RUFQOEZFK0JGaUVSVXpMZTYvbXFscGg2S2hmOEsyQU45QUN2RUYvMWlYNlQvWFlDdlRrRQp5MmhYSk1CUnVGSVF6dGVSaDEwRTFBZG5UWDdxNUY5RlhIY2VzR285TGlPbmRNMVpQRGpPS2lnZ0hMK2xheG4wCnN0bDlxeGZrYWZpMHNzb0ZCcUM3eGU1SGt2OVowYTYvRmxWeVNXazFQQXFCWDZOTlUvc0RjNTA3bXN0OEVMc0oKSU9naWFTcGZLaXVnekZNaTlTS3NQbjRQcm94UDEwRlErOGpSdTZZdm9tQmswMHFnU2NFTGxadng0bG1CVGloSgpCdDdFTlUxMzdvSXdhY1pCUUNFQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFJam1ZWmthV3NuQ1lSNkpQUGw1WmVpcGkzYkYKREpBUzgvM2E4UFVnL3BsWTFVYlhCalU3b0FDb21UUzd2Y2hPUFU5aFNXdC9jNit5RnF5a0FwakMyRjFuSHg4WQpaQUg5NDFWYUNzRyt3VmE3MTJlcFRPTSt1TWxNSENFYVlMVTRKOXEvaUd1aVZtM2NPOGhmMTFoNjVGd3NuekE0CmdqQ0YxUC9Sdi9acnFSSk9XZmJaRE00MzlwajVqQzNYRVAyK1FXVlIzR2tzbW1NcDVISm9NZW5JaDBSTFhnK1oKTVRVNXFsdW0xTWZDdXRNVjkzNGJFQ21BRERJSm4rZVdHSERwRi9QOThnR1RyRU1QclhiUXZMblpwZHBNYldjNQp5LzZldkNtYXozMzllSlUwWkRaM1M0R2YvbEpBUTBZcFZoQkRlS2hXVHEwSXJYb2NWWHU5MDN0OXU5TT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
API_SERVER_URL=https://A62655F81AE9347A761BB172E28A633F.sk1.eu-west-1.eks.amazonaws.com
K8S_CLUSTER_DNS_IP=172.20.0.10
/etc/eks/bootstrap.sh pio-thanos --kubelet-extra-args '--node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,GithubRepo=terraform-aws-eks,eks.amazonaws.com/nodegroup-image=ami-066fad1ae541d1cf9,Environment=test,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=pio-thanos-default-eu-west-1a-expert-bass,eks.amazonaws.com/sourceLaunchTemplateId=lt-079cbc5cf74ace131,GithubOrg=terraform-aws-modules' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP

About the boundaries, I think this is fixed and I implemented it here.

I'm just not sure how to handle the custom AMI; how do you do it on your end? From what I understand, you need to pass the bootstrap command on your own when using a custom AMI, because it does not get merged.

Contributor: You could be right about the boundaries. I remember that someone wanted to update the cloudinit to support custom boundaries. So yes, probably; and if this PR produces a running EKS, then it's proven.

And yes, with a custom AMI you have to supply the entire userdata yourself, as no merging happens with the default one from EKS. So in the end I use it like I added in the examples.

I would have to take a closer look and run tests on how to add custom AMI support and satisfy these differences.

It would for sure be great if the module could also handle custom AMIs, since it already got LT generation added.


  # If you use a custom AMI, you need to supply the bootstrap script via user-data, as EKS DOESN'T merge its managed user-data in that case
  # you can add more than the minimum code you see in the template, e.g. install SSM agent, see https://github.com/aws/containers-roadmap/issues/593#issuecomment-577181345
  #
  # (optionally you can use https://registry.terraform.io/providers/hashicorp/cloudinit/latest/docs/data-sources/cloudinit_config to render the script, example: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/997#issuecomment-705286151)

  user_data = data.cloudinit_config.workers_userdata[each.key].rendered

  key_name = lookup(each.value, "key_name", null)

  # Supplying custom tags to EKS instances is another use-case for LaunchTemplates
  tag_specifications {
    resource_type = "instance"

    tags = merge(
      var.tags,
      lookup(var.node_groups_defaults, "additional_tags", {}),
      lookup(var.node_groups[each.key], "additional_tags", {}),
      {
        Name = lookup(each.value, "name", join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id]))
      }
    )
  }

  # Supplying custom tags to EKS instances' root volumes is another use-case for LaunchTemplates. (This doesn't add tags to volumes provisioned dynamically via PVC, though.)
  tag_specifications {
    resource_type = "volume"

    tags = merge(
      var.tags,
      lookup(var.node_groups_defaults, "additional_tags", {}),
      lookup(var.node_groups[each.key], "additional_tags", {}),
      {
        Name = lookup(each.value, "name", join("-", [var.cluster_name, each.key, random_pet.node_groups[each.key].id]))
      }
    )
  }

  # Tag the LT itself
  tags = merge(
    var.tags,
    lookup(var.node_groups_defaults, "additional_tags", {}),
    lookup(var.node_groups[each.key], "additional_tags", {}),
  )

  lifecycle {
    create_before_destroy = true
  }
}
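
As the review discussion above notes, with a custom AMI EKS does not merge its managed bootstrap userdata, so a user-managed launch template has to call the bootstrap script itself. A rough sketch of that alternative path outside this module; `custom_ami_id`, `cluster_ca`, and `cluster_endpoint` are hypothetical variables, not module inputs:

```hcl
resource "aws_launch_template" "custom_ami_workers" {
  name_prefix = "custom-ami-workers-"
  image_id    = var.custom_ami_id # hypothetical: a custom EKS-optimized AMI

  # With a custom AMI the full bootstrap call must be supplied in user-data.
  user_data = base64encode(<<-EOT
    #!/bin/bash -e
    /etc/eks/bootstrap.sh ${var.cluster_name} \
      --kubelet-extra-args '--node-labels=workload=general' \
      --b64-cluster-ca ${var.cluster_ca} \
      --apiserver-endpoint ${var.cluster_endpoint}
  EOT
  )
}
```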
28 changes: 19 additions & 9 deletions modules/node_groups/locals.tf
@@ -2,15 +2,25 @@ locals {
  # Merge defaults and per-group values to make code cleaner
  node_groups_expanded = { for k, v in var.node_groups : k => merge(
    {
-      desired_capacity        = var.workers_group_defaults["asg_desired_capacity"]
-      iam_role_arn            = var.default_iam_role_arn
-      instance_types          = [var.workers_group_defaults["instance_type"]]
-      key_name                = var.workers_group_defaults["key_name"]
-      launch_template_id      = var.workers_group_defaults["launch_template_id"]
-      launch_template_version = var.workers_group_defaults["launch_template_version"]
-      max_capacity            = var.workers_group_defaults["asg_max_size"]
-      min_capacity            = var.workers_group_defaults["asg_min_size"]
-      subnets                 = var.workers_group_defaults["subnets"]
+      desired_capacity              = var.workers_group_defaults["asg_desired_capacity"]
+      iam_role_arn                  = var.default_iam_role_arn
+      instance_types                = [var.workers_group_defaults["instance_type"]]
+      key_name                      = var.workers_group_defaults["key_name"]
+      launch_template_id            = var.workers_group_defaults["launch_template_id"]
+      launch_template_version       = var.workers_group_defaults["launch_template_version"]
+      set_instance_types_on_lt      = false
+      max_capacity                  = var.workers_group_defaults["asg_max_size"]
+      min_capacity                  = var.workers_group_defaults["asg_min_size"]
+      subnets                       = var.workers_group_defaults["subnets"]
+      create_launch_template        = false
+      kubelet_extra_args            = var.workers_group_defaults["kubelet_extra_args"]
+      disk_size                     = var.workers_group_defaults["root_volume_size"]
+      disk_type                     = var.workers_group_defaults["root_volume_type"]
+      enable_monitoring             = var.workers_group_defaults["enable_monitoring"]
+      eni_delete                    = var.workers_group_defaults["eni_delete"]
+      public_ip                     = var.workers_group_defaults["public_ip"]
+      pre_userdata                  = var.workers_group_defaults["pre_userdata"]
+      additional_security_group_ids = var.workers_group_defaults["additional_security_group_ids"]
    },
    var.node_groups_defaults,
    v,
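
In the `merge()` above, later arguments win: a per-group value overrides `var.node_groups_defaults`, which overrides the hard-coded defaults built from `var.workers_group_defaults`. A hypothetical illustration of the corresponding root-module inputs:

```hcl
node_groups_defaults = {
  create_launch_template = true
  disk_type              = "gp3"
}

node_groups = {
  example = {
    disk_type = "io1" # wins over node_groups_defaults; everything else falls through
  }
}
```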
18 changes: 15 additions & 3 deletions modules/node_groups/node_groups.tf
@@ -14,13 +14,13 @@ resource "aws_eks_node_group" "workers" {
  }

  ami_type        = lookup(each.value, "ami_type", null)
-  disk_size       = lookup(each.value, "disk_size", null)
-  instance_types  = lookup(each.value, "instance_types", null)
+  disk_size       = each.value["launch_template_id"] != null || each.value["create_launch_template"] ? null : lookup(each.value, "disk_size", null)
+  instance_types  = !each.value["set_instance_types_on_lt"] ? each.value["instance_types"] : null
  release_version = lookup(each.value, "ami_release_version", null)
  capacity_type   = lookup(each.value, "capacity_type", null)

  dynamic "remote_access" {
-    for_each = each.value["key_name"] != "" ? [{
+    for_each = each.value["key_name"] != "" && each.value["launch_template_id"] == null && !each.value["create_launch_template"] ? [{
      ec2_ssh_key               = each.value["key_name"]
      source_security_group_ids = lookup(each.value, "source_security_group_ids", [])
    }] : []
@@ -43,6 +43,18 @@
    }
  }

dynamic "launch_template" {
for_each = each.value["launch_template_id"] == null && each.value["create_launch_template"] ? [{
id = aws_launch_template.workers[each.key].id
version = aws_launch_template.workers[each.key].latest_version
}] : []

content {
id = launch_template.value["id"]
version = launch_template.value["version"]
}
}

  version = lookup(each.value, "version", null)

  labels = merge(
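
Taken together with the `disk_size` and `remote_access` conditions above, a node group either uses the module-generated template when `create_launch_template = true`, attaches a pre-existing one when `launch_template_id` is set, or runs without any launch template. A hypothetical pair of groups (IDs and versions are placeholders):

```hcl
node_groups = {
  managed_lt = {
    create_launch_template = true # uses aws_launch_template.workers["managed_lt"]
    disk_type              = "gp3"
  }

  existing_lt = {
    # user-managed template; ID and version are placeholders
    launch_template_id      = "lt-0123456789abcdef0"
    launch_template_version = "1"
  }
}
```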
6 changes: 6 additions & 0 deletions modules/node_groups/templates/userdata.sh.tpl
@@ -0,0 +1,6 @@
#!/bin/bash -e

# Allow user supplied pre userdata code
${pre_userdata}

sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" ${kubelet_extra_args}"' /etc/eks/bootstrap.sh
Contributor (@philicious, Mar 27, 2021): btw @ArchiFleKs, I am trying to understand why your bootstrap command is missing the cluster-name argument, which should be a problem according to https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh#L266-L269, but it still seems to work?!

see for reference https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/templates/userdata.sh.tpl#L7 or https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/launch_templates_with_managed_node_groups/templates/userdata.sh.tpl#L10

It also misses ${bootstrap_extra_args}, which @sbarbat mentioned and which would be a good addition so we can supply any of the other optional options.

Contributor Author: The issue is with the auto-merge of the official AWS userdata and the custom-supplied userdata, so there is no possibility to add bootstrap args beforehand.
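
To make the template's effect concrete, here is a hypothetical rendering of `userdata.sh.tpl` with example values; the variable names match the `templatefile()` call shown earlier, but the values are placeholders:

```hcl
locals {
  example_userdata = templatefile("${path.module}/templates/userdata.sh.tpl", {
    pre_userdata       = "yum install -y amazon-ssm-agent"
    kubelet_extra_args = "--register-with-taints=dedicated=gpu:NoSchedule"
  })
  # The rendered script first runs the pre_userdata line, then patches
  # /etc/eks/bootstrap.sh so KUBELET_EXTRA_ARGS also carries the taint flag
  # when the EKS-merged userdata later invokes bootstrap.sh.
}
```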

12 changes: 12 additions & 0 deletions modules/node_groups/variables.tf
@@ -19,6 +19,18 @@ variable "workers_group_defaults" {
  type = any
}

variable "worker_security_group_id" {
  description = "If provided, all workers will be attached to this security group. If not given, a security group will be created with necessary ingress/egress to work with the EKS cluster."
  type        = string
  default     = ""
}

variable "worker_additional_security_group_ids" {
  description = "A list of additional security group ids to attach to worker instances"
  type        = list(string)
  default     = []
}

variable "tags" {
  description = "A map of tags to add to all resources"
  type        = map(string)
18 changes: 10 additions & 8 deletions node_groups.tf
@@ -1,12 +1,14 @@
module "node_groups" {
source = "./modules/node_groups"
create_eks = var.create_eks
cluster_name = coalescelist(aws_eks_cluster.this[*].name, [""])[0]
default_iam_role_arn = coalescelist(aws_iam_role.workers[*].arn, [""])[0]
workers_group_defaults = local.workers_group_defaults
tags = var.tags
node_groups_defaults = var.node_groups_defaults
node_groups = var.node_groups
source = "./modules/node_groups"
create_eks = var.create_eks
cluster_name = coalescelist(aws_eks_cluster.this[*].name, [""])[0]
default_iam_role_arn = coalescelist(aws_iam_role.workers[*].arn, [""])[0]
workers_group_defaults = local.workers_group_defaults
worker_security_group_id = local.worker_security_group_id
worker_additional_security_group_ids = var.worker_additional_security_group_ids
tags = var.tags
node_groups_defaults = var.node_groups_defaults
node_groups = var.node_groups

  # Hack to ensure ordering of resource creation.
  # This is a homemade `depends_on` https://discuss.hashicorp.com/t/tips-howto-implement-module-depends-on-emulation/2305/2