feat: LaunchTemplate support for managed node-groups #997
Conversation
The PR would be ready. Maybe adding another example would help though, showing how to actually use the |
I had success using https://registry.terraform.io/providers/hashicorp/cloudinit/latest/docs/data-sources/cloudinit_config to wrap the userdata, but I would say it shouldn't be part of this module, in the same way you're keeping the LT itself out of this module |
Though I might suggest some documentation with an example of what the LT should look like, e.g. the instance profile cannot be set; EKS will complain about that |
One good candidate for example is installing ssm agent on startup. See aws/containers-roadmap#593 (comment) |
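(A minimal sketch of such a userdata fragment, assuming Amazon Linux 2 based nodes; this is only an illustration, not part of the PR:)
#!/bin/bash
# Install and enable the SSM agent so the node is reachable via Session Manager
yum install -y amazon-ssm-agent
systemctl enable amazon-ssm-agent
systemctl start amazon-ssm-agent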
I will add an example and make this PR fully ready towards the weekend. |
Does this actually work? Reading through the documentation they list some node group configurations that are prohibited when using launch templates:
Or maybe by "prohibited" they mean the values are simply ignored by the API? |
Yes it does. I have a cluster running with a custom AMI that way |
@dpiddockcmp the documentation reads
so it is correct not to set the |
I tested this over the weekend and it worked great. The only addition I needed was to add
|
@philicious I can confirm that it works as you propose. |
Made a PR to make the random naming solution work properly when the node group needs to be replaced: |
Anything holding this back? I'd love to make use of it so we can use security groups and tags cc @dpiddockcmp |
That PR of @Carles-Figuerola looks very good, and besides that only the example is missing. I am on holiday and had some stressful weeks at my client before, but next week Wednesday I could finish it up! |
Thanks @philicious for working on this. We have a terraform-aws-modules working session this Friday. We'll discuss the direction we want to take with this feature. We'll come back to you pretty soon. |
If that helps, I have tested this PR as well and it works as advertised, thank you @philicious and the maintainers of this repo for your work! |
Another important bit for documentation. The example in this PR description shows the call to "Bootstrap and join the cluster" as part of the user data. Including this will in fact make the managed node fail when NOT using a custom AMI. The above note is from the EKS docs. When using the default AMI, the call to bootstrap.sh is merged into the user data by EKS. We have an eks module wrapping this one, and this is what the user data I used for the LT looks like:
Using the correct multi-part boundary string in the user data seemed to be important, or I'd get an error about not using multi-part:
Others noted the need to use I haven't checked when using a custom AMI, but in my case (where a custom AMI is NOT being used) EKS is creating a new launch template based on the provided one, with the bootstrap script merged into the user data. Many thanks to @philicious for putting this together! I've got it applied to multiple EKS clusters, where I can confirm it's working, and I hope to see it incorporated in some way on a tagged release soon. |
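(To illustrate the point above, a sketch only, not the exact user data used: when the default AMI is in play, the LT user data can stay a plain MIME-wrapped shell part with no bootstrap.sh call, since EKS appends its own part that performs the bootstrap.)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Pre-bootstrap customization only; do NOT call /etc/eks/bootstrap.sh here
# when the default EKS AMI is used, or the node will fail to join.
yum update -y

--//--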
Another thing to note is about using disk/EBS encryption with LaunchTemplates, as asked for in #1023. I left that out of my initial example, but it's worth going into the example too: if you set
you also need to add a key policy to that KMS key, so the cluster-autoscaler can decrypt volumes
|
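(A sketch of what such a key policy could look like. It assumes the EC2 Auto Scaling service-linked role AWSServiceRoleForAutoScaling is what attaches the encrypted volumes; resource names are illustrative. The rendered JSON would then be what gets passed as the key's policy.)
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "ebs_key" {
  # Keep the account root as key administrator so the key stays manageable
  statement {
    sid       = "KeyAdministration"
    actions   = ["kms:*"]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Allow the Auto Scaling service-linked role to use the key for encrypted volumes
  statement {
    sid = "AllowAutoScalingUseOfKey"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey",
      "kms:CreateGrant",
    ]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"]
    }
  }
}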
@barryib any news based on last Friday's working session? I'd like to see this PR merged, and I think many others would also be happy. I need to know what I should still add to it so you guys are happy with it :) |
@huguesalary are you sure you haven't accidentally also set doesn't contain |
Re-creating the cluster from scratch fixed the issue mentioned in comments #997 (comment) and #997 (comment) |
I just deployed (actually redeployed just the node groups) a couple of EKS clusters and everything is fine. Thanks a lot for this pull request! How can I help here so it can be merged? Maybe provide an example of the code? Anything else? |
I am really looking forward to this change being merged. I have tested this branch and it works fine for me with launch templates for multiple node groups |
I'm going to do some tests during the weekend so we can merge it. Thanks in advance for your patience. |
@barryib sounds awesome! I'll allot some weekend time then for adding examples/docs, based on the comments in here |
@barryib i addressed the change requests. Pls have a look 🙂 |
Thanks a lot @philicious for your contribution. I'll push a new release during the day. |
v13.1.0 is now released. Thank you all for your work. |
@barryib thanks for doing such a quick release !! ❤️ |
@philicious I just wanted to thank you for your work on this. You're a life saver! |
@philicious I tried your
I am using the default AMI, custom AMI is not enabled |
@MBalazs90 I have seen that error before myself, though rarely and for different reasons. For debugging, SSH to the machine, try to curl the master's URL, and also try to run the bootstrap command by hand. |
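(For example, with hypothetical placeholder values, run on the node itself; the bootstrap.sh flags match the ones used in the userdata example further down:)
# Check the API server is reachable from the node
curl -k https://<cluster-endpoint>/healthz

# Then try the bootstrap by hand and watch for errors
/etc/eks/bootstrap.sh '<cluster-name>' \
  --apiserver-endpoint 'https://<cluster-endpoint>' \
  --b64-cluster-ca '<cluster-ca-base64>'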
I can replicate this error with custom userdata. I have noticed that
I did successfully spin up instances with managed node groups and the default (implicit) launch template. I inspected the default userdata in the AWS Console > EKS > Compute > MNG > Advanced > Userdata and noticed that the following labels are present there (in the configuration which works):
So, in order to get managed node groups with a launch template WITH custom userdata, you need to MANUALLY fill in the
I played around with this and found out that those three are the minimum to get nodes to register properly. However, the default (implicit) launch template also fills
The examples/launch_templates_with_managed_node_groups/templates/userdata.sh.tpl provides a reference, but it does not work without the labels mentioned above. |
I personally only tested MNG with LT using a custom AMI, which is what the userdata template is used for. I never tried userdata without a custom AMI. However, see #997 (comment) where @davidalger successfully did that. Maybe he has an idea? |
I investigated this more deeply. The problem I was facing is related to the merging of userdata done by EKS Managed Node Groups (MNG). My problem is that I need to pass custom K8s node-labels to the kubelet. Normally you'd be able to do this by just passing
The problem is that the Managed Node Group "merging of userdata" will place a
The last part of the userdata (provided by MNG) will contain the
Without these
Userdata - update EKS Managed Node Group EC2 instances to the newest AWS Kernel:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"
--//
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
set -xe
# Install newer Amazon supported kernel
amazon-linux-extras install -y kernel-ng
yum install -y amazon-ssm-agent
yum update -y
TOKEN="$(curl -X PUT -H "X-aws-ec2-metadata-token-ttl-seconds: 600" "http://169.254.169.254/latest/api/token")"
INSTANCE_LIFECYCLE="$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/instance-life-cycle)"
INSTANCE_ID="$(curl -H "X-aws-ec2-metadata-token: $TOKEN" --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .instanceId -r)"
REGION="$(curl -H "X-aws-ec2-metadata-token: $TOKEN" --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region -r)"
LAUNCH_TEMPLATE_VERSION="$(aws ec2 describe-tags --region "$REGION" --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=tag-key,Values=aws:ec2launchtemplate:version" --query 'Tags[0].Value')"
LAUNCH_TEMPLATE_ID="$(aws ec2 describe-tags --region "$REGION" --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=tag-key,Values=aws:ec2launchtemplate:id" --query 'Tags[0].Value')"
NODEGROUP="$(aws ec2 describe-tags --region "$REGION" --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=tag-key,Values=eks:nodegroup-name" --query 'Tags[0].Value')"
# AMI ID is passed by the default MNG launch template, but node joins the cluster without it also.
# Also as we have just updated the kernel, ami id would need to be queried from somewhere.
# eks.amazonaws.com/nodegroup-image=ami-05cd1e07212dd719a
# TODO: dynamic eks.amazonaws.com/capacityType=ON_DEMAND from INSTANCE_LIFECYCLE
EKS_MNG_LABELS="eks.amazonaws.com/sourceLaunchTemplateVersion=$LAUNCH_TEMPLATE_VERSION,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/sourceLaunchTemplateId=$LAUNCH_TEMPLATE_ID,eks.amazonaws.com/nodegroup=$NODEGROUP"
# https://github.com/awslabs/amazon-eks-ami/blob/0a96824d7b60d0930c846f5d6841d1c10ff411d2/files/bootstrap.sh#L273
K8S_CLUSTER_DNS_IP=172.20.0.10
# Userdata is only executed at the first boot of an EC2 instance.
# Prepare bootstrap instructions which will be executed at the second boot.
cat >/etc/rc.d/rc.local <<EOF
#!/bin/bash
set -xe
# Bootstrap and join the cluster
# https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh
/etc/eks/bootstrap.sh \
--b64-cluster-ca '${cluster_auth_base64}' \
--apiserver-endpoint '${endpoint}' \
--dns-cluster-ip "$K8S_CLUSTER_DNS_IP" \
${bootstrap_extra_args} \
--kubelet-extra-args '--node-labels=${k8s_labels} --node-labels=$EKS_MNG_LABELS' \
'${cluster_name}'
touch /var/lock/subsys/local
EOF
chmod +x /etc/rc.d/rc.local
systemctl enable rc-local.service
# Start again with the new kernel
reboot
--//--
Launch template:
user_data = base64encode(
  data.template_file.launch_template_userdata_osd.rendered
)

template_file:
data "template_file" "launch_template_userdata_osd" {
template = file("${path.module}/templates/userdata.sh.tpl")
vars = {
cluster_name = var.cluster_name
endpoint = module.eks.cluster_endpoint
cluster_auth_base64 = module.eks.cluster_certificate_authority_data
bootstrap_extra_args = ""
k8s_labels = "node.rdx.net/example-role=example-value"
}
}
I'm over my head now and I'm no longer sure whether the MNG was able to recognize the nodes before I introduced the reboot. However, if you want a newer kernel with custom k8s node-labels, the reboot and the custom eks bootstrap.sh call are required. |
Wow @pre, that looks like a lot of hassle you had to go through to get this working, thanks for sharing! To me it seems as if just using a custom AMI would be easier: use Packer to easily build a custom AMI with the kernel update baked in, then use the simple userdata from the examples and have a smooth experience with managed node groups, without having to worry about the labels yourself |
Are you aware of how to install a newer kernel from Amazon Linux Extras as an AMI? The Amazon FAQ only tells the
My goal was not to build an AMI of my own, and I definitely did not want a third-party AMI. Even though the setup above was a hassle, it now provides a newer Linux 5.5 kernel which is managed by Amazon. I tried the official AWS Ubuntu image, which also provides Linux 5.5, but it was too different from Amazon Linux to use without many changes elsewhere.
TL;DR: Do you know how to find an AMI ID for an image which has the latest Amazon Linux Extras with a Linux 5.5 kernel? |
PS. If you want to pass custom Kubernetes Node labels such as |
For the life of me I can't figure out what to do with this. I assume I need to create an AWS IAM policy with that document, but then what? Do I need to attach it to a role? Do I need to do a KMS grant? It would be really useful to have a complete example. |
@daenney I guess you have seen https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/launch_templates_with_managed_node_groups so the EBS encryption policy should be added as a policy on a KMS key as stated, then that KMS key ARN needs to be set in the LT. By that, your cluster node disks will be encrypted and EKS will be able to actually enc/dec them.
resource "aws_kms_key" "this" {
customer_master_key_spec = "SYMMETRIC_DEFAULT"
description = "EKS key"
enable_key_rotation = true
is_enabled = true
key_usage = "ENCRYPT_DECRYPT"
policy = var.key_policy
}
// optionally give the kms a human-readable name
resource "aws_kms_alias" "this" {
name = "alias/eks_key"
target_key_id = join("", aws_kms_key.this.*.id)
} |
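(And, as a sketch only, the launch template side then points at that key in its block device mapping; device name, sizes and resource names here are placeholders:)
resource "aws_launch_template" "eks_nodes" {
  name_prefix = "eks-node-group-"

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size = 50
      volume_type = "gp2"
      encrypted   = true
      kms_key_id  = aws_kms_key.this.arn
    }
  }
}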
Got it. Thanks a ton! Somehow I had missed the |
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
PR o'clock
Description
fixes #979
Just recently on 17th August, AWS released LaunchTemplate support for managed node-groups. https://aws.amazon.com/blogs/containers/introducing-launch-template-and-custom-ami-support-in-amazon-eks-managed-node-groups/
Furthermore the aws provider also supports it since 3.3.0 (hashicorp/terraform-provider-aws#14639).
This module didn't support it yet, only LTs for self-managed worker groups.
As the module is quite complex already, I only added support for an LT that you create yourself, where you then supply its ID to the module.
The existing workers_launch_template.tf couldn't have been easily reused imho, as it's also related to creating ASGs and other resources for self-managed node-groups. Also, at least the iam_instance_profile should NOT be supplied for LTs used for managed node-groups, as I noticed; the AWS API will error then.
So instead of adding another LT manifest and variables and wiring that all together, I preferred to take care of the LT myself and copied the userdata template.
I now create a LT like this:
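(A rough sketch of such a launch template; variable names and values are placeholders, and the userdata template is assumed to follow the one in the examples directory. Note that iam_instance_profile is deliberately not set.)
resource "aws_launch_template" "default" {
  name_prefix            = "eks-example-"
  description            = "Launch template for EKS managed node groups"
  update_default_version = true

  instance_type = "t3.medium"

  # Extra security groups and tags are the main reason to bring your own LT.
  # Do NOT set iam_instance_profile here; EKS rejects it for managed node groups.
  vpc_security_group_ids = [var.additional_security_group_id]

  # MIME-wrapped userdata rendered from a template file (see the MIME note below)
  user_data = base64encode(templatefile("${path.module}/templates/userdata.sh.tpl", {
    cluster_name        = var.cluster_name
    endpoint            = var.cluster_endpoint
    cluster_auth_base64 = var.cluster_auth_base64
  }))

  tag_specifications {
    resource_type = "instance"
    tags = {
      Environment = "example"
    }
  }
}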
and pass it to the eks module:
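(A sketch of wiring it into the module's node_groups input; the launch_template_id / launch_template_version keys follow the launch_templates_with_managed_node_groups example, other values are placeholders:)
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 13.1"

  cluster_name    = var.cluster_name
  cluster_version = "1.17"
  subnets         = var.subnets
  vpc_id          = var.vpc_id

  node_groups = {
    example = {
      desired_capacity = 2
      max_capacity     = 3
      min_capacity     = 1

      launch_template_id      = aws_launch_template.default.id
      launch_template_version = aws_launch_template.default.default_version
    }
  }
}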
Noteworthy: I had to wrap the userdata script in MIME, as otherwise the API complained it wasn't MIME. So I took the MIME wrapper as seen when creating a node group manually and letting AWS create a LT for you.
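(The wrapper has roughly this shape; a sketch only, where "//" is just the boundary value seen in the console and the shell body is a placeholder:)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# actual userdata goes here, e.g. the /etc/eks/bootstrap.sh call when using a custom AMI

--//--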
Checklist