feat: Create launch template for Managed Node Groups #1138
Conversation
Did you try creating a cluster using this TF code? I'm getting the following error when using one or more node_groups:
node_groups = {
  nginx = {
    create_launch_template = true
    desired_capacity       = 4
    max_capacity           = 10
    min_capacity           = 3
    instance_type          = "m5.large"
    kubelet_extra_args     = "--node-labels=role=nginx,group=nginx"
    additional_tags = {
      group = "nginx"
    }
  }
}

I was able to get a cluster up by doing the following: change both
My next issue was with disk_size, which didn't get a default value, so I set it to 50. EC2 instances are also coming up without the "Name" tag set. I think you need to add another
@cabrinha thanks for the review. I actually used Terragrunt + Terraform, but I was able to get a cluster running. I haven't tested without disk_size (I tested with 50 also), so I think we need to set a default, because there is none with a launch template. If we set the
About the
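A hypothetical sketch of the defaulting discussed above (attribute names follow this thread; the module's actual code may differ): give disk_size a default, and drop it entirely whenever a launch template is involved, since the two settings conflict:

```hcl
# Sketch only: default disk_size to 50 GiB, but pass null when a launch
# template is used, because disk_size conflicts with launch templates.
disk_size = (
  each.value["launch_template_id"] != null || each.value["create_launch_template"]
  ? null
  : lookup(each.value, "disk_size", 50)
)
```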
@cabrinha also the plan is passing with the
Could we add a simple
I wish there was a way to add the Name tag to the instances spun up. The EC2 instances list is horrible without any Names 😅 Also, it'd be nice to add capacity_type too, since AWS now supports Managed Node Groups with Spot Instances. What version of Terraform are you using?
I'm on 0.12.29, using the same example code block you are and I'm still getting the error:
modules/node_groups/node_groups.tf (Outdated)

-  disk_size      = lookup(each.value, "disk_size", null)
-  instance_types = each.value["launch_template_id"] != null ? [] : [each.value["instance_type"]]
+  disk_size      = each.value["launch_template_id"] != null || each.value["create_launch_template"] ? null : lookup(each.value, "disk_size", null)
+  instance_types = each.value["launch_template_id"] != null || each.value["create_launch_template"] ? [] : [each.value["instance_type"]]
It'd be cool to add capacity_type = lookup(each.value, "capacity_type", "ON_DEMAND") right above this, so users could choose to use "SPOT".
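A minimal sketch of the suggested lookup, placed inside the node group resource (the surrounding resource shape is assumed from this thread, not the module's exact code):

```hcl
resource "aws_eks_node_group" "workers" {
  # ... existing arguments elided ...

  # Default to on-demand capacity unless the user sets capacity_type = "SPOT"
  # in their node_groups map entry.
  capacity_type = lookup(each.value, "capacity_type", "ON_DEMAND")
}
```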
I think this is implemented by #1129
@cabrinha Alright for the spot and the name. But I have to admit I really do not understand the part about the
I'm using the latest Terraform 0.13.5; I have not tried with 0.12. I will test on my end.
Supposedly this issue is more likely to happen on a blank tfstate.
Guys, I have a question related to remote access: per the AWS docs, you won't be able to specify remote access on node groups when a launch template is used.
I don't see that handled in your launch_template.tf. It would be great if you could clarify.
You mean we should perform validation on the input?
What I mean is: if you add remote_access as an input to node_groups with an associated launch template, you end up with the error below.
You would need to handle remote access in launch_template.tf. If I'm wrong, do correct me.
@binnythomas-1989 I think I understand: we need to prevent setting remote_access here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/node_groups.tf#L21 if a launch template is used
You are correct @ArchiFleKs. You would need to add the key on the launch template too.
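A hedged sketch of the idea being discussed (resource layout and variable names are assumptions, not the module's actual code): move the SSH key onto the launch template, and only emit the remote_access block when no launch template is attached, since the two arguments conflict on aws_eks_node_group:

```hcl
variable "key_name" { type = string }
variable "use_launch_template" { type = bool }

resource "aws_launch_template" "workers" {
  name_prefix = "eks-managed-"
  # When a launch template is used, the SSH key belongs here,
  # not in the node group's remote_access block.
  key_name = var.use_launch_template ? var.key_name : null
}

resource "aws_eks_node_group" "workers" {
  # ... cluster_name, node_role_arn, subnet_ids, scaling_config elided ...

  # remote_access conflicts with launch_template, so only emit the block
  # when no launch template is attached to the node group.
  dynamic "remote_access" {
    for_each = var.use_launch_template ? [] : [var.key_name]
    content {
      ec2_ssh_key = remote_access.value
    }
  }
}
```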
Guys, adding the kubelet argument is great. I just tested it. I'm kind of sorry, but Amazon's EKS is really annoying. So let me explain what I have figured out.
The problem is that once you add taints with a NoSchedule effect, for example, the node won't join the cluster. You would have issues with CoreDNS pod scheduling, since it's a Deployment: it won't be able to schedule. So letting people use this for taints is a bad idea on EKS.
@binnythomas-1989 it is true for CoreDNS; aws-node tolerates every taint. You should always have a "default" pool, or a "criticalAddonsOnly" pool if you want. I agree this is kind of a power-user feature. Node join also fails if you are using
There is an example right now of how to use a launch template, but it is not straightforward. This PR's goal is to enable a simple configuration while still allowing the use of a custom launch template, or a basic node pool, if needed.
kube-proxy and aws-node are DaemonSets, so that's okay; it's just that CoreDNS is an issue, since it's a Deployment. Anyway, it could be useful to add a caveat to the README. I was trying to adopt your solution in the local Terraform module I use. I'm planning to handle this with the kubectl Terraform provider for now. :-) I don't use eksctl
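A sketch of the workaround discussed above (node group names and values are illustrative, using this module's node_groups map): keep one untainted "default" pool so Deployments such as CoreDNS can schedule, and taint only the dedicated pool:

```hcl
node_groups = {
  # Untainted pool: CoreDNS and other Deployments land here.
  default = {
    desired_capacity = 2
    max_capacity     = 3
    min_capacity     = 2
    instance_type    = "m5.large"
  }

  # Dedicated pool: tainted, so only workloads with a matching
  # toleration schedule here.
  dedicated = {
    create_launch_template = true
    desired_capacity       = 1
    max_capacity           = 5
    min_capacity           = 1
    instance_type          = "m5.large"
    kubelet_extra_args     = "--register-with-taints=dedicated=true:NoSchedule"
  }
}
```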
@binnythomas-1989 this: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/1138/files#diff-7a3fc6c7df17fda0c341e61255461bf1f149256a9ddf14d4a18ab6f020d08136R22 should take care of remote access. Could you try? If not, I'll try to test today.
This won't work again, because now you need to handle remote access on the launch template, and it would go like this
So that bit is being handled in launch_template.tf.
I added the key_name lookup
This is exactly what we're looking for. Thank you for the work done on this so far!
I just realized that I didn't submit my review. It was in pending state.
}

# if you want to use a custom AMI
# image_id = var.ami_id
Why can't we allow this?
I think because, if using a custom image, this does not apply: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/launch_templates_with_managed_node_groups/launchtemplate.tf#L18. I think this leads to different behavior between the EKS AMI and other AMIs, but I have not tested with a custom AMI.
I think we should stay simple, because we still allow users to use a custom launch template if needed. Or maybe this can be added in another PR, as I'm not really sure how to handle cloud-init with a custom AMI.
This is not implemented for now
As you partially copied the LT from the examples that were added in my #997, I might be able to help here:
I am using an LT with a custom AMI and it works just fine.
However, there are indeed subtle but important differences between using an LT with or without a custom AMI. In the old PR, someone else described it quite well; see #997 (comment).
He also mentions a then-required fix to the MIME boundary when using cloud-init.
@philicious do you mean that this part should be added manually when using a custom AMI:
set -ex
B64_CLUSTER_CA=LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeE1ESXdNakUyTXpJeU0xb1hEVE14TURFek1URTJNekl5TTFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTkhXCk5vZjgxekorcGIxdEswMXRWVExSNEd0NDBDbkw5TU5vV0hSWGc3WndNVFkzcHVQMm05TlkvSXJ2bEZ2dDNNUVcKejUrb0FRdU8rcHA2RUFQOEZFK0JGaUVSVXpMZTYvbXFscGg2S2hmOEsyQU45QUN2RUYvMWlYNlQvWFlDdlRrRQp5MmhYSk1CUnVGSVF6dGVSaDEwRTFBZG5UWDdxNUY5RlhIY2VzR285TGlPbmRNMVpQRGpPS2lnZ0hMK2xheG4wCnN0bDlxeGZrYWZpMHNzb0ZCcUM3eGU1SGt2OVowYTYvRmxWeVNXazFQQXFCWDZOTlUvc0RjNTA3bXN0OEVMc0oKSU9naWFTcGZLaXVnekZNaTlTS3NQbjRQcm94UDEwRlErOGpSdTZZdm9tQmswMHFnU2NFTGxadng0bG1CVGloSgpCdDdFTlUxMzdvSXdhY1pCUUNFQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFJam1ZWmthV3NuQ1lSNkpQUGw1WmVpcGkzYkYKREpBUzgvM2E4UFVnL3BsWTFVYlhCalU3b0FDb21UUzd2Y2hPUFU5aFNXdC9jNit5RnF5a0FwakMyRjFuSHg4WQpaQUg5NDFWYUNzRyt3VmE3MTJlcFRPTSt1TWxNSENFYVlMVTRKOXEvaUd1aVZtM2NPOGhmMTFoNjVGd3NuekE0CmdqQ0YxUC9Sdi9acnFSSk9XZmJaRE00MzlwajVqQzNYRVAyK1FXVlIzR2tzbW1NcDVISm9NZW5JaDBSTFhnK1oKTVRVNXFsdW0xTWZDdXRNVjkzNGJFQ21BRERJSm4rZVdHSERwRi9QOThnR1RyRU1QclhiUXZMblpwZHBNYldjNQp5LzZldkNtYXozMzllSlUwWkRaM1M0R2YvbEpBUTBZcFZoQkRlS2hXVHEwSXJYb2NWWHU5MDN0OXU5TT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
API_SERVER_URL=https://A62655F81AE9347A761BB172E28A633F.sk1.eu-west-1.eks.amazonaws.com
K8S_CLUSTER_DNS_IP=172.20.0.10
/etc/eks/bootstrap.sh pio-thanos --kubelet-extra-args '--node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,GithubRepo=terraform-aws-eks,eks.amazonaws.com/nodegroup-image=ami-066fad1ae541d1cf9,Environment=test,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=pio-thanos-default-eu-west-1a-expert-bass,eks.amazonaws.com/sourceLaunchTemplateId=lt-079cbc5cf74ace131,GithubOrg=terraform-aws-modules' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP
About the boundaries, I think this is fixed; I implemented it here.
I'm just not sure how to handle the custom AMI. How do you do it on your end? From what I understand, you need to pass the bootstrap command yourself when using a custom AMI, because the user data does not get merged.
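A minimal sketch of supplying full user data for a custom AMI (variable names are illustrative; /etc/eks/bootstrap.sh is the script shipped in the EKS-optimized AMI, so this assumes the custom AMI is derived from it). With a custom AMI, EKS does not merge in its own user data, so the launch template has to carry the whole bootstrap invocation:

```hcl
resource "aws_launch_template" "custom_ami" {
  # Custom AMI: EKS will NOT merge its managed user data into this template.
  image_id = var.custom_ami_id

  # So the template must call the bootstrap script itself, passing the
  # cluster connection details that EKS would otherwise inject.
  user_data = base64encode(<<-EOT
    #!/bin/bash
    set -ex
    /etc/eks/bootstrap.sh ${var.cluster_name} \
      --kubelet-extra-args '${var.kubelet_extra_args}' \
      --b64-cluster-ca ${var.cluster_ca} \
      --apiserver-endpoint ${var.cluster_endpoint}
  EOT
  )
}
```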
You could be right about the boundaries. I remember someone wanted to update the cloud-init to support custom boundaries, so yes, probably; and if this PR produces a running EKS cluster, then it's proven.
And yes, with a custom AMI you have to supply the entire user data yourself, as no merging happens with the default one from EKS.
So in the end I use it like I added in the examples.
I would have to take a closer look and run tests on how to add custom AMI support and satisfy these differences.
It would for sure be great if the module could also handle custom AMIs, given it already has LT generation added.
@ArchiFleKs do you have a rough timeline for this work? Also, would it be possible to add support for
Signed-off-by: Kevin Lefevre <[email protected]>
Sure, I can retest this at any time and post my configs here.
Alright, the conflict should be fixed now.
Just tested this config:

node_groups = {
  managed = {
    desired_capacity = 1
    max_capacity     = 5
    min_capacity     = 1
    instance_types = [
      "c3.2xlarge",
      "c4.xlarge",
      "c4.2xlarge",
    ]
    capacity_type    = "SPOT"
    root_volume_type = "gp2"
    root_volume_size = 10
    kubelet_extra_args = "--node-labels=node.kubernetes.io/lifecycle=spot,role=worker,node.kubernetes.io/exclude-from-external-load-balancers --register-with-taints=dedicated=managed:NoSchedule"
    k8s_labels = {
      Environment = "test"
      GithubRepo  = "terraform-aws-eks"
      GithubOrg   = "terraform-aws-modules"
    }
    additional_tags = {
      CustomTag = "EKS example"
    }
  }
}

Seems to be working well.
When will this be merged?
Great question. @barryib time to review and merge?
@barryib it looks like the changes in this PR didn't make it into v15.0.0 or v15.1.0. Do you have a plan for when they will be released?
Aren't the changes here? v15.1.0...master 2e1651d
That first link is showing the diff between v15.1.0 and master. Your second link shows that it is only on
I'm not seeing this work with
Example:
Produces the following userdata for the launch configuration:
Which is missing all of the other required parameters provided by the EKS AMI, which are vital for registering the nodes in the cluster. Am I missing something, or does this not work?
@martin308 this hasn't been released yet, so unless you're using
@martin308 I'm pretty sure what you are missing/forgetting is this: https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-user-data about using custom user data snippets while still using the official AMI. You would need full user data in the case of a custom AMI, but in that case you would also have created the full regular launch template.
yup, pulling down the merge commit by ref 👍
I'm not using a custom AMI, as per my example. I guess I'm just confused as to how to use the
Are you saying that it is expected that my example above would not work? If so, is there an example of how to make use of the
No, I'm saying the opposite :) My understanding is that your example, with the resulting LT, should work and be able to join the cluster (due to the merging of user data done outside the control of this Terraform module). I'm interested to hear if your testing confirms that.
I just tested with
It is working as expected
I can confirm it works with x86 also:
You have to use the
…odules#1138) Signed-off-by: Kevin Lefevre <[email protected]>
@ArchiFleKs I've been looking at further customizing the managed node group bootstrap process, and I'd be interested to know if you tried setting the
I tried, and it seems cloud-init doesn't preserve exported variables between its parts (without a custom AMI, user data gets merged with the one provided by AWS). One solution might be to write vars to some file like
@ipleten it was a leading question; I've done this exact thing for some of the other env variables by persisting the export. I'll probably open a PR to change this, as it's more resilient to AMI changes than the sed solution.
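A hedged sketch of the persistence idea discussed above (the file path, variable name, and hook name are assumptions for illustration, not the module's actual code): because cloud-init does not carry environment variables across parts, an earlier part can write the value to a file that a later part sources:

```hcl
# Illustrative cloud-init snippet for an early user-data part.
locals {
  pre_userdata = <<-EOT
    #!/bin/bash
    # Persist the value to a file instead of relying on `export`,
    # since cloud-init does not share env vars between parts.
    echo "KUBELET_EXTRA_ARGS='--register-with-taints=dedicated=true:NoSchedule'" \
      > /etc/kubelet-extra-args.env

    # A later part (or the bootstrap wrapper) would then run:
    #   source /etc/kubelet-extra-args.env
    # before invoking the kubelet bootstrap.
  EOT
}
```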
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Signed-off-by: Kevin Lefevre [email protected]
PR o'clock
Description
Enable the creation of a default launch template, if needed, to use with managed node pools. This enables the use of kubelet_extra_args and adding taints quickly, without having to manage a separate launch template Terraform config. It implements the logic from aws/containers-roadmap#864.
I think the launch template defaults might need some trimming; tell me what you think.
Checklist