[EKS] [request]: Spot instances for managed node groups #583
Comments
Thanks for adding this! We're working on this feature, and it's been part of our plan for managed nodes from the start. Question: would you expect to provision a spot node group with a single instance type or with multiple instance types? |
@tabern Thanks for the quick answer! |
How about the new Fargate Spot option on EKS? |
I assume you are also talking about this announcement. However, this just means you're running pods on Fargate orchestrated by EKS, which is still quite expensive compared to running actual nodes, and does not integrate with typical k8s tooling such as EFK, Prometheus+Grafana, nginx ingress, cert manager, etc. |
@stijndehaes would you expect to add any priority to these instance types, or is random sufficient (i.e., let cluster autoscaler scale up and we'll hit eventual capacity)? If we did not support multiple instance types per group, would it be painful to need to create multiple node groups, some of which were scaled to 0 (and could be scaled up as needed), or would this create undue complexity?
@gjmveloso - that's on our roadmap, tracked as #622
@gertjangaillet - The cost of Fargate tends to be dependent on cluster utilization. If you're getting very high utilization, Fargate is more expensive than nodes. However, if you typically run with low cluster utilization (50% or much less is very common), Fargate is more efficient. We're also bringing Savings Plan to EKS/Fargate (#616) which is another great way to lower costs. |
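The utilization argument above can be made concrete with a little arithmetic. The sketch below is only an illustration: the prices are hypothetical placeholders, not real AWS rates, and the model assumes that nodes are billed for their full capacity while Fargate is billed only for what pods request.

```python
def effective_node_cost(node_price_per_vcpu_hour: float, utilization: float) -> float:
    """Cost per *used* vCPU-hour on a node that is only partly utilized."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return node_price_per_vcpu_hour / utilization


def break_even_utilization(node_price_per_vcpu_hour: float,
                           fargate_price_per_vcpu_hour: float) -> float:
    """Utilization below which Fargate is cheaper per used vCPU-hour."""
    return node_price_per_vcpu_hour / fargate_price_per_vcpu_hour


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
NODE_PRICE = 0.04      # assumed $/vCPU-hour for a node
FARGATE_PRICE = 0.08   # assumed $/vCPU-hour for Fargate

if __name__ == "__main__":
    print(break_even_utilization(NODE_PRICE, FARGATE_PRICE))
    print(effective_node_cost(NODE_PRICE, 0.25))
```

With these assumed numbers, a cluster running at 25% utilization pays more per used vCPU-hour on nodes than on Fargate, while a cluster running at 90% pays less.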
@tabern to start with, random would be sufficient. However, I would be most interested in the option to launch the cheapest instance type. I mostly use a couple of different instance types that have roughly the same CPU/memory, for example: m5.xlarge, m5a.xlarge, m5d.xlarge. This ensures that all jobs land on instances with roughly the same power available to them. This also used to be very important for the Kubernetes cluster autoscaler, because it uses one of the nodes as a template to see if a new pod would fit that node. I am not sure if this is still the case though (but I guess it is). |
We would be interested in specifying a 'Capacity-Optimised' allocation strategy. We have seen instability in Spot using 'Lowest-price': we lost instances within a given AZ, got them back again, and then lost them again where that instance type in that AZ was near exhaustion. We have therefore moved to diversified pools of instances matching the same capacity requirements, with a Capacity-Optimised strategy, i.e. we are willing to take a hit on getting the cheapest spot for stability. |
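The trade-off between the two strategies can be sketched with a toy model. This is not how EC2 implements either strategy: the `SpotPool` type, the prices, and the `spare_capacity` figures are all hypothetical, and real spare capacity is not exposed to customers. The sketch only shows why the cheapest pool and the deepest pool can differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotPool:
    """Hypothetical view of one Spot pool (instance type + AZ)."""
    instance_type: str
    az: str
    price: float         # assumed hourly Spot price
    spare_capacity: int  # assumed number of spare instances


def pick_lowest_price(pools):
    """Toy 'lowest-price': take the cheapest pool, however shallow it is."""
    return min(pools, key=lambda p: p.price)


def pick_capacity_optimized(pools):
    """Toy 'capacity-optimized': take the pool with the most spare capacity."""
    return max(pools, key=lambda p: p.spare_capacity)


# Hypothetical pools: the cheapest one is also the one closest to exhaustion.
POOLS = [
    SpotPool("m5.xlarge", "az-a", 0.060, 3),
    SpotPool("m5a.xlarge", "az-a", 0.065, 40),
    SpotPool("m5d.xlarge", "az-b", 0.070, 25),
]

if __name__ == "__main__":
    print(pick_lowest_price(POOLS).instance_type)
    print(pick_capacity_optimized(POOLS).instance_type)
```

In this made-up data the cheapest pool has almost no headroom, which is the situation described above: instances launched there are the most likely to be reclaimed.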
When can we expect this? |
Is there an ETA to release this feature? We're interested in migrating from Kops to EKS Managed, but not having Spot Instances is going to increase all of our pre-environments costs, which is a no-go. |
This would be a really important feature, hope it comes soon :) |
+1 to this, essential feature IMO. |
I'd suggest checking out this workshop https://ec2spotworkshops.com/using_ec2_spot_instances_with_eks.html and this blog post https://itnext.io/the-definitive-guide-to-running-ec2-spot-instances-as-kubernetes-worker-nodes-68ef2095e767. |
Agreed that this is an essential EKS offering especially to support development pipelines. |
Any update on this item? |
Any update please? |
Hi, is there any news on this topic? |
This would be pretty useful given widespread industry mandates to cut costs right now ..... |
Any updates? It would help us keep our costs low on the EKS cluster which we plan to use for our CI builds. |
@tabern after moving to EKS on reserved instances we are now thinking of leveraging spot instances. This, in my opinion, can also work with SpotAllocationStrategy set to So I think implementing a priority-based expander is important. |
@casey-robertson 👍. Any update regarding support for spot instances in managed node groups? |
Question: Would you expect to provision a spot node group with vCPU/Mem based inputs and let EKS select the list of instance types? |
If that was an additional (optional) feature it might be interesting. But having control over the exact types of instances is more important. |
@dchelupati I think having control of the instance type is what people expect from this feature. One scenario would be providing more power to your CI while optimising the cost; another would be increasing the power of your cluster based on the number of preview environments. You probably want to pick the exact instance type you need. |
@antiqe Thanks for the feedback. I understand why you want to pick the instance type based on the workload and desired performance of the cluster. However, to get the most benefit from EC2 Spot instances, we recommend the best practice of instance and AZ flexibility. For example, if you need c5.large as the preferred instance type for your CI environment, we can create a Spot node group with c5.large and c4.large so you are flexible across instance types. If you prefer, we can even add m4 and m5 to further increase flexibility. If the node group does this on your behalf with a preferred instance type input instead of vCPU and memory, would that work? |
That works for me, especially since it's best practice. I personally don't need to care about instance type for CPU and memory, but for GPU instances the instance type is very important. There can be big price and performance differences between different types of GPUs. |
If you read through the blog I posted and try it out, you will see that it describes a way to use spot instances under a managed worker node group, which is the specific request mentioned in this ticket. The control is passed via eksctl instead of other AWS APIs. Let me know if I missed anything here. :) |
This doc also mentions eksctl’s support for spot instances. It was updated recently, in May, for the first time. For those who voted down on my comments: |
@igrowheart Yes, spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while upgrading the K8s cluster and worker nodes to newer versions, we noticed that the underlying ASGs/Launch Templates were reset to using On-Demand nodes. Also, the custom user data section we configured in the underlying launch template was completely ignored in the managed worker nodes setup after the upgrade. |
@igrowheart my bad. But let's try to not get emotive here and not flood this issue with unnecessary comments. |
I didn't realize the upgrade would break this and the custom user data part. Thanks for the insights! |
@Dudssource never mind. |
We also did the same thing. We are using Cluster Autoscaler as well. Did you use their recommended settings for instance types, or did you configure your own instance types with mixed spot and on-demand in the launch templates? |
@tabern Do you have any news from the AWS team regarding the use of Spot Instances with Managed Node Groups? Thanks in advance. |
Any terraform |
I wouldn't count on it becoming available until it's part of the official AWS EKS API. See #583 (comment). |
Alright, this has been open for almost a year now. Is there any progress? If this feature is not available soon (within a month), I will be forced to give up on managed node groups and that would be a shame. Is there a roadmap for the AWS EKS API? Is there an Amazon rep who can speak to this? |
@treksler We are actively working on it and appreciate the patience. |
Is there any ETA @treksler ? |
I saw the status changed to 'Coming Soon'. So I'm expecting this during re:Invent. :) |
EKS managed node groups now provide native support for EC2 Spot Instances. When you create a managed node group, simply set the capacity type to SPOT and then select one or more EC2 instance types that meet your resource requirements. Managed node groups provision and manage Spot nodes based on the latest Spot best practices. In particular, they enhance your node group's availability by enabling the capacity-optimized allocation strategy and Capacity Rebalancing on all Amazon EC2 Auto Scaling groups they manage. Learn more |
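As a rough sketch of what that looks like through the API, the helper below builds the parameters for the EKS `CreateNodegroup` call with the capacity type set to `SPOT` and several instance types. The helper name and every value passed to it (cluster name, subnet IDs, role ARN) are hypothetical placeholders; only the parameter keys come from the EKS API.

```python
def build_spot_nodegroup_request(cluster_name, nodegroup_name, subnets,
                                 node_role_arn, instance_types,
                                 min_size=1, max_size=3, desired_size=1):
    """Build keyword arguments for an EKS CreateNodegroup call using Spot.

    Hypothetical helper: it only assembles the request and never calls AWS.
    """
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "clusterName": cluster_name,
        "nodegroupName": nodegroup_name,
        "capacityType": "SPOT",                 # the alternative is ON_DEMAND
        "instanceTypes": list(instance_types),  # several types for flexibility
        "subnets": list(subnets),
        "nodeRole": node_role_arn,
        "scalingConfig": {
            "minSize": min_size,
            "maxSize": max_size,
            "desiredSize": desired_size,
        },
    }


# Placeholder values for illustration only.
request = build_spot_nodegroup_request(
    cluster_name="example-cluster",
    nodegroup_name="example-spot-group",
    subnets=["subnet-aaaa", "subnet-bbbb"],
    node_role_arn="arn:aws:iam::111122223333:role/example-node-role",
    instance_types=["m5.xlarge", "m5a.xlarge", "m5d.xlarge"],
)
```

Assuming boto3 is installed and credentials are configured, the request could then be sent with `boto3.client("eks").create_nodegroup(**request)`.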
Is this supported in the config as well? |
@amazingandyyy This is available in PR form here: terraform-aws-modules/terraform-aws-eks#1129 |
Tried this today. The price is even higher than the on-demand price. What's the purpose? |
I must admit that I mixed up the "price" and "max price" options. I apologize. |
Love to have the availability to choose the Reason: we have some secondary node groups running on spot instances that can tolerate sudden interruption, and it also helps us save cost on non-critical cluster/node group workloads. |
Tell us about your request
Right now we can use on-demand instances in a managed worker node group. However, I see no reference in the documentation to using spot instances or a spot fleet. Ideally, I would like to be able to use spot instances for my batch workloads.
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I want to run batch workloads cost-efficiently. We mostly use spot instances for this. Without this feature I can't take advantage of the nice managed draining and upgrading support of the managed worker node groups.
Are you currently working around this issue?
Creating our own autoscaling groups and manually doing a rolling upgrade using kubectl cordon and drain commands.
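A minimal sketch of that manual workaround: the hypothetical helper below only generates the `kubectl cordon` and `kubectl drain` command lines for a list of old nodes, in order, without running them. The node names are placeholders, and the `--ignore-daemonsets` flag is an assumption about how the drain would typically be invoked, not something stated in this issue.

```python
import shlex


def rolling_upgrade_commands(node_names):
    """Return kubectl commands that cordon, then drain, each node in turn.

    Hypothetical helper: it builds argument lists and executes nothing.
    """
    commands = []
    for node in node_names:
        # Stop new pods from being scheduled onto the old node.
        commands.append(["kubectl", "cordon", node])
        # Evict the pods so they reschedule elsewhere; DaemonSet pods stay.
        commands.append(["kubectl", "drain", node, "--ignore-daemonsets"])
    return commands


# Placeholder node names for illustration.
COMMANDS = rolling_upgrade_commands(["old-node-1", "old-node-2"])

if __name__ == "__main__":
    for cmd in COMMANDS:
        print(shlex.join(cmd))
```

Each command could be passed to `subprocess.run(cmd, check=True)` once the replacement nodes are ready, so the loop stops at the first failure.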
Additional context
No
Attachments
None
Update 12/1 – this feature is now available