[EKS] [request]: Spot instances for managed node groups #583

Closed
stijndehaes opened this issue Nov 19, 2019 · 58 comments
Labels
EKS Managed Nodes, EKS (Amazon Elastic Kubernetes Service), Proposed (Community submitted issue)

Comments

@stijndehaes

stijndehaes commented Nov 19, 2019

Tell us about your request
Right now we can use on-demand instances in a managed node worker group. However I see no reference in the documentation to using spot instances or a spot fleet. Ideally, I would like to be able to use spot instances for my batch workloads.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I want to run batch workloads cost efficiently. We mostly use spot instances for this. Without this feature I can't take advantage of the nice managed draining and upgrading support of the managed worker node groups.

Are you currently working around this issue?
Creating our own autoscaling groups and manually doing a rolling upgrade using kubectl cordon and drain commands.
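
For readers who have not done this by hand, the workaround amounts to roughly the following for each node. This is a minimal sketch that only builds the `kubectl` command lines and does not run them. The function name and the node names are hypothetical, and a real drain usually needs additional flags depending on the workloads on the node.

```python
from typing import List


def rolling_drain_commands(node_names: List[str]) -> List[List[str]]:
    """Build the cordon/drain commands for a manual rolling upgrade.

    Nodes are handled one at a time: cordon stops new pods from being
    scheduled onto the node, drain evicts the pods already running there.
    """
    commands: List[List[str]] = []
    for node in node_names:
        commands.append(["kubectl", "cordon", node])
        # --ignore-daemonsets lets the drain proceed when DaemonSet pods exist.
        commands.append(["kubectl", "drain", node, "--ignore-daemonsets"])
    return commands


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for cmd in rolling_drain_commands(["node-a", "node-b"]):
        print(" ".join(cmd))
```

Each old node would be replaced by a node from the new autoscaling group before moving on to the next one, which is the part managed node groups automate.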

Additional context
No

Attachments
None


Update 12/1 – this feature is now available

stijndehaes added the Proposed (Community submitted issue) label Nov 19, 2019
tabern added the EKS (Amazon Elastic Kubernetes Service) label Nov 19, 2019
@tabern
Contributor

tabern commented Nov 19, 2019

Thanks for adding this! We're working on this feature and it's been part of our plan for managed nodes from the start.

Question: Would you expect to provision a spot node group with a single instance type or multiple instance types?

@stijndehaes
Author

@tabern Thanks for the quick answer!
As to your question, I would expect to be able to specify multiple instance types. At the moment we use a launch template to specify multiple instance types and let it choose automatically. Providing the same support would be great. The reason we choose multiple instance types is resiliency: if one of those instance types is not available, we can automatically switch to another.

@gjmveloso

How about the new Fargate Spot option on EKS?

@gertjangaillet

How about the new Fargate Spot option on EKS?

I assume you are referring to this announcement. However, this just means you're running pods on Fargate orchestrated by EKS, which is still quite expensive compared to running actual nodes and does not integrate with typical k8s tooling such as EFK, Prometheus+Grafana, nginx ingress, cert manager, etc.

@tabern
Contributor

tabern commented Dec 16, 2019

I would expect to be able to specify multiple instance types. At the moment we use launch template to specify multiple instance types and let it choose automatically.

@stijndehaes would you expect to add any priority to these instance types, or is random sufficient (i.e., let cluster autoscaler scale up and we'll hit eventual capacity)? If we did not support multiple instance types per group, would it be painful to create multiple node groups, some of which were scaled to 0 (and could be scaled up as needed), or would this create undue complexity?


How about the new Fargate Spot option on EKS?

@gjmveloso - that's on our roadmap, tracked as #622


which is still quite expensive compared to running actual nodes

@gertjangaillet - The cost of Fargate tends to be dependent on cluster utilization. If you're getting very high utilization, Fargate is more expensive than nodes. However, if you typically run with low cluster utilization (50% or much less is very common), Fargate is more efficient. We're also bringing Savings Plan to EKS/Fargate (#616) which is another great way to lower costs.

@stijndehaes
Author

@tabern to start with, random would be sufficient. However, I would be most interested in the option to launch the cheapest instance type. I mostly use a couple of different instance types that have roughly the same CPU/memory, for example m5.xlarge, m5a.xlarge, and m5d.xlarge. This ensures that all jobs land on instances with roughly the same power available to them. This also used to be very important for the Kubernetes cluster autoscaler, because it uses one of the nodes as a template to see if a new pod would fit that node. I am not sure if this is still the case though (but I guess it is).
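
The "same shape" requirement can be checked mechanically before a list of instance types is handed to a node group. Below is a minimal sketch, assuming a small hand-maintained table of vCPU and memory figures; the table and the helper name `same_shape` are illustrative and not part of any AWS API.

```python
from typing import Dict, List, Tuple

# (vCPUs, memory in GiB) for the instance types named in the comment above,
# plus one larger type to demonstrate a mismatch.
INSTANCE_SPECS: Dict[str, Tuple[int, int]] = {
    "m5.xlarge": (4, 16),
    "m5a.xlarge": (4, 16),
    "m5d.xlarge": (4, 16),
    "m5.2xlarge": (8, 32),
}


def same_shape(instance_types: List[str]) -> bool:
    """Return True if every listed type has identical vCPU and memory."""
    shapes = {INSTANCE_SPECS[t] for t in instance_types}
    return len(shapes) == 1


if __name__ == "__main__":
    print(same_shape(["m5.xlarge", "m5a.xlarge", "m5d.xlarge"]))
    print(same_shape(["m5.xlarge", "m5.2xlarge"]))
```

Keeping a group uniform in this sense means that whichever node the autoscaler uses as its template, the "would this pod fit" answer is the same for every type in the group.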

@mambetica

We would be interested in specifying a 'Capacity-Optimised' allocation strategy. We have seen instability in Spot using 'Lowest-price': where an instance type in a given AZ is near exhaustion, we lose instances in that AZ, get them back, and then lose them again. We have therefore moved to diversified pools of instances matching the same capacity requirements, with a Capacity-Optimised strategy, i.e. we are willing to give up the cheapest Spot price in exchange for stability.
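
The trade-off described here can be illustrated with a toy model. This is only a sketch of the idea, not EC2's actual allocation logic: the `SpotPool` class, the prices, and the `spare_capacity` score are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpotPool:
    instance_type: str
    az: str
    price: float         # made-up hourly price
    spare_capacity: int  # made-up score; higher means more spare capacity


def lowest_price(pools: List[SpotPool]) -> SpotPool:
    """Pick the cheapest pool, regardless of how close it is to exhaustion."""
    return min(pools, key=lambda p: p.price)


def capacity_optimized(pools: List[SpotPool]) -> SpotPool:
    """Pick the pool with the most spare capacity, regardless of price."""
    return max(pools, key=lambda p: p.spare_capacity)


if __name__ == "__main__":
    # Illustrative numbers only.
    pools = [
        SpotPool("m5.xlarge", "us-east-1a", 0.05, 2),
        SpotPool("m5a.xlarge", "us-east-1a", 0.06, 9),
        SpotPool("m5d.xlarge", "us-east-1b", 0.07, 5),
    ]
    print(lowest_price(pools).instance_type)
    print(capacity_optimized(pools).instance_type)
```

In this model the cheapest pool is also the one nearest exhaustion, which is the situation that produces the lose/regain/lose cycle described above.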

@jurgenweber

When can we expect this?

@AndresPineros

AndresPineros commented Jan 14, 2020

Is there an ETA to release this feature? We're interested in migrating from Kops to EKS Managed, but not having Spot Instances is going to increase all of our pre-environments costs, which is a no-go.

@sandrom

sandrom commented Jan 14, 2020

This would be a really outstanding important feature, hope it comes soon :)

@leepatrick-goop

+1 to this, essential feature IMO.

@ruecarlo

ruecarlo commented Feb 2, 2020

Is there an ETA to release this feature? We're interested in migrating from Kops to EKS Managed, but not having Spot Instances is going to increase all of our pre-environments costs, which is a no-go.

I'd suggest checking out this workshop (https://ec2spotworkshops.com/using_ec2_spot_instances_with_eks.html) and this blog post (https://itnext.io/the-definitive-guide-to-running-ec2-spot-instances-as-kubernetes-worker-nodes-68ef2095e767).

@jonathanoliver80

Agreed that this is an essential EKS offering especially to support development pipelines.
Make it so!

@igrowheart

Any update on this item?

@jayolmos

jayolmos commented Mar 6, 2020

Any update please?

@lsantana486

Hi, is there any news on this topic?

@casey-robertson

This would be pretty useful given widespread industry mandates to cut costs right now .....

@gauravkohli

Any updates? It would help us keep our costs low on the EKS cluster which we plan to use for our CI builds.

@pc-rshetty

@tabern after moving to EKS on reserved instances, we are now thinking of leveraging Spot instances.
We use cluster autoscaler in production today.
If we have 2 node groups, I would like to give the "spot" worker node group precedence (over reserved) so that it expands first, and fall back to reserved instances only if that is not successful.
To make this happen, I understand I would have to implement something like this: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/readme.md

This, in my opinion, can also work with SpotAllocationStrategy set to capacity-optimized.

So I think implementing a priority-based expander is important.
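
The selection rule behind a priority-based expander can be sketched in a few lines. The linked readme configures priorities as a map from an integer to a list of regular expressions, with the higher number winning; the code below is a toy model of that rule, not the autoscaler's implementation. The function name `pick_by_priority` and the node group names are hypothetical, and the sketch assumes that a group which failed to scale up has already been dropped from the candidate list.

```python
import re
from typing import Dict, List


def pick_by_priority(priorities: Dict[int, List[str]], node_groups: List[str]) -> List[str]:
    """Return the candidate node groups matched by the highest priority tier.

    priorities maps an integer priority to a list of regular expressions;
    a higher number means the tier is preferred.
    """
    for priority in sorted(priorities, reverse=True):
        patterns = [re.compile(p) for p in priorities[priority]]
        matches = [g for g in node_groups if any(p.search(g) for p in patterns)]
        if matches:
            return matches
    return []


if __name__ == "__main__":
    # Hypothetical priorities and node group names.
    priorities = {50: [".*spot.*"], 10: [".*reserved.*"]}
    print(pick_by_priority(priorities, ["workers-spot", "workers-reserved"]))
    # If the Spot group is no longer a candidate, the reserved group is chosen.
    print(pick_by_priority(priorities, ["workers-reserved"]))
```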

@antiqe

antiqe commented Apr 4, 2020

@casey-robertson 👍. Any update regarding support for Spot instances on managed node groups?

@dchelupati

Question: Would you expect to provision a spot node group with vCPU/Mem based inputs and let EKS select the list of instance types?

@itssimon

itssimon commented Apr 9, 2020

If that was an additional (optional) feature it might be interesting. But having control over the exact types of instances is more important.

@antiqe

antiqe commented Apr 9, 2020

@dchelupati I think having control over the instance type is what people expect from this feature. One scenario would be providing more power to your CI while optimising cost; another would be increasing the power of your cluster based on the number of preview environments. You probably want to pick the exact instance type you need.

@dchelupati

@antiqe Thanks for the feedback. I understand why you want to pick the instance type based on the workload and desired performance of the cluster. However, to get the most benefit from EC2 Spot instances, we recommend following the best practices of instance and AZ flexibility. For example, if you need c5.large as preferred instance type for your CI environment, we can create a Spot node group with c5.large and c4.large so you are flexible across instance types. If you prefer, we can even add m4 and m5 to further increase flexibility. If the node group did this on your behalf, taking a preferred instance type as input instead of vCPU and memory, would that work?
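
The expansion described here, from one preferred instance type to the same size in other families, could look something like the sketch below. The helper `diversify` is hypothetical and does not reflect how EKS would implement this; it only shows the shape of the input and output being discussed.

```python
from typing import List


def diversify(preferred: str, extra_families: List[str]) -> List[str]:
    """Expand a preferred instance type to the same size in other families.

    The preferred type stays first; duplicates and the preferred family
    itself are skipped.
    """
    family, size = preferred.split(".", 1)
    result = [preferred]
    for fam in extra_families:
        candidate = f"{fam}.{size}"
        if fam != family and candidate not in result:
            result.append(candidate)
    return result


if __name__ == "__main__":
    print(diversify("c5.large", ["c4"]))
    print(diversify("c5.large", ["c4", "m4", "m5"]))
```

Note that the same size name does not imply the same memory across families (c5.large has 4 GiB, m5.large has 8 GiB), so pod requests would need to fit the smallest type in the resulting list.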

@cep21

cep21 commented Apr 9, 2020

For example, if you need c5.large as preferred instance type for your CI environment, we can create a Spot node group with c5.large and c4.large

That works for me, especially since it's best practice. I personally don't need to care about instance type for CPU and memory, but for GPU instances the instance type is very important. There can be big price and performance differences between different types of GPUs.

@igrowheart

@igrowheart The API documentation doesn't tell anything about "spot" instances when creating a managed node group: https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateNodegroup.html. I think that blog describes how to launch spot instances through a "regular" "manually" created launch template + autoscaling group.

If you read through the blog I posted and try it out, you will see that it describes a way to use spot instance under managed worker node group, which is the specific request mentioned in this ticket. The control is passed via eksctl instead of other AWS APIs. Let me know if I missed anything here. :)

@igrowheart

This doc also mentions eksctl’s support for Spot instances. It was updated recently, in May, for the first time.
https://eksctl.io/usage/spot-instances/

For those who voted down on my comments:
We share the same interest in waiting for the most wanted feature on EKS, and we want those features to benefit our work or apps in the future. I’m just guessing at what the product team is doing. If you think the comment is not in the right direction, please just leave a comment. Do not downvote like a kid. :)

@ktumu0225

@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.

@Dudssource

Dudssource commented Jul 12, 2020

@igrowheart my bad. But let's try to not get emotive here and not flood this issue with unnecessary comments.
I downvoted your comment because, just like @yourilefers pointed out, there's currently no support for Spot instances through the official managed node groups API. What eksctl does is use CloudFormation templates to provision what AWS itself calls 'self managed nodes'. The article you mentioned is also pretty clear: managed nodes for the on-demand group and self-managed nodes for the Spot pool.
And even though it is possible to work around this by changing the auto scaling groups manually (like @ktumu0225 mentioned), this does not solve this issue, which is about being able to provision Spot instances through managed nodes officially. That would also enable other tools like Terraform to use this feature.

@igrowheart

@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.

I didn't realize the upgrade would break this and the custom user data part. Thanks for the insights!
However, this will help a lot in dev & test environments.
For production, we need to wait for the general availability of this feature.

@igrowheart

@Dudssource never mind.
Seems I got too thrilled after I found that eksctl can support Spot instances :)
Thanks for your time explaining the details. Let's wait for the general availability of this feature.

@sarbajitdutta

@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.

We also did the same thing. We are using Cluster Autoscaler as well. Did you use their recommended settings for instance types, or did you configure your own instance types with mixed Spot and on-demand in the launch templates?

@antiqe

antiqe commented Aug 7, 2020

@tabern Do you have any news from the AWS team regarding the use of Spot Instances with Managed Node Groups? Thanks in advance.

@amazingandyyy

Any terraform terraform-aws-modules/eks/aws users here? Do you know when this can become a feature for node_groups?

@anarsen

anarsen commented Sep 16, 2020

Any terraform terraform-aws-modules/eks/aws user here, know when it can become a feature for node_groups?

I wouldn't count on it becoming available until it's part of the official AWS EKS API. See #583 (comment).

@treksler

Alright, this has been open for almost a year now. Is there any progress?

If this feature is not available soon (within a month), I will be forced to give up on managed node groups and that would be a shame.

Is there a roadmap for the AWS EKS API? Is there an Amazon rep who can speak to this?

@rtripat

rtripat commented Oct 16, 2020

@treksler We are actively working on it and appreciate the patience.

@deimosfr

Is there any ETA @treksler ?

@treksler

Is there any ETA @treksler ?
you mean @rtripat

@igrowheart

I saw the status changed to 'Coming Soon', so I'm expecting this during re:Invent. :)

@tabern
Contributor

tabern commented Dec 1, 2020

EKS managed node groups now provide native support for EC2 Spot Instances.

When you create a managed node group, simply set the capacity type to SPOT and then select one or more EC2 instance types that meet your resource requirements. Managed node groups provision and manage Spot nodes based on the latest Spot best practices. In particular, they enhance your node group's availability by enabling the capacity-optimized allocation strategy and Capacity Rebalancing on all Amazon EC2 Auto Scaling groups they manage.
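
As a rough illustration of what that looks like through the API, the sketch below builds the keyword arguments for an EKS CreateNodegroup call with the capacity type set to SPOT. The helper name `spot_nodegroup_request`, the default sizes, and every sample value (cluster name, role ARN, subnet IDs) are hypothetical. The function only builds a dictionary and makes no AWS calls.

```python
from typing import Any, Dict, List


def spot_nodegroup_request(
    cluster_name: str,
    nodegroup_name: str,
    node_role_arn: str,
    subnet_ids: List[str],
    instance_types: List[str],
    min_size: int = 1,
    max_size: int = 3,
    desired_size: int = 1,
) -> Dict[str, Any]:
    """Build keyword arguments for EKS CreateNodegroup with Spot capacity."""
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "clusterName": cluster_name,
        "nodegroupName": nodegroup_name,
        "nodeRole": node_role_arn,
        "subnets": subnet_ids,
        "instanceTypes": instance_types,
        "capacityType": "SPOT",
        "scalingConfig": {
            "minSize": min_size,
            "maxSize": max_size,
            "desiredSize": desired_size,
        },
    }


if __name__ == "__main__":
    # All values below are placeholders, for illustration only.
    request = spot_nodegroup_request(
        cluster_name="demo-cluster",
        nodegroup_name="batch-spot",
        node_role_arn="arn:aws:iam::111122223333:role/demo-node-role",
        subnet_ids=["subnet-aaaa", "subnet-bbbb"],
        instance_types=["m5.xlarge", "m5a.xlarge", "m5d.xlarge"],
    )
    print(request["capacityType"], request["instanceTypes"])
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("eks").create_nodegroup(**request)`.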

Learn more

@kreempuff

Is this supported in the config as well?

@jonathan-mothership

Any terraform terraform-aws-modules/eks/aws user here, know when it can become a feature for node_groups?

I wouldn't count on it becoming available until it's part of the official AWS EKS API. See #583 (comment).

@amazingandyyy This is available in PR form here: terraform-aws-modules/terraform-aws-eks#1129

@vladimirtiukhtin

Tried this today. The price is even higher than the on-demand price. What's the purpose?

@vladimirtiukhtin

I must admit that I mixed up "price" and "max price" options. I apologize

@jindov

jindov commented Jun 25, 2021

EKS managed node groups now provide native support for EC2 Spot Instances.

When you create a managed node group, simply set the capacity type to SPOT and then select one or more EC2 instance types that meet your resource requirements. Managed node groups provision and manage Spot nodes based on the latest Spot best practices. In particular, they enhance your node group's availability by enabling the capacity-optimized allocation strategy and Capacity Rebalancing on all Amazon EC2 Auto Scaling groups they manage.

Learn more

I would love to have the ability to choose the Spot allocation strategy: lowest price or capacity optimized.

Reason: we have some secondary node groups running on Spot instances that can tolerate sudden interruptions. It would also help us save costs on non-critical cluster/node group workloads.
