[EKS] [request]: Spot instances for managed node groups #583

stijndehaes · 2019-11-19T05:38:00Z

Tell us about your request
Right now we can use on-demand instances in a managed node worker group. However, I see no reference in the documentation to using spot instances or a spot fleet. Ideally, I would like to be able to use spot instances for my batch workloads.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I want to run batch workloads cost efficiently. We mostly use spot instances for this. Without this feature I can't take advantage of the nice managed draining and upgrading support of the managed worker node groups.

Are you currently working around this issue?
Creating our own autoscaling groups and manually doing a rolling upgrade using kubectl cordon and drain commands.

Additional context
No

Attachments
None

Update 12/1 – this feature is now available

tabern · 2019-11-19T07:20:10Z

Thanks for adding this! We're working on this feature and it's been part of our plan for managed nodes from the start.

Question: Would you expect to provision a spot node group with a single instance type or multiple instance types?

stijndehaes · 2019-11-19T07:38:59Z

@tabern Thanks for the quick answer!
As to your question, I would expect to be able to specify multiple instance types. At the moment we use launch template to specify multiple instance types and let it choose automatically. Providing the same support would be great. The reason we choose multiple instance types is resiliency: if one of those instance types is not available, we can automatically switch.

gjmveloso · 2019-12-04T13:20:38Z

How about the new Fargate Spot option on EKS?

gertjangaillet · 2019-12-09T08:37:36Z

How about the new Fargate Spot option on EKS?

I assume you are also talking about this announcement. However, this just means you're running pods on Fargate orchestrated by EKS, which is still quite expensive compared to running actual nodes and does not integrate with typical k8s tooling such as EFK, Prometheus+Grafana, nginx ingress, cert manager, etc.

tabern · 2019-12-16T16:22:44Z

I would expect to be able to specify multiple instance types. At the moment we use launch template to specify multiple instance types and let it choose automatically.

@stijndehaes would you expect to add any priority to these instance types or is random sufficient (ie: let cluster autoscaler scale up and we'll hit eventual capacity)? If we did not support multiple instance types per group would it be painful to need to create multiple node groups, some of which were scaled to 0 (and could be scaled up as needed) or would this create undue complexity?

How about the new Fargate Spot option on EKS?

@gjmveloso - that's on our roadmap, tracked as #622

which is still quite expensive compared to running actual nodes

@gertjangaillet - The cost of Fargate tends to be dependent on cluster utilization. If you're getting very high utilization, Fargate is more expensive than nodes. However, if you typically run with low cluster utilization (50% or much less is very common), Fargate is more efficient. We're also bringing Savings Plan to EKS/Fargate (#616) which is another great way to lower costs.

stijndehaes · 2019-12-17T06:34:49Z

@tabern to start with, random would be sufficient. However, I would be most interested in the option to launch the cheapest instance type. I mostly use a couple of different instance types that have roughly the same CPU/memory, for example m5.xlarge, m5a.xlarge, m5d.xlarge. This makes sure that all jobs land on instances with roughly the same power available to them. This also used to be very important for the Kubernetes cluster autoscaler, because it uses one of the nodes as a template to see if a new pod would fit that node. I am not sure if this is still the case, though (but I guess it is).
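
To illustrate the point about keeping instance types roughly equal in size, here is a minimal Python sketch. The spec table is entered by hand, and the function name `is_homogeneous` and the 10% memory tolerance are assumptions made for illustration only; none of this is part of an AWS or cluster autoscaler API.

```python
# Hand-entered (vCPU, memory in GiB) figures for a few EC2 instance types.
# This table is an illustrative assumption, not something fetched from AWS.
INSTANCE_SPECS = {
    "m5.xlarge": (4, 16.0),
    "m5a.xlarge": (4, 16.0),
    "m5d.xlarge": (4, 16.0),
    "c5.large": (2, 4.0),
}


def is_homogeneous(instance_types, mem_tolerance=0.10):
    """Return True if all types share a vCPU count and similar memory.

    Memory counts as "similar" when the gap between the largest and the
    smallest value is at most `mem_tolerance` of the largest value.
    """
    specs = [INSTANCE_SPECS[t] for t in instance_types]
    vcpus = {cpu for cpu, _ in specs}
    mems = [mem for _, mem in specs]
    if len(vcpus) != 1:
        return False
    return (max(mems) - min(mems)) / max(mems) <= mem_tolerance
```

A check like this could run before creating a node group, so that whichever node the autoscaler uses as a template is representative of the others.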

mambetica · 2019-12-19T12:28:47Z

We would be interested in specifying a 'Capacity-Optimised' allocation strategy. We have seen instability in Spot using 'Lowest-price': we lose instances within a given AZ, get them back again, and then lose them again when that instance type in that AZ is near exhaustion. We have therefore moved to diversified pools of instances matching the same capacity requirements, with a Capacity-Optimised strategy, i.e. we are willing to give up the cheapest Spot price in exchange for stability.
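
The trade-off described above can be sketched with a toy model in Python. This is not how EC2 implements either strategy: the `SpotPool` type, the prices, and the `spare_capacity` numbers are all hypothetical (AWS does not publish pool depth), and the two functions only show why the cheapest pool and the deepest pool can differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotPool:
    instance_type: str
    az: str
    price: float         # hypothetical hourly Spot price in USD
    spare_capacity: int  # hypothetical number of unused instances in the pool


def lowest_price(pools):
    """Toy 'lowest-price' choice: pick the cheapest pool, ignoring depth."""
    return min(pools, key=lambda p: p.price)


def capacity_optimized(pools):
    """Toy 'capacity-optimized' choice: pick the deepest pool, ignoring price."""
    return max(pools, key=lambda p: p.spare_capacity)


# Hypothetical pools: the cheapest one is also the one closest to exhaustion.
POOLS = [
    SpotPool("m5.xlarge", "eu-west-1a", 0.070, 3),
    SpotPool("m5a.xlarge", "eu-west-1a", 0.075, 40),
    SpotPool("m5d.xlarge", "eu-west-1b", 0.080, 25),
]
```

With these made-up numbers the cheapest pool is nearly exhausted, which is the situation where repeated interruptions would be expected, while the deepest pool costs slightly more.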

jurgenweber · 2020-01-08T02:45:54Z

When can we expect this?

AndresPineros · 2020-01-14T04:31:12Z

Is there an ETA to release this feature? We're interested in migrating from Kops to EKS Managed, but not having Spot Instances is going to increase all of our pre-environments costs, which is a no-go.

sandrom · 2020-01-14T16:38:39Z

This would be a really outstanding important feature, hope it comes soon :)

leepatrick-goop · 2020-01-24T00:31:17Z

+1 to this, essential feature IMO.

ruecarlo · 2020-02-02T12:56:59Z

Is there an ETA to release this feature? We're interested in migrating from Kops to EKS Managed, but not having Spot Instances is going to increase all of our pre-environments costs, which is a no-go.

I'd suggest checking out this workshop https://ec2spotworkshops.com/using_ec2_spot_instances_with_eks.html and this blog post https://itnext.io/the-definitive-guide-to-running-ec2-spot-instances-as-kubernetes-worker-nodes-68ef2095e767.

jonathanoliver80 · 2020-02-07T00:57:03Z

Agreed that this is an essential EKS offering especially to support development pipelines.
Make it so!

igrowheart · 2020-02-13T06:28:20Z

Any update on this item?

jayolmos · 2020-03-06T09:15:08Z

Any update please?

lsantana486 · 2020-03-12T17:47:45Z

Hi, is there any news on this topic?

casey-robertson · 2020-03-23T19:20:39Z

This would be pretty useful given widespread industry mandates to cut costs right now .....

gauravkohli · 2020-03-27T10:39:13Z

Any updates? It would help us keep our costs low on the EKS cluster which we plan to use for our CI builds.

pc-rshetty · 2020-03-30T07:01:52Z

@tabern after moving to EKS on reserved instances, we are now thinking of leveraging spot instances.
We do use cluster autoscaler on production today.
If we have 2 node groups, I would like to set precedence for the "spot" worker node group (over reserved) so that it expands first, and fall back to reserved instances only if that is not successful.
To make this happen, I understand I would have to implement something like this: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/readme.md

This, in my opinion, can also work with SpotAllocationStrategy set to capacity-optimized.

So I think implementing a priority-based expander is important.
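
As a rough illustration of the "spot first, reserved as fallback" idea, here is a small Python sketch loosely modeled on a priority-to-regex mapping. It is not the cluster autoscaler's code: the node group names, the `PRIORITIES` table, and the function `pick_node_groups` are hypothetical, and the real expander's matching and fallback behaviour should be checked against its readme.

```python
import re

# Higher number = higher priority. Each priority maps to a list of regexes
# matched against node group names. Names and values are hypothetical.
PRIORITIES = {
    50: [r".*spot.*"],
    10: [r".*reserved.*"],
}


def pick_node_groups(candidates, priorities=PRIORITIES):
    """Return the candidates matching the highest priority that matches any.

    `candidates` is the list of node groups that could currently scale up.
    A group that just failed to scale would be left out by the caller, so
    the next-highest priority wins on the following attempt.
    """
    for priority in sorted(priorities, reverse=True):
        patterns = [re.compile(p) for p in priorities[priority]]
        matched = [g for g in candidates if any(p.search(g) for p in patterns)]
        if matched:
            return matched
    return list(candidates)  # nothing matched: fall back to all candidates
```

For example, with both `ng-spot-a` and `ng-reserved-a` as candidates the spot group is chosen; once only `ng-reserved-a` remains, that group is chosen instead.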

antiqe · 2020-04-04T09:55:02Z

@casey-robertson 👍. Any update regarding the support of spot instances in managed node groups?

dchelupati · 2020-04-08T21:36:03Z

Question: Would you expect to provision a spot node group with vCPU/Mem based inputs and let EKS select the list of instance types?

itssimon · 2020-04-09T09:00:36Z

If that was an additional (optional) feature it might be interesting. But having control over the exact types of instances is more important.

antiqe · 2020-04-09T13:54:52Z

@dchelupati I think having control over the instance type is what people expect from this feature. One scenario would be giving more power to your CI while optimising cost; another would be increasing the power of your cluster based on the number of preview environments. You probably want to pick the exact instance type you need.

dchelupati · 2020-04-09T14:12:00Z

@antiqe Thanks for the feedback. I understand why you want to pick the instance type based on the workload and desired performance of the cluster. However, to get the most benefit from EC2 Spot instances, we recommend following the best practices of instance type and AZ flexibility. For example, if you need c5.large as preferred instance type for your CI environment, we can create a Spot node group with c5.large and c4.large so you are flexible across instance types. If you prefer, we can even add m4 and m5 to further increase flexibility. If the node group did this on your behalf, taking a preferred instance type as input instead of vCPU and memory, would that work?

cep21 · 2020-04-09T16:41:42Z

For example, if you need c5.large as preferred instance type for your CI environment, we can create a Spot node group with c5.large and c4.large

That works for me, especially since it's best practice. I personally don't need to care about instance type for CPU and memory, but for GPU instances the instance type is very important. There can be big price and performance differences between different types of GPUs.

igrowheart · 2020-07-11T13:19:46Z

@igrowheart The API documentation doesn't tell anything about "spot" instances when creating a managed node group: https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateNodegroup.html. I think that blog describes how to launch spot instances through a "regular" "manually" created launch template + autoscaling group.

If you read through the blog I posted and try it out, you will see that it describes a way to use spot instance under managed worker node group, which is the specific request mentioned in this ticket. The control is passed via eksctl instead of other AWS APIs. Let me know if I missed anything here. :)

igrowheart · 2020-07-12T16:43:11Z

This doc also mentions eksctl's support for spot instances. It was updated in May for the first time.
https://eksctl.io/usage/spot-instances/

For those who voted down on my comments:
We share the same interest in waiting for the most wanted feature on EKS, and we want those features to benefit our work or apps in the future. I'm just guessing at what the product team is doing. If you think the comment is not going in the right direction, please just leave a comment. Do not downvote like a kid. :)

ktumu0225 · 2020-07-12T17:32:27Z

@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.

Dudssource · 2020-07-12T21:03:42Z

@igrowheart my bad. But let's try to not get emotive here and not flood this issue with unnecessary comments.
I downvoted your comment because, just like @yourilefers pointed out, currently there's no support for spot instances through the official managed node groups API. What eksctl does is use CloudFormation templates to provision what AWS itself calls 'self managed nodes'. The article you mentioned is also pretty clear: managed nodes for the on-demand group and self-managed nodes for the spot pool.
And even though it is possible to work around this by changing the auto scaling groups manually (like @ktumu0225 mentioned), this does not solve this issue, which is about being able to provision spot instances through managed nodes officially. Official support would also enable other tools like Terraform to use this feature.

igrowheart · 2020-07-13T04:00:47Z

@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.

Didn't realize the upgrade would break this and the custom user data part. Thanks for the insights!
However, this will help a lot in dev and test environments.
For production, we need to wait for the general availability of this feature.

igrowheart · 2020-07-13T04:03:26Z

@Dudssource never mind.
Seems I got too thrilled after I found that eksctl can support spot instances :)
Thanks for taking the time to explain the details. Let's wait for the general availability of this feature.

sarbajitdutta · 2020-07-31T14:27:13Z

@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.

We also did the same thing. We are using Cluster Autoscaler as well. Did you use their recommended settings for instance types, or did you configure your own instance types with mixed spot and on-demand in the launch templates?

antiqe · 2020-08-07T15:45:55Z

@tabern Do you have any news from the AWS team regarding the use of Spot Instances with Managed Node Groups? Thanks in advance.

amazingandyyy · 2020-08-28T06:46:55Z

Any terraform terraform-aws-modules/eks/aws user here, know when it can become a feature for node_groups?

anarsen · 2020-09-16T11:15:57Z

Any terraform terraform-aws-modules/eks/aws user here, know when it can become a feature for node_groups?

I wouldn't count on it becoming available until it's part of the official AWS EKS API. See #583 (comment).

treksler · 2020-10-16T22:12:06Z

Alright, this has been open for almost a year now. Is there any progress?

If this feature is not available soon (within a month), I will be forced to give up on managed node groups and that would be a shame.

Is there a roadmap for the AWS EKS API? Is there an Amazon rep who can speak to this?

rtripat · 2020-10-16T22:19:19Z

@treksler We are actively working on it and appreciate the patience.

deimosfr · 2020-10-16T22:40:30Z

Is there any ETA @treksler ?

treksler · 2020-10-16T23:37:00Z

Is there any ETA @treksler ?
you mean @rtripat

igrowheart · 2020-11-26T16:25:38Z

I saw the status changed to 'Coming Soon'. So I'm expecting this during the re:invent. :)

tabern · 2020-12-01T17:35:02Z

EKS managed node groups now provide native support for EC2 Spot Instances.

When you create a managed node group, simply set the capacity type to SPOT and then select one or more EC2 instance types that meet your resource requirements. Managed node groups provision and manage Spot nodes based on the latest Spot best practices. In particular, they enhance your node group's availability by enabling the capacity-optimized allocation strategy and Capacity Rebalancing on all Amazon EC2 Auto Scaling groups they manage.
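
As a rough sketch of what such a request could look like, the Python below builds the parameters for the EKS CreateNodegroup API with `capacityType` set to `SPOT` and several instance types. The helper name `spot_nodegroup_request`, the cluster and node group names, the role ARN, and the subnet IDs are placeholders invented for illustration; the code only builds a dictionary and does not call AWS.

```python
def spot_nodegroup_request(cluster, name, node_role_arn, subnets,
                           instance_types, min_size=1, max_size=4,
                           desired_size=2):
    """Build keyword arguments for an EKS CreateNodegroup call using Spot."""
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "clusterName": cluster,
        "nodegroupName": name,
        "nodeRole": node_role_arn,
        "subnets": list(subnets),
        "capacityType": "SPOT",  # the alternative value is "ON_DEMAND"
        "instanceTypes": list(instance_types),
        "scalingConfig": {
            "minSize": min_size,
            "maxSize": max_size,
            "desiredSize": desired_size,
        },
    }


# All identifiers below are placeholders, not real resources.
request = spot_nodegroup_request(
    cluster="demo-cluster",
    name="batch-spot",
    node_role_arn="arn:aws:iam::123456789012:role/demo-node-role",
    subnets=["subnet-aaaa1111", "subnet-bbbb2222"],
    instance_types=["m5.xlarge", "m5a.xlarge", "m5d.xlarge"],
)

# With boto3 installed and credentials configured, this could be sent with:
#   import boto3
#   boto3.client("eks").create_nodegroup(**request)
```

Listing several similarly sized instance types follows the flexibility advice given earlier in this thread.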

Learn more

kreempuff · 2020-12-08T15:18:32Z

Is this supported in the config as well?

jonathan-mothership · 2020-12-08T19:05:54Z

Any terraform terraform-aws-modules/eks/aws user here, know when it can become a feature for node_groups?

I wouldn't count on it becoming available until it's part of the official AWS EKS API. See #583 (comment).

@amazingandyyy This is available in PR form here: terraform-aws-modules/terraform-aws-eks#1129

vladimirtiukhtin · 2020-12-11T23:59:59Z

Tried this today. The price is even higher than the on-demand price. What's the purpose?

vladimirtiukhtin · 2020-12-15T23:35:21Z

I must admit that I mixed up "price" and "max price" options. I apologize

jindov · 2021-06-25T08:08:02Z

EKS managed node groups now provide native support for EC2 Spot Instances.

When you create a managed node group, simply set the capacity type to SPOT and then select one or more EC2 instance types that meet your resource requirements. Managed node groups provision and manage Spot nodes based on the latest Spot best practices. In particular, they enhance your node group's availability by enabling the capacity-optimized allocation strategy and Capacity Rebalancing on all Amazon EC2 Auto Scaling groups they manage.

Learn more

Announcement

Blog - Amazon EKS now supports EC2 Spot Instances in managed node groups

We would love to have the ability to choose the Spot allocation strategy: lowest price or capacity optimized.

Reason: we have some secondary node groups running on Spot instances that can tolerate sudden interruptions. It would also help us save costs on non-critical cluster/node group workloads.

stijndehaes added the Proposed Community submitted issue label Nov 19, 2019

tabern added the EKS Amazon Elastic Kubernetes Service label Nov 19, 2019

tabern mentioned this issue Nov 19, 2019

[EKS] Managed worker nodes #139

Closed

fmedery mentioned this issue Nov 25, 2019

v0.10.2 Not supporting Spot Instances? eksctl-io/eksctl#1600

Closed

bflad mentioned this issue Jul 27, 2020

Add support for spot instances in EKS hashicorp/terraform-provider-aws#14350

Closed

costrouc mentioned this issue Aug 18, 2020

Waiting on AWS EKS Features for completeness nebari-dev/nebari#44

Closed

3 tasks

RichiCoder1 mentioned this issue Oct 27, 2020

feat: LaunchTemplate support for managed node-groups terraform-aws-modules/terraform-aws-eks#997

Merged

2 tasks

barryib mentioned this issue Nov 19, 2020

[Feature Request] Spot for Managed Node Groups terraform-aws-modules/terraform-aws-eks#1107

Closed

4 tasks

tabern closed this as completed Dec 1, 2020

yspreen mentioned this issue Dec 1, 2020

[EKS/Fargate] [request]: Support Fargate Spot for EKS #622

Open

0xlen mentioned this issue Nov 25, 2021

[EKS] [request]: Support Instance Market Options of Spot instances for managed node groups #1575

Open
