EC2 Fleet Autoscaling Support? #838
According to this official blog post on EC2 Fleets, auto-scaling group support is still in progress. They say:
Thought it's worth mentioning since it greatly affects design/implementation. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle rotten |
@zdoherty @itskingori @jfoy I think this work is no longer blocked - ec2 fleets now support ASGs (or vice versa?) |
@geota I believe this is solved by that release. It's now possible to have an ASG with spot instances in it...so in theory this should "just work" with the cluster autoscaler. I'm planning on doing some testing next week. |
Any update on this? We are trying to achieve autoscaling with EC2-fleet. |
Spot Fleet and EC2 Fleet seem to have been superseded by this: https://aws.amazon.com/blogs/aws/new-ec2-auto-scaling-groups-with-multiple-instance-types-purchase-options/ It doesn't quite have all the same features but it should fit most use cases. We've set it up with 100% spot instances of two different types using terraform (although you can have a mixture of spot and on demand like in EC2 fleet). The cluster autoscaler works very nicely with it. 😄 |
@cablespaghetti Thanks a lot. But does the present cluster autoscaler support Spot instances (EC2 Fleet or Spot Fleet)? From what I know, it only supported on-demand instances. Also, will there be service interruption with this approach? |
It doesn't know or care what kind of instances are behind the ASG. It just increases or decreases the "Desired Count". The ASG config controls what instance types are used auto-magically. I'll upload my terraform config which is 100% spot but it's easy to have a base of X on demand instances and then use spot (see the linked blog post). |
That is not true. CA works by performing careful simulations to see if adding more nodes would help pending pods. It needs to know exactly what a new node will look like to make good decisions. The actuation is indeed done by increasing "desired count", so your setup will work to an extent. But it also means CA makes decisions based on incorrect data (it will predict what the next node will look like, but in your setup that prediction will be incorrect most of the time). Effectively you're going from "carefully choose how many nodes of each kind will be the best fit for your pods" to "let's add some nodes and hope for the best". It may be good enough for a very simple cluster (single ASG, each pod can run on every type of node available), but it's not something we recommend. As of now our official policy is that all nodes in a NodeGroup must be strictly identical, and things like multi-instance-type ASGs or multi-zonal ASGs are not officially supported by CA. |
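To illustrate the point above, here is a minimal sketch of why a wrong node template leads to a wrong scale-up. It is not the Cluster Autoscaler's actual simulation (which runs the real scheduler predicates); it assumes a simple first-fit packing on CPU and memory only, and the `Resources` type, the `nodes_needed` function, and all sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millis: int
    memory_mib: int


def nodes_needed(pods, node):
    """Count how many nodes shaped like `node` a first-fit packing of `pods` uses."""
    free = []  # remaining resources on each simulated node
    for pod in pods:
        if pod.cpu_millis > node.cpu_millis or pod.memory_mib > node.memory_mib:
            raise ValueError("pod does not fit on an empty node of this shape")
        for i, slot in enumerate(free):
            if pod.cpu_millis <= slot.cpu_millis and pod.memory_mib <= slot.memory_mib:
                # Place the pod on the first simulated node with enough room.
                free[i] = Resources(slot.cpu_millis - pod.cpu_millis,
                                    slot.memory_mib - pod.memory_mib)
                break
        else:
            # No existing simulated node fits, so "add" a new one.
            free.append(Resources(node.cpu_millis - pod.cpu_millis,
                                  node.memory_mib - pod.memory_mib))
    return len(free)


# Hypothetical pending pods and node shapes.
pending = [Resources(1500, 2048)] * 4
assumed_template = Resources(4000, 16384)  # what the autoscaler assumes it will get
actual_node = Resources(2000, 8192)        # what a mixed ASG might actually launch

predicted = nodes_needed(pending, assumed_template)
required = nodes_needed(pending, actual_node)
```

With these made-up numbers the simulation against the assumed template asks for fewer nodes than the smaller instance type would actually require, so some pods would stay pending after the scale-up.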
@MaciekPytel , so we can't use CA with spot instances (or EC2-fleets to be more precise)? |
To clarify: Cluster Autoscaler cares very much what the nodes (as in, Kubernetes Node objects) look like. It doesn't care about the underlying instance directly. Most often, the instance type will determine the node's scheduling properties, like allocatable resources. But if the nodes running on spot instances are indistinguishable from nodes running on regular instances, it should be OK to mix those. |
Has there been any progress on this? Are we able to increment spot fleets desired count at will? |
So if you're using an ASG backed by LaunchTemplate + MixedInstancePolicy, how does the CA determine the node capacity per ASG to simulate, given they can be backed by a mixture of instance types? |
Precisely for that reason MixedInstancePolicy is not officially supported by CA (as of now). You can still set it up using the config provided by @cablespaghetti or by creating a similar one of your own. If you do, CA will just take one instance type and assume each node will look exactly like that. Depending on your exact setup and your luck it may or may not result in correct scaling decisions.
There is an ongoing effort to make it work #1473. Note that it still assumes all the instance types in ASG are (roughly) the same size. |
I think we should be able to simulate the behavior of projects like Spotinst. For me the main requirement is to be able to use Spot instances BUT fall back to On-Demand whenever there are no Spot instances available, then replace the On-Demand instances whenever Spot is available again. The current support of EC2 Fleets in the ASGs doesn't consider this (I think, please correct me if I'm wrong... and I hope I am). They just give you a base % for On-Demand and the rest for Spot, but if you don't have enough Spot instances because AWS interrupts them, the ASG won't replace them with On-Demand and then move back to Spot whenever possible. So, we're screwed and still depending on luck. I think this could be VERY easily solved by the cluster-autoscaler if it allowed giving priorities to ASGs when scaling up. I could have two ASGs, one with a MixedPolicy pointing to multiple Spot instance pools and another with my On-Demand instances. If I could configure the CA to always try to scale up using the Spot instance ASG but, if that is not possible, to use the On-Demand one, we would have the same behavior as Spotinst. EDIT: I think they are already working on this, by allowing the price based expander. This would be even better because prices would be calculated dynamically, but it is a much more complex feature than just letting a user pick the priorities for the ASGs. I'd simply do something like:
|
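As a separate illustration of the priority idea in the comment above (not the commenter's own configuration and not an existing Cluster Autoscaler option), here is a minimal sketch of picking the highest-priority ASG that can still grow and falling back to the next one otherwise. `NodeGroup`, its `healthy` flag, and `pick_group` are hypothetical names, and how "Spot capacity is unavailable" gets detected is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeGroup:
    name: str
    priority: int         # higher value is preferred (assumption)
    desired: int
    max_size: int
    healthy: bool = True  # hypothetical: False after recent failed scale-ups


def pick_group(groups: List[NodeGroup]) -> Optional[NodeGroup]:
    """Return the highest-priority group that is healthy and below its max size."""
    candidates = [g for g in groups if g.healthy and g.desired < g.max_size]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g.priority)


# Hypothetical setup: prefer the Spot ASG, fall back to On-Demand.
spot = NodeGroup(name="spot-mixed", priority=10, desired=3, max_size=10)
on_demand = NodeGroup(name="on-demand", priority=1, desired=1, max_size=10)

first_choice = pick_group([spot, on_demand])

spot.healthy = False  # e.g. Spot requests are not being fulfilled
fallback_choice = pick_group([spot, on_demand])
```

Moving workloads back to Spot once capacity returns is a separate step that this sketch does not cover.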
I'd really like seeing native support for this in As a sidenote, there is also k8s-spot-rescheduler which might be of interest to the discussion. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecyle stale |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle rotten |
Can we close this issue now? #1886 has merged and been cherry-picked back to 1.14. There is also documentation about how to use Spot + On-Demand in the same ASG. There are known limitations, including the roughly-same-sized instances restriction that @MaciekPytel mentions, but I believe the gist of this issue has been completed. @zdoherty, can you share your thoughts? Thanks! |
MixedInstancePolicy instance pools now support a single instance type (the minimum was 2 before). From a feature perspective, there's no difference compared to EC2-Fleet. We can close this issue. Feel free to reopen if anyone has questions. https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_InstancesDistribution.html |
/close |
@Jeffwan: Closing this issue. In response to this:
Amazon recently released a feature called EC2 Fleets which appears to consolidate spot fleet requests with EC2 on-demand/auto-scaling group requests. Per their documentation, this appears to support a similar feature to desired capacity in auto-scaling:
target-capacity appears to be very similar to Spot Fleet's weighted capacity, but you're able to change it over time. Being able to change the target-capacity parameter over time seems to align closely with changing the DesiredCapacity parameter of an auto-scaling group. Are there any plans to support EC2 fleets with Kubernetes autoscaling?
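To make the comparison concrete, here is a rough sketch of the two resize calls side by side, assuming boto3-style clients are passed in (for example `boto3.client("autoscaling")` and `boto3.client("ec2")`). The wrapper function names and the example IDs are hypothetical, and this is only an illustration of the API shapes, not how the autoscaler is implemented.

```python
def set_asg_size(autoscaling_client, group_name, desired):
    """Resize an Auto Scaling group by setting its DesiredCapacity."""
    autoscaling_client.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired,
        HonorCooldown=False,
    )


def set_fleet_size(ec2_client, fleet_id, total):
    """Resize an EC2 Fleet by changing its total target capacity."""
    ec2_client.modify_fleet(
        FleetId=fleet_id,
        TargetCapacitySpecification={"TotalTargetCapacity": total},
    )
```

Usage would look like `set_asg_size(boto3.client("autoscaling"), "my-asg", 5)` or `set_fleet_size(boto3.client("ec2"), "fleet-0123", 5)`, with both identifiers made up here.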