Allow ephemeral-storage capacity overrides for instance types (per node template or provisioner) #2723
Comments
We have an open PR #2554 that's working on surfacing instance store volumes through the AWSNodeTemplate. We are discussing taking that PR a step further where, if you specify this I think eventually, this work extends into Instance Type Settings and #2390. Ideally, we could discover the instance store for a given instance type and always assume that this is being used for ephemeral-storage so that no user-based configuration was needed. |
Would this step further for #2554 also address the case of multiple NVMe instance volumes? Would we need to explicitly map each instance volume separately in the blockDeviceMappings section (meaning potentially needing a separate AWSNodeTemplate for each instance type in the same family)? #2390 could be a good option too, but it wouldn't support different NVMe array setups for the same instance type (ideally we'd allow this per pair of AWSNodeTemplate (or provisioner) and instance type). |
Yes, this should address the case of multiple nvme instance volumes; however, you are correct that without #2390, you would have to create a separate Provisioner for each different array setup.
There are some extensions of #2390 that we have thought about where you could proxy instance type setups to create your own "custom" instance type, but that seems a bit further down the line. |
That sounds great, proxy instance type setups could add a lot of flexibility. I guess we can live with either the extension to #2554 (assuming it does not break the Bottlerocket bootstrap image NVMe setup from bottlerocket-os/bottlerocket#1991 (comment)) or with the basic functionality of #2390. Please give us an update once the approximate timeline for availability of any of these options is known. |
Sure @wkaczynski, I think @bwagner5 as the assignee on #2544 should be able to give you a good timeline on that PR to allow the initial NVME functionality. For instance types and #2390, this was put on the backburner in favor of some other work but it should be re-picked up soon. Once the RFC goes in, that should be a good indicator of when the work is about to start. |
Just a thought: with separate provisioners, we wouldn't necessarily see the lowest-cost instances selected, right? The provisioners for different array setups (for example, ones differing only in NVMe disk count) would be selected at random, so we could end up with bigger and more expensive instances than needed. |
I don't think #2554 will take care of this use case. Even if instance stores can be mapped as block devices, the mapping doesn't indicate how the volumes would be configured (i.e. if 2 volumes are mapped, will they be in RAID-0, RAID-1, etc.?). I'm wondering if it would make more sense to configure the instance-store volumes within the Karpenter AMI Family itself. We could, by default within the AL2 amiFamily, RAID the volumes and remount where Kubernetes components point to storage. |
Are there examples for this? Would this happen inside |
There is ongoing work in the EKS Optimized AL2 AMI to set up a RAID-0 array out of the instance storage disks and remount containerd and kubelet. Once that PR is merged into the EKS Optimized AMI, we can then set the bootstrap flag within Karpenter to enable the new functionality and adjust the node ephemeral-storage capacity to assume that we'll use a RAID-0 setup for instance types with NVMe instance storage. awslabs/amazon-eks-ami#1171 |
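The capacity adjustment discussed above can be sketched roughly as follows. This is an illustrative Python sketch only, not Karpenter code: the function name `usable_capacity_gib` and the 2 x 900 GiB volume layout are hypothetical, and filesystem overhead and kubelet reservations are ignored.

```python
def usable_capacity_gib(volume_sizes_gib, raid_level="raid0"):
    """Estimate the usable size of an array built from instance-store volumes.

    Hypothetical helper for illustration; it ignores filesystem overhead.
    """
    if not volume_sizes_gib:
        return 0
    if raid_level == "raid0":
        # Striping: the array exposes the combined size of all members.
        return sum(volume_sizes_gib)
    if raid_level == "raid1":
        # Mirroring: the array is only as large as its smallest member.
        return min(volume_sizes_gib)
    raise ValueError(f"unsupported RAID level: {raid_level}")


# Hypothetical instance type with two 900 GiB NVMe instance-store volumes.
volumes = [900, 900]
raid0_capacity = usable_capacity_gib(volumes, "raid0")
raid1_capacity = usable_capacity_gib(volumes, "raid1")
print(raid0_capacity, raid1_capacity)
```

The same mapped volumes yield very different ephemeral-storage capacity depending on the array layout, which is why the scheduler needs to know (or assume) the layout rather than just the block device mappings.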
We'll want that option for bottlerocket too. We use a startup container to format and mount a raid array of the disks already, we just need to have a way to properly account for the node ephemeral storage in the kubelet. |
Our current Karpenter instances are using EBS volumes since that's what's currently supported by Karpenter. We don't need EBS volumes, and would rather use the instance storage for ephemeral storage. This would save thousands of dollars a month on our AWS bill. Very excited to see this ticket get progress. |
In the meantime, now that the new EKS AMI has userData: |
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
/bin/setup-local-disks raid0

--BOUNDARY-- |
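For anyone generating this userData programmatically, a multipart document of the same shape can be assembled with Python's standard `email` package. This is a minimal sketch, not an official tool: the helper name `build_user_data` is hypothetical, and the standard library adds a few extra per-part headers (such as Content-Transfer-Encoding) that the hand-written version above omits.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(script: str, boundary: str = "BOUNDARY") -> str:
    """Wrap a shell script in a multipart/mixed MIME document (hypothetical helper)."""
    message = MIMEMultipart("mixed", boundary=boundary)
    # text/x-shellscript marks the part as a shell script to run at boot.
    message.attach(MIMEText(script, "x-shellscript", "us-ascii"))
    return message.as_string()


# Script body taken from the comment above.
SCRIPT = "#!/bin/bash\n/bin/setup-local-disks raid0\n"
user_data = build_user_data(SCRIPT)
print(user_data)
```

Building the document this way ensures the blank lines that separate MIME headers from bodies are present, which is easy to lose when the YAML is copied by hand.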
@ryanschneider any update on whether or not your suggestion is working? |
@taylorturner we ended up using a custom script since we wanted more control than setup-local-disks provided but I did test it once and it seemed to work. |
Yes, this feature is important |
This would be a very useful feature for us too; took a stab at a possible solution: #4735 |
This is becoming a cost-prohibitive issue for our company as well. |
Following up -- is there any intention to listen to the customers here, and let us save on thousands of dollars of wasted spend, monthly? |
Apologies for the missed response here. There's a small number of maintainers trying to keep up across a number of requests on the project. We were working hard to push out the beta and some other high-pri features, and now that we are unblocked on those, we will start to burn down the list of open PRs that are out there. Looking at #4735 at a high level, it sounds like a fairly reasonable approach to me. It allows the user to specify which way they want to go with their NVMe storage and then configures the volumes accordingly 🎉 |
@jonathan-innis - Thank you for the prompt and informative follow-up. We all appreciate the enormous amount of work you're all putting in. And thank you for putting eyes on this specific issue, and the accompanying PR. It's really exciting to know it's getting attention, and seems to be coming down the pipeline soon 😎 Thanks again! 🙌🏼 |
For some numbers, we've noticed a consistent 18% of our "EC2-other + EC2 Instances" bill is spent on these non-ephemeral disks, due to the large container images we deploy. This ticket will have a real, material, and noticeable impact on the cost to run services in AWS. |
Tell us about your request
Currently there is no way of letting Karpenter know that, during the bootstrap of a node with NVMe instance volumes, the kubelet root is re-mounted to an array created out of those volumes, effectively changing the ephemeral-storage capacity of the node.
Possible solutions would be:
#2390 seems to offer some interesting options as well
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Without ephemeral-storage overrides, Karpenter cannot select instances whose ephemeral-storage is provided by an instance-volume-backed array for pods with ephemeral-storage requirements, unless an EBS volume matching the array size is added to blockDeviceMappings (and that volume will effectively be unused).
Are you currently working around this issue?
There is currently no good workaround. In our case we need to add an additional requirement on "karpenter.k8s.aws/instance-local-nvme" to the provisioner (or pods) so that Karpenter does not provision instances whose NVMe instance storage is smaller than our EBS configuration (otherwise the pods would never be scheduled on the bootstrapped nodes). The other issue is that if Karpenter chooses bigger instances, it is very likely to overprovision nodes (the pods will eventually schedule on a smaller number of nodes because ephemeral-storage >> EBS size) and then remove the empty nodes.
Another workaround could be to match the EBS size in blockDeviceMappings to the total NVMe instance storage size, which would generate additional costs (and the EBS volume would be effectively unused).
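To make the overprovisioning effect concrete, here is a rough Python sketch. It is an assumption-laden illustration, not Karpenter's scheduling logic: it packs pods first-fit by ephemeral-storage only (ignoring CPU, memory, and system reservations), and the function name `nodes_needed` and all sizes are hypothetical.

```python
def nodes_needed(pod_requests_gib, node_capacity_gib):
    """Count nodes needed when packing pods first-fit by ephemeral-storage only."""
    free_per_node = []  # remaining ephemeral-storage on each node, in GiB
    for request in pod_requests_gib:
        if request > node_capacity_gib:
            # The pod cannot fit even on an empty node of this capacity.
            raise ValueError("pod request exceeds node capacity")
        for i, free in enumerate(free_per_node):
            if request <= free:
                free_per_node[i] = free - request
                break
        else:
            # No existing node has room, so a new node is added.
            free_per_node.append(node_capacity_gib - request)
    return len(free_per_node)


pods = [50] * 10        # ten pods, each requesting 50 GiB (hypothetical)
assumed_capacity = 100  # capacity assumed from the EBS volume size
actual_capacity = 1800  # capacity after kubelet root moves to the NVMe array

planned = nodes_needed(pods, assumed_capacity)
actually_used = nodes_needed(pods, actual_capacity)
print(planned, actually_used)
```

Under these assumptions, five nodes are planned from the EBS-based capacity while the pods fit on a single node once the NVMe array is in use, leaving four nodes empty. The `ValueError` branch corresponds to the other failure mode described above: a pod whose request exceeds the assumed capacity is never matched to the instance at all.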
Additional Context
No response
Attachments
No response