RFC: Add instance type settings design proposal #2390
Conversation
LGTM. Just one nit
designs/instance-type-settings.md
Outdated
3. In the meantime, while most users may not be using either mechanism to achieve system-wide pod overheads, we will still calculate the `kubeReserved` values based on pod density. There are still some open questions about whether we should open up the ability to specify the strategy type (see below)
3. Max Pods
1. In general, it appears to be an anti-pattern in the Kubernetes community to strictly fix maxPods values (for kubeletConfiguration) on a per-instance-type level. Users should be restricting the number of pods on their instances based on pod requests and the size of instances, as opposed to strict pod density sizing. If users are concerned with Provisioners creating instances that are too large, there is now the option to specify GT or LT operators to limit how large a provisioned instance type can get.
2. 🔑 For users that continue to utilize and/or need `--max-pods`, we will surface a `pods` value in the `.spec.allocatable` of the `InstanceType`. This `pods` value will be used for pod density for bin-packing and scheduling. Since there would now be a `max-pods` value surfaced at both the `InstanceType` level and the `Provisioner` level, we will take `min(provisioner.maxPods, instanceType.maxPods)` to be the pod density for the instance.
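To make the combination rule above concrete, a small sketch with made-up numbers (not a real manifest, just the two inputs and the result):

# effective pod density = min(provisioner.maxPods, instanceType.pods)
provisioner:
  kubeletConfiguration:
    maxPods: 110
instanceType:
  allocatable:
    pods: "29"   # min(110, 29) = 29 is used for bin-packing and scheduling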
IIUC, this is the only field of the instancetype CRD that is capable of influencing user data. I'm a bit wary of this coupling, how it might limit our design decisions in the future, as well as the security implications. Put differently, InstanceType is currently a readonly type and you're proposing to give it write privileges. It currently isn't capable of modifying the configuration of nodes, which is a special privilege of the Provisioner. I'm comfortable using it for binpacking, since this relationship exists today.
Here's a less controversial alternative that functions identically with how `type InstanceType interface` works today, and doesn't sacrifice any use cases (see the sketch after this list):
- The `InstanceType` CRD can expose `pods` as a value in `Allocatable`
- The `maxpods` value can be hardcoded in the provisioner's kubelet configuration
- Our binpacker will default to what the `InstanceType` says (which may be overridden by the CRD), but if the provisioner sets a value, that's what we'll use.
- If the customer wants to compute a dynamic (w.r.t. instance type) max pods value, they can do so in user data, and they just need to make sure that this function agrees with the values they've overridden in the `InstanceType` CRD.
- If you need different dynamic max pods values for the same instance type but different provisioners, stop and rethink your architecture.
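A rough sketch of how that split could look. The `InstanceType` apiVersion, kind, and field names below are hypothetical (that CRD did not exist at the time of this RFC), and the numbers are illustrative only:

apiVersion: karpenter.k8s.aws/v1alpha1   # hypothetical API group/version
kind: InstanceType
metadata:
  name: c5.large
spec:
  allocatable:
    pods: "29"   # read by the binpacker as the default pod density
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  kubeletConfiguration:
    maxPods: 20   # if set, this value wins over the InstanceType default for nodes from this provisioner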
> as well as the security implications

I'm not quite sure I get the security implications that you are trying to call out here. Maybe you could describe it further. I see that this changes some read/write assumptions about the `InstanceType` and how it interacts with the current code logic, but I'm not quite sure how this has security implications for the project.
> Here's a less controversial alternative that functions

I think the other option to call out is that we can surface `podsPerCore` in the `kubeletConfiguration` in the `Provisioner`, which would give users another method to have this be a dynamic value without the overhead of having to write custom `userData`. If we are going to say that `pods` in the `.spec.capacity` can't affect `--max-pods`, I would rather just punt on having `pods` included entirely.
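For illustration, surfacing `podsPerCore` on the provisioner might look roughly like this (a sketch only; at this point the field was a suggestion, not a released API):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  kubeletConfiguration:
    podsPerCore: 10   # pod density scales with the instance's vCPU count, no custom userData required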
designs/instance-type-settings.md
Outdated
**🔑 Open Questions**

1. There are still [user asks](https://github.com/aws/karpenter/issues/1803) for calculating the kubeReserved resources value based on the GKE calculation, as opposed to the EKS default which is based on pod density. Should we allow some flag on the provisioner that can determine which calculation to base the value on? i.e. `kubeReservedMemoryStrategy`
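If such a flag were introduced, one possible shape on the provisioner (the field name and values are made up purely for illustration):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  kubeletConfiguration:
    kubeReservedMemoryStrategy: gke   # hypothetical flag; the alternative would be the EKS pod-density-based default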
I'd love for us to do the work and figure out containerd overhead in 2022, and then provide customers with an accurate runtime class.
🚀 🎸 🎸 🚀
This reads really well. I just have a few small suggestions.
designs/instance-type-settings.md
Outdated
3. If we were to deny overlapping `InstanceTypeSettings`, then the feature of specifying a group of instance types may not be used that often, as configuration between instance types may rarely be shared
4. It’s unclear whether users would have enough common configuration that they would assign it across instance types. It makes more sense to iterate on a grouping mechanism later if there is a user ask for it

**Recommendation:** Use the `InstanceType` CRD for the initial iteration with 1-1 mappings before considering introducing more complex constructs to the user. While specifying rulesets might reduce the overhead burden for customers of keeping track of separate `InstanceTypeSettings`, without an API surface to view exactly what settings a ruleset will produce, it will ultimately be confusing for users.
For our use case (similar to #1900, where nodes offer fuse devices via the smarter device manager) the recommended approach of having a 1-1 mapping would create quite a significant overhead and burden: for every instance type and size, we would need to create an `InstanceType` resource and configure it. Even with templating this is quite a bit of effort and seems prone to errors. Also, we would need to keep the list of available instance types up to date, increasing the maintenance work. So going with the second option sounds like a more sustainable approach to me personally.
> recommended approach of having a 1-1 mapping would create quite a significant overhead and burden

How would you get around having this burden? My assumption is that each instance type wouldn't have a static number of resources that match `smarter-devices/fuse`, so you would need some dynamic way to configure the quantities for each of the different instance types, or at least a collection of instance types.

This becomes a problem when extending out to other resources. Even assuming that there is a consistent quantity of resources for the `smarter-devices/fuse` resource across all instance types, that certainly wouldn't be true for other extended resources.

> Also, we would need to keep the list of available instance types up to date

This works as an override mechanism, so you would still have those instance types available if they didn't have a matching CR for them. The only difference there would be that the extended resource wouldn't be mapped onto it, which would still be something that would need to be addressed with the other "collection-based" approach.
Also, the other thing that we were thinking with the approach proposed in this design is that we would offer some tooling/guidance around how to programmatically generate these values into CRs, so that the maintenance burden is reduced.
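For illustration, a single generated 1:1 override might look roughly like this (a hypothetical schema following the RFC's `InstanceType` CRD proposal; the group/version and fields are not a released API):

apiVersion: karpenter.k8s.aws/v1alpha1   # hypothetical API group/version
kind: InstanceType
metadata:
  name: m5.xlarge
spec:
  capacity:
    smarter-devices/fuse: "1"   # extended resource advertised by the device plugin on this instance type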
Summarizing most of the feedback that we've heard: it seems like we might want to consider a mechanism for "selecting" sets of instance types so that we can ease the customer burden of having to maintain a 1:1 mapping. An alternative solution here may be to keep the 1:1 mapping and provide some documentation around manifest generation for these. Though this might be more confusing to reason about, in general it will cause less overhead for users.
Any chance to see this in one of the coming releases?
@jonathan-innis @ellistarn We would really like to move out of the fork of #2161 for a significant number of clusters. We are forced to maintain it, as otherwise this would be a blocking no-go for Karpenter.
@universam1 We've talked a bit about this internally. I think we're willing to take a short-term experimental path that would unblock you in the main project. Ideally, the hope is that we can get a more robust solution that solves all of the problems in the RFC going forward, but right now we haven't put this at super high priority. Here are the thoughts:
aws.instanceTypeOverrides: "[{"name": "c5.large", "capacity": {"fuse": "1", "other-extended-resource": "2"}}]"
Perhaps we can model it based off of the instance type struct?
Thank you @jonathan-innis for considering this request and taking into account several options to solve it. Very much appreciated! We are certainly happy to contribute to a solution; if you can provide some more details on where you would see it fit in and what you'd specifically like to see, that would help get us started. One requirement from our side, though, is to be agnostic to instance type and size. It is impossible for us to predefine all variants we would like to support (type x size).
Can you explain why this is the case? If you want all types of the size
I'm not really a fan of opening up this settings schema to regex-based pattern matching. It doesn't seem particularly explicit in its design and implementation, and seems overly complicated given what we are trying to implement.
The instance types we permit the devs to request, or allow Karpenter to choose, sum up to roughly 600 types and versions: say, anything with >1 core and >3GB of memory, including amd64 or arm64. Having to manage that means updating a huge list and/or keeping hundreds of developers informed about which instances are whitelisted and why not.
Could you please explain why this is complicated to implement? I believe a few lines of code would suffice. In contrast, coding an API or generator for Karpenter definitions is a huge effort to implement.
Alternatively, the type and size could be optional fields; then the match would be explicitly undefined.
I'm less a fan of the regex solution and might be swayed toward doing a requirements-based solution that allows you to create disparate objects that contain requirements (of which instance-type name can be one) and capacity details with a merge semantic. Would this suit the case you are describing and be more maintainable for you?
[
  {
    "requirements":[
      {
        "key":"karpenter.k8s.aws/instance-size",
        "operator":"In",
        "values":["large"]
      }
    ],
    "capacity":{
      "fuse":"1",
      "other-extended-resource":"4"
    }
  }
]
In the ConfigMap, it would look like:
"aws.instanceTypeOverrides": "[{"requirements":[{"key":"karpenter.k8s.aws/instance-size","operator":"In","values":["large"]}],"capacity":{"fuse":"1","other-extended-resource":"4"}}]"
That's a great one! I think the list of instance sizes is not that much of an issue to manage. In fact, I could imagine a hack to invert to a "NotIn: small" if we really want to permit all sizes but those irrelevant ones. Thank you!
On this same train of thought, what if there was something like:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  resources:
    requests:
      fuse: "2" # We assume this much until the node registers
      some.example.com/extended-resource: "500Mi" # We assume this much until the node registers
I like this semantic actually the most, a single authoritative place for Karpenter config! 👏🏻
AFAIR Karpenter does not actively assign pods to nodes any more like it used to, leaving that to kube-scheduler? Great, we'll only have to ensure that the
Indeed, the main part of our specs contains:
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-size
      operator: NotIn
      values:
        - nano
        - micro
        - small
Great idea!
We handle this with the concept of "node initialization". We wait for the node to go ready and have all requested resources registered before we consider pending pods for scheduling to new nodes, so you shouldn't have to handle this cool-down period yourself. Once we are aware that a resource should appear on the node, we wait until we see it before we consider it empty or not eligible to schedule pods.
Thanks for this RFC. I wonder what the current status is?
@haarchri We're keeping this open to keep the conversation going, since this is the largest item currently tracking and creating pointers to all the open issues. There's been a lot of discussion among the maintainers and contributors around how we should attack a lot of these disparate problems, so the RFC is still unsettled and probably won't be until the project settles and merges a lot of the other current feature work.
As mentioned here, there is still a lot of uncertainty about the plan of attack for how to handle custom instance type settings. The issues mentioned in this initial RFC will be addressed at a later time, with a rewritten RFC. Closing this for now.
@ellistarn @jonathan-innis Trying to get a feeling for the timeline to expect for your long-term solution. We are wondering if we should maintain a fork with an intermediate solution for the time being, or if there are other options possible?