Uneven scale-up of AWS ASG's #2020

Closed
danmcnulty opened this issue May 15, 2019 · 45 comments

Labels: area/provider/aws (Issues or PRs related to aws provider), lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed)

@danmcnulty commented May 15, 2019

Hi,
I'm testing the cluster autoscaler on our AWS EKS 1.12 cluster.
I created 3 identical ASGs in zones a/b/c, and created a test deployment using a basic nginx pod, which I scale up with commands like
kubectl -n playground scale --replicas=4 deployment nginx-scaleout
I've sized the pods so that 2 will fit on each node.

I started with 3 nodes, one per AZ, and began scaling up the deployment. At first it added nodes evenly, so each zone had 2 nodes. I then scaled up further until I had 3/3/2 nodes across the zones (so far so good), but the next time it scaled up it added a fourth node in zone A, giving me 4/3/2. I'm unsure why it did this instead of adding a new node in zone C.

The relevant part of the log is this:
I0515 10:17:35.100431 1 static_autoscaler.go:121] Starting main loop
I0515 10:17:35.133540 1 leaderelection.go:227] successfully renewed lease kube-system/cluster-autoscaler
I0515 10:17:35.743084 1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: [eks-zeus-autoscaler20190501200602699100000004 eks-zeus-scalemultiaz-a-20190514154009657700000006 eks-zeus-scalemultiaz-b-20190514154009657700000004 eks-zeus-scalemultiaz-c-20190514154009657700000005]
I0515 10:17:35.845006 1 aws_manager.go:157] Refreshed ASG list, next refresh after 2019-05-15 10:17:45.84499724 +0000 UTC m=+45941.859559752
I0515 10:17:35.845350 1 utils.go:552] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0515 10:17:35.845378 1 static_autoscaler.go:252] Filtering out schedulables
I0515 10:17:35.942508 1 static_autoscaler.go:262] No schedulable pods
I0515 10:17:35.942538 1 scale_up.go:263] Pod playground/nginx-scaleout-85cf87558d-z2pmd is unschedulable
I0515 10:17:35.942546 1 scale_up.go:263] Pod playground/nginx-scaleout-85cf87558d-w2h2j is unschedulable
I0515 10:17:35.942667 1 scale_up.go:300] Upcoming 0 nodes
I0515 10:17:35.942721 1 utils.go:208] Pod nginx-scaleout-85cf87558d-z2pmd can't be scheduled on eks-zeus-autoscaler20190501200602699100000004, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector
I0515 10:17:35.942853 1 utils.go:198] Pod nginx-scaleout-85cf87558d-w2h2j can't be scheduled on eks-zeus-autoscaler20190501200602699100000004. Used cached predicate check results
I0515 10:17:35.942869 1 scale_up.go:406] No pod can fit to eks-zeus-autoscaler20190501200602699100000004
I0515 10:17:36.042655 1 waste.go:57] Expanding Node Group eks-zeus-scalemultiaz-a-20190514154009657700000006 would waste 50.00% CPU, 47.28% Memory, 48.64% Blended
I0515 10:17:36.042693 1 waste.go:57] Expanding Node Group eks-zeus-scalemultiaz-b-20190514154009657700000004 would waste 50.00% CPU, 47.28% Memory, 48.64% Blended
I0515 10:17:36.042705 1 waste.go:57] Expanding Node Group eks-zeus-scalemultiaz-c-20190514154009657700000005 would waste 50.00% CPU, 47.28% Memory, 48.64% Blended
I0515 10:17:36.042721 1 scale_up.go:418] Best option to resize: eks-zeus-scalemultiaz-a-20190514154009657700000006
I0515 10:17:36.042732 1 scale_up.go:422] Estimated 1 nodes needed in eks-zeus-scalemultiaz-a-20190514154009657700000006
I0515 10:17:36.042894 1 scale_up.go:501] Final scale-up plan: [{eks-zeus-scalemultiaz-a-20190514154009657700000006 3->4 (max: 6)}]
I0515 10:17:36.042918 1 scale_up.go:579] Scale-up: setting group eks-zeus-scalemultiaz-a-20190514154009657700000006 size to 4
I0515 10:17:36.042963 1 auto_scaling_groups.go:203] Setting asg eks-zeus-scalemultiaz-a-20190514154009657700000006 size to 4

And my configuration looks like this:
```yaml
spec:
  containers:
  - command:
    - ./cluster-autoscaler
    - --v=4
    - --stderrthreshold=info
    - --cloud-provider=aws
    - --skip-nodes-with-local-storage=false
    - --expander=least-waste
    - --balance-similar-node-groups
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/zeus
    image: k8s.gcr.io/cluster-autoscaler:v1.12.3
```

Any help would be appreciated. Thanks!

@Jeffwan (Contributor) commented May 15, 2019

/assign

@Jeffwan (Contributor) commented May 20, 2019

/sig aws

@vikaschoudhary16

By default the "random" expander is used; if that is the case, the scenario you described is not improbable. Does it happen each time?

@danmcnulty (Author)

We're using the least-waste expander at present, but in this case the calculations for each of the ASGs are identical, so shouldn't it then choose to scale up in a way that keeps the ASGs balanced?

@MaciekPytel (Contributor)

None of the existing expanders cares about zone balancing, so it shouldn't matter which one you use. CA has a separate mechanism for balancing: it finds 'similar' NodeGroups (ASGs) and splits any scale-up between them. This happens after the expander makes its decision and shouldn't depend on the expander at all.
My guess would be that your ASGs are not actually 'similar' according to the definition used by CA (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/balance_similar.md#similar-node-groups). I'd look at the set of labels the nodes in each ASG have and check that they're identical except for the Kubernetes-defined zone and hostname labels.
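For intuition, here is a minimal, self-contained sketch (not the autoscaler's actual code) of the balancing behaviour described above: once a set of similar node groups has been found, each new node goes to whichever group is currently smallest, so identical groups converge to even sizes.

```go
package main

import (
	"fmt"
	"sort"
)

type group struct {
	name string
	size int
}

// balance assigns newNodes one at a time, always growing the currently
// smallest group, which is the effect --balance-similar-node-groups is
// meant to have on similar ASGs.
func balance(groups []group, newNodes int) []group {
	for i := 0; i < newNodes; i++ {
		sort.Slice(groups, func(a, b int) bool { return groups[a].size < groups[b].size })
		groups[0].size++
	}
	return groups
}

func main() {
	// The 3/3/2 situation from the original report: a balanced split
	// would grow zone C first, ending at 3/3/3.
	fmt.Println(balance([]group{{"zone-a", 3}, {"zone-b", 3}, {"zone-c", 2}}, 1))
}
```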

@danmcnulty (Author)

All of the nodes have identical labels except zone and hostname; we used Terraform to create and tag the ASGs they reside in.

```
kubectl get nodes -l environment=playground --show-labels
NAME STATUS ROLES AGE VERSION LABELS
ip-10-0-173-7.eu-west-1.compute.internal Ready 8m6s v1.12.7 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t3.medium,beta.kubernetes.io/os=linux,cluster=zeus,environment=playground,failure-domain.beta.kubernetes.io/region=eu-west-1,failure-domain.beta.kubernetes.io/zone=eu-west-1a,kubernetes.io/hostname=ip-10-0-173-7.domain.com,nodegroup=scalemultiaz,workload=scalemultiaz

ip-10-0-187-220.eu-west-1.compute.internal Ready 7m59s v1.12.7 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t3.medium,beta.kubernetes.io/os=linux,cluster=zeus,environment=playground,failure-domain.beta.kubernetes.io/region=eu-west-1,failure-domain.beta.kubernetes.io/zone=eu-west-1b,kubernetes.io/hostname=ip-10-0-187-220.domain.com,nodegroup=scalemultiaz,workload=scalemultiaz
```

@Jeffwan (Contributor) commented May 21, 2019

For least-waste, it's backed by the random strategy:

```go
func NewStrategy() expander.Strategy {
	return &leastwaste{random.NewStrategy()}
}
```

In your case, I think all 3 node groups qualify equally, so the random strategy kicks in and picks one of them.
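To make that concrete, here is a rough sketch (not the expander's real implementation) of how a least-waste choice with an embedded random fallback behaves when every candidate wastes exactly the same amount, as in the log above: the tie is broken by chance, so zone A can keep winning.

```go
package main

import (
	"fmt"
	"math/rand"
)

type option struct {
	group string
	waste float64 // blended waste percentage, as printed by waste.go above
}

// pickLeastWaste keeps every option tied for the lowest waste and then,
// like &leastwaste{random.NewStrategy()}, picks one of the ties at random.
func pickLeastWaste(opts []option) option {
	best := []option{opts[0]}
	for _, o := range opts[1:] {
		switch {
		case o.waste < best[0].waste:
			best = []option{o}
		case o.waste == best[0].waste:
			best = append(best, o)
		}
	}
	return best[rand.Intn(len(best))]
}

func main() {
	opts := []option{
		{"scalemultiaz-a", 48.64},
		{"scalemultiaz-b", 48.64},
		{"scalemultiaz-c", 48.64},
	}
	fmt.Println(pickLeastWaste(opts).group) // any of the three, chosen at random
}
```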

@meringu commented May 28, 2019

We’re hitting this issue in our clusters too. I think I know why.

We have 3 ASGs with c5.2xlarge spread across 3 AZs. It looks like when Amazon creates an EC2 instance, either 15835076Ki or 15835084Ki of total memory is provisioned for the VM. This is what is reported in the node's status.capacity.memory, and verified with free -k on the node.

When the cluster autoscaler attempts to discover similar node groups, it requires an exact match in memory capacity here:

// For capacity we require exact match.
// If this is ever changed, enforcing MaxCoresTotal and MaxMemoryTotal limits
// as it is now may no longer work.

It also seems that node groups that are scaled to zero get a different memory capacity again; maybe this?

Do we need some sort of tolerance in the capacity comparison, similar to the allocatable and free comparisons?

@Jeffwan (Contributor) commented May 29, 2019

@meringu If that's the case, we have some options:

  1. Figure out why some instances have different memory provisioned. I can check with the EC2 team on this. I suspect not all c5.2xlarge instances use exactly the same memory? Also, the ec2_instance_type mapping has to match the provisioned memory.

  2. Use a comparison with tolerance. My concern is that this is common code, and if it's only an AWS issue, we should not touch it.

@MaciekPytel (Contributor) commented May 29, 2019

Regarding point 2 - the comparison logic lives in a processor, i.e. it's hidden behind an interface specifically to allow adding custom implementations without touching common code. If required, you should be able to customize the logic by adding your own implementation of https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodegroupset/nodegroup_set_processor.go. You may be able to reuse most of the implementation of the balancing processor (the default) and just change the comparison logic.
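As an illustration of what such a custom comparison could look like, here is a self-contained sketch; the nodeSummary type, the label set and the 1.5% tolerance are assumptions for the example, not the autoscaler's real types or defaults, and a real implementation would plug into the NodeGroupSetProcessor interface linked above.

```go
package main

import "fmt"

type nodeSummary struct {
	MilliCPU    int64
	MemoryBytes int64
	Labels      map[string]string
}

// Labels that may legitimately differ between zones of otherwise identical groups.
var ignoredLabels = map[string]bool{
	"kubernetes.io/hostname":                 true,
	"failure-domain.beta.kubernetes.io/zone": true,
}

// Assumed tolerance for per-instance memory capacity jitter.
const maxMemoryRatioDiff = 0.015

func similarEnough(a, b nodeSummary) bool {
	if a.MilliCPU != b.MilliCPU {
		return false
	}
	// Relax the exact-match rule on memory capacity to a small ratio.
	big, small := float64(a.MemoryBytes), float64(b.MemoryBytes)
	if small > big {
		big, small = small, big
	}
	if (big-small)/big > maxMemoryRatioDiff {
		return false
	}
	// All labels other than the ignored set must still match exactly.
	for k, v := range a.Labels {
		if !ignoredLabels[k] && b.Labels[k] != v {
			return false
		}
	}
	for k, v := range b.Labels {
		if !ignoredLabels[k] && a.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	a := nodeSummary{MilliCPU: 4000, MemoryBytes: 16404668416, Labels: map[string]string{"nodegroup": "scalemultiaz"}}
	b := nodeSummary{MilliCPU: 4000, MemoryBytes: 16228507648, Labels: map[string]string{"nodegroup": "scalemultiaz"}}
	fmt.Println(similarEnough(a, b)) // true: ~1.1% memory difference is within tolerance
}
```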

@Jeffwan (Contributor) commented May 29, 2019

> Regarding point 2 - the comparison logic lives in a processor, i.e. it's hidden behind an interface specifically to allow adding custom implementations without touching common code. If required, you should be able to customize the logic by adding your own implementation of https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodegroupset/nodegroup_set_processor.go. You may be able to reuse most of the implementation of the balancing processor (the default) and just change the comparison logic.

Thanks for the guidance, @MaciekPytel!

@Jeffwan (Contributor) commented May 30, 2019

@meringu I will try to reproduce this issue on my end and check with the EC2 team at the same time. It will take some time; I'll come back to you later.

@meringu commented Jun 13, 2019

Any update on the issue @Jeffwan?

As I understand it, this issue would cause --balance-similar-node-groups not to work for any AWS user of the autoscaler who configures an Auto Scaling group per instance type per Availability Zone.

@Jeffwan (Contributor) commented Jun 14, 2019

@meringu Sorry, I was on call for the past two weeks. I'll get some time this week to check this issue.

@meringu commented Jun 14, 2019

Thanks @Jeffwan. Let me know if there is anything I can help with.

@jhohertz (Contributor)

I just ran into a similar thing, with a kops-built cluster, and I know exactly why it is ignoring the balance-similar-node-groups flag.

The problem, at least in my case (and it sounds similar to the above description), is that kops adds its own labelling to the node, e.g. kops.k8s.io/instancegroup: nodes-us-east-1a, meaning that the groups do not match on labels.

One possible fix, specific to kops clusters, would be to add the kops.k8s.io/instancegroup string to the ignoredLabels here

Not sure if other tools (e.g. for EKS) have well-known labels like this that could also be added, or if the better solution would be to add a flag so that additional ignored labels can be specified at runtime.

Thoughts?

@meringu commented Jun 18, 2019

Not the case for our EKS cluster. We do have some control over the labels. The only different node labels on our clusters are in the ignoredLabels list.

It is the capacity check in our case: the logs show it sometimes finds a similar node group or two, depending on whether the node group has any instances, and somewhat at random, since AWS doesn't provision exactly the same amount of memory every time.

@meringu commented Jun 24, 2019

Hi @Jeffwan, did you get a chance to look at this last week? Is there anything I can help with?

We are happy to contribute engineering time if that would be helpful.

@Jeffwan (Contributor) commented Jun 24, 2019

@meringu Sorry for the late response. It would be helpful to pin down the problem; I am trying to see if this is easily reproducible. In the CA log @danmcnulty provided, it failed on the node selector; could you share your logs and confirm that it failed on a memory mismatch?

Pod nginx-scaleout-85cf87558d-z2pmd can't be scheduled on eks-zeus-autoscaler20190501200602699100000004, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector

I did some searching and didn't find any clues about memory differing for a single instance type. Could you file a service ticket with the AWS EC2 team? (I don't have all your details.)

@meringu commented Jun 24, 2019

Unfortunately there are no logs for the comparisons; you can see in my link above for the IsNodeInfoSimilar function that there are no log statements. See my earlier comment for the memory disparity.

Summarised, for c5.2xlarge:

  • Cluster Autoscaler is coded to 16384MB
  • Some nodes report 15835076Ki in free -k
  • Other nodes report 15835084Ki in free -k

I'll raise a support ticket asking them about the difference in memory from instance to instance, and why the advertised memory is different again.
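For scale, a quick back-of-the-envelope calculation (assuming the 16384 in the autoscaler's instance-type table means MiB) puts the advertised-vs-reported gap far above the few-KiB spread seen between individual nodes:

```go
package main

import "fmt"

func main() {
	advertisedKi := int64(16384) * 1024 // 16384 MiB for c5.2xlarge, expressed in Ki
	reportedKi := int64(15835076)       // value from free -k above
	gap := advertisedKi - reportedKi
	fmt.Printf("gap to advertised: %d Ki (~%.0f MiB, %.1f%%)\n",
		gap, float64(gap)/1024, 100*float64(gap)/float64(advertisedKi))
	fmt.Printf("spread between nodes: %d Ki\n", int64(15835084-15835076))
	// Prints roughly: gap to advertised: 942140 Ki (~920 MiB, 5.6%)
	//                 spread between nodes: 8 Ki
}
```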

@leonsodhi-lf

@meringu this sounds like a similar problem to what I'm facing. In the event it is relevant, you may wish to skim aleksandra-malinowska's comment. Here's a snippet:

That's probably because actual node's memory is slightly different from predicted memory based on machine type (due to kernel reservation, specific to a given OS/machine combination)

@danmcnulty (Author)

@Jeffwan Just a quick note on the logs I provided - the ASG with prefix "eks-zeus-autoscaler" was used for something unrelated by us, and so it was expected that this group didn't match the node selector.

The question I have is about the 3 similar groups with prefixes:
eks-zeus-scalemultiaz-a
eks-zeus-scalemultiaz-b
eks-zeus-scalemultiaz-c
These are the ones with identical labels, and they are the groups that scaled up unevenly.

@meringu commented Jul 4, 2019

Hey team,

I've heard back from AWS about the memory capacity disparity.

For the difference reported on the running hosts, I was using two different AMIs with different kernel versions. After updating all the instances to the newer AMI, they all report the same memory capacity.

With regards to the difference between what is reported in the API and what is reported on the node, I got this response:

Pricing API provides total memory that comes with the instance type (which is also displayed in our websites). However, there will be certain memory that will reserve for the use of kernel , BIOs etc.
Hence total available memory for the use will be less than listed in pricing API

This explains why two different AMIs can have different memory capacity, as they can have different kernel versions etc. This also means the capacity in the pricing API will never match the reported capacity from the node.

So now looking at this comment in the code:

// For capacity we require exact match.
// If this is ever changed, enforcing MaxCoresTotal and MaxMemoryTotal limits
// as it is now may no longer work.

A course of action could be to look at a different way to satisfy the MaxMemoryTotal requirement so we can implement a tolerance on the capacity check.

@meringu commented Jul 5, 2019

Or I could disable scaling to zero for my ASGs. This means extra overhead, however.

@k8s-ci-robot k8s-ci-robot added area/provider/aws Issues or PRs related to aws provider and removed sig/aws labels Aug 6, 2019
@ewoutp commented Sep 12, 2019

Isn't it possible to simplify the "similar check" to comparing just the instance type?

@jhohertz (Contributor)

@ewoutp You could have multiple node pools of the same instance type that you want scaled separately for one reason or another.

The main issue is where you have pools that are the same for all intents and purposes across different zones, and some tag/label differs between the zones and gets in the way. See this PR for what I am talking about: #2207

@Robert-Stam

Not the complete check, but perhaps for the CPU and especially the memory part (which seems to be a little off for now)?

@ewoutp commented Sep 13, 2019

@jhohertz That is true. However, I suspect there are many cases where comparison on instanceType alone is sufficient (and works very well).
Shouldn't that at least be an option?

@sulixu commented Oct 8, 2019

I have 3 ASGs with c5.9xlarge spread across 3 AZs; however, when CA decides how to split scale-up nodes between the node groups, most of the time it splits among only 2 AZs. Why do we want to check the 5% difference in free resources? Aren't labels, capacity and allocatable sufficient for comparing node group similarity?

@cdmurph32 (Contributor) commented Oct 16, 2019

I'm seeing this issue with m5.xlarge.
One node group reports Memory:16404668416 and another Memory:16228507648, for a difference of 176MB.

&nodeinfo.Resource{MilliCPU:4000, Memory:16228507648, EphemeralStorage:48307038948, AllowedPodNumber:58, ScalarResources:map[v1.ResourceName]int64{"attachable-volumes-aws-ebs":25, "hugepages-1Gi":0, "hugepages-2Mi":0}}

&nodeinfo.Resource{MilliCPU:4000, Memory:16404668416, EphemeralStorage:48307038948, AllowedPodNumber:58, ScalarResources:map[v1.ResourceName]int64{"attachable-volumes-aws-ebs":25, "hugepages-1Gi":0, "hugepages-2Mi":0}}

Updating MaxMemoryDifferenceInKiloBytes to 256000 fixes the issue. https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodegroupset/compare_nodegroups.go#L36

For those using the AWS CNI with custom networking, the label "k8s.amazonaws.com/eniConfig" needs to be whitelisted as well.
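For reference, converting the two capacities quoted above shows how the same gap reads as 176 MB, 172032 Ki, or about 168 Mi depending on the unit (simple arithmetic, no autoscaler code involved):

```go
package main

import "fmt"

func main() {
	a, b := int64(16404668416), int64(16228507648) // bytes, from the Resource dumps above
	d := a - b
	fmt.Printf("%d bytes = %d Ki = %.0f Mi = %.0f MB (decimal)\n",
		d, d/1024, float64(d)/(1024*1024), float64(d)/1e6)
	// 176160768 bytes = 172032 Ki = 168 Mi = 176 MB
}
```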

@JacobHenner

> I'm seeing this issue with m5.xlarge.
> One node group reports Memory:16404668416 and another Memory:16228507648.

Seen here on m5.large

@bazzargh commented Nov 7, 2019

We're hitting this same issue; we can see from the logs (by looking at the "Splitting..." messages) that CA decided only 2 of our 3 node groups were similar, despite all non-ignored labels being identical. These were m5s, so it's possible our problem is fixed by #2462, but in general it would be very nice to have some logging in the comparator that says why it decided two node groups were dissimilar; that would have made it much easier to diagnose this.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2020
@JacobHenner

/remove-lifecycle stale

See #1676 (comment) for a list of some of the reasons this might be happening.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2020
@JoelSpeed (Contributor)

I was looking into this issue last week and came up with this PR, which you may all be interested in: #3124

In my debugging, I was seeing a difference of 172032Ki across availability zones of m5.xlarge instances in us-east-2. I will add that I was using the cluster-api provider rather than the AWS provider, so this bug affects multiple providers.

My finding was that the value of MaxMemoryDifferenceInKiloBytes has caused some confusion. It was initially introduced by a colleague of mine, who coded the difference to tolerate 128Ki, as he described it. However, in that original PR MaxMemoryDifferenceInKiloBytes was set to 128000, because the value was actually a number of bytes and not a number of kilobytes. So when this was doubled, it allowed a 256Ki diff in memory and not the 256Mi diff that was described in that PR.

To avoid confusion, I've reworked this difference calculation to use Kubernetes Quantities and prevent conversions to integers where possible to reduce the likelihood of mistakes being made in the future. So now the MaxMemoryDifference comes from resource.MustParse("256Mi") and all of the maths is done as Quantities. I've also added a test case that demonstrates some real world values I got from my testing.
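A minimal sketch of that kind of Quantity-based comparison (assuming the 256Mi tolerance from the PR description; the names here are illustrative, not the PR's actual code):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// maxMemoryDifference mirrors the 256Mi tolerance described above; the exact
// merged value is an assumption here.
var maxMemoryDifference = resource.MustParse("256Mi")

// memoryWithinTolerance compares two capacities as Quantities end to end, so
// no Ki-vs-bytes conversion mistake can creep back in.
func memoryWithinTolerance(a, b resource.Quantity) bool {
	diff := a.DeepCopy()
	diff.Sub(b)
	if diff.Sign() < 0 {
		diff.Neg()
	}
	return diff.Cmp(maxMemoryDifference) <= 0
}

func main() {
	// m5.xlarge capacities reported earlier in the thread, in bytes.
	a := resource.MustParse("16404668416")
	b := resource.MustParse("16228507648")
	fmt.Println(memoryWithinTolerance(a, b)) // true: ~168Mi apart, within 256Mi
}
```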

Before this patch I added a bunch of debug logic to see exactly the values the code was receiving, which led me to this discovery. I was able to consistently reproduce the problem, and now, with this fix, I can see that the incoming values are being compared properly against the 256Mi tolerance.

What I would like to understand is why there are differences across instances at all (I've seen them across AZs and within AZs). In my testing, all machines booted from the same configuration and the same AMI, so there should be no difference in kernel version, as was reported earlier.

Also, I would like to understand which instance types are affected by this memory difference. Is it all Nitro-based EC2 instances? I've seen mention of C5 and M5; are R5 and T3 instances affected as well?

@JoelSpeed (Contributor)

I've been doing some more testing of this today and have come up with some more results.

Managed to find differences on the following instance types:

  • m5.xlarge - Biggest diff around 168Mi
  • r5.4xlarge - Biggest diff 16Ki
  • t3.large - Biggest diff 16Ki
  • m5.16xlarge - Biggest diff around 2688Mi
  • m4.2xlarge - Biggest diff 200Ki
  • c5.4xlarge - Biggest diff around 224Mi

The large differences have only been seen on Nitro instances, but there are some differences on older instances as well.

The larger differences are about 1% (1.14%, 1.05%, 0.7%). Perhaps we should just allow a small relative difference, as the allocatable and free comparisons already do? Are there any problems or implications with that that anyone can think of?

@frobware (Contributor)

> My finding was that the value of MaxMemoryDifferenceInKiloBytes has caused some confusion. It was initially introduced by a colleague of mine who coded the difference to tolerate 128Ki as he described it.

When I first noticed this, the differences I observed were way, way (way!) smaller than 128KiB - sometimes just 8KiB. I just scaled this up as a first pass, in the absence of any hard numbers, and also so as not to dramatically perturb the existing behaviour.

@JoelSpeed (Contributor)

> When I first noticed this, the differences I observed were way, way (way!) smaller than 128KiB. Sometimes just 8KiB. I just scaled this up as a first pass in the absence of any hard numbers and also not to dramatically perturb the existing behaviour.

That makes sense! Thanks for chipping in. Do you happen to remember if you ever tested with 5th-generation instances?

I've just had an update from an AWS contact who suggested it may be due to different CPUs backing the same instance type, e.g. you may get two instances of the same type, but one running Skylake and the other Cascade Lake. Perhaps there are also subtle differences in memory capacity between these two CPU generations.

@frobware (Contributor)

> Perhaps there are also subtle differences in memory capacity between these two CPU generations.

My guess at the time was "video memory", and I think I reached that conclusion for the m4 instance types we were using by looking at the dmesg output. I'm sure BIOS revisions could also play a part here, but dmesg should help show what's what.

@trondhindenes

I don't know if this is relevant here, but spot-enabled auto scaling groups typically have multiple instance types attached to them, and one of them will be selected based on spot market costs (I guess) at the time of creation.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 2, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 2, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@harlitad

Hi everyone,
Apologies if this isn't the right thread to post about my issue. Please feel free to direct me to a more appropriate one if needed.

Over the past few days, I’ve been analyzing the AWS EC2 nodes currently running across multiple Kubernetes clusters in different regions, with various instance types. Here are the memory capacity differences I’ve found:

| Instance Type | Largest Memory Capacity (Ki) | Difference in Memory Capacity (Ki) | % Difference with AWS Documentation |
| --- | --- | --- | --- |
| t3a.xlarge | 16226752 | 550508 | 3.28% |
| t3a.2xlarge | 32587216 | 967248 | 2.88% |
| c5a.xlarge | 8022472 | 366136 | 4.36% |
| c6a.large | 3900312 | 293992 | 7.01% |
| t3a.medium | 3955608 | 238704 | 5.69% |
| t3a.small | 1976632 | 120552 | 5.75% |
| t3a.large | 8045460 | 343148 | 4.09% |
| c5a.large | 3943324 | 250980 | 5.98% |

In summary, the differences range from 2% to 7%, which seems quite significant and inconsistent. I retrieved the memory capacity from the node describe command and cross-checked it with /proc/meminfo to confirm that MemTotal matches the memory capacity shown in the node describe output.

For additional context, we are using Karpenter with the Nitro hypervisor.

Does anyone have any insights on why these discrepancies occur, and what percentage difference should be considered acceptable when calculating general memory capacity? I need to establish a baseline for memory capacity calculations.

Thank you!
