Uneven scale-up of AWS ASG's #2020

danmcnulty · 2019-05-15T10:49:07Z

Hi,
I'm testing the cluster autoscaler on our AWS EKS 1.12 cluster.
I created 3 identical ASG's in zones a/b/c, and created a test deployment using a basic nginx pod, which I scale up with commands like
kubectl -n playground scale --replicas=4 deployment nginx-scaleout
I've sized the pods so that 2 will fit on each node.

I started with 3 nodes, one per AZ, and began scaling up the deployment. I saw it add nodes evenly at first so that each zone had 2 nodes. I then scaled up further until I had 3/3/2 nodes across the zones (so far so good), but the next time it scaled up it added a fourth in zone A so I had 4/3/2, but I'm unsure why it did this instead of adding a new node in zone C?

The relevant I0515 10:17:35.100431 I0515 10:17:35.133540 I0515 10:17:35.743084 I0515 10:17:35.845006 I0515 10:17:35.845350 I0515 10:17:35.845378 I0515 10:17:35.942508 I0515 10:17:35.942538 I0515 10:17:35.942546 I0515 10:17:35.942667 I0515 10:17:35.942721 I0515 10:17:35.942853 I0515 10:17:35.942869 I0515 10:17:36.042655 I0515 10:17:36.042693 I0515 10:17:36.042705 I0515 10:17:36.042721 I0515 10:17:36.042732 I0515 10:17:36.042894 I0515 10:17:36.042918 I0515 10:17:36.042963 part of the log is this:
1 static_autoscaler.go:121] Starting main loop
1 leaderelection.go:227] successfully renewed lease kube-system/cluster-autoscaler
1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: [eks-zeus-autoscaler20190501200602699100000004 eks-zeus-scalemultiaz-a-20190514154009657700000006 eks-zeus-scalemultiaz-b-20190514154009657700000004 eks-zeus-scalemultiaz-c-20190514154009657700000005]
1 aws_manager.go:157] Refreshed ASG list, next refresh after 2019-05-15 10:17:45.84499724 +0000 UTC m=+45941.859559752
1 utils.go:552] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
1 static_autoscaler.go:252] Filtering out schedulables
1 static_autoscaler.go:262] No schedulable pods
1 scale_up.go:263] Pod playground/nginx-scaleout-85cf87558d-z2pmd is unschedulable
1 scale_up.go:263] Pod playground/nginx-scaleout-85cf87558d-w2h2j is unschedulable
1 scale_up.go:300] Upcoming 0 nodes
1 utils.go:208] Pod nginx-scaleout-85cf87558d-z2pmd can't be scheduled on eks-zeus-autoscaler20190501200602699100000004, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector
1 utils.go:198] Pod nginx-scaleout-85cf87558d-w2h2j can't be scheduled on eks-zeus-autoscaler20190501200602699100000004. Used cached predicate check results
1 scale_up.go:406] No pod can fit to eks-zeus-autoscaler20190501200602699100000004
1 waste.go:57] Expanding Node Group eks-zeus-scalemultiaz-a-20190514154009657700000006 would waste 50.00% CPU, 47.28% Memory, 48.64% Blended
1 waste.go:57] Expanding Node Group eks-zeus-scalemultiaz-b-20190514154009657700000004 would waste 50.00% CPU, 47.28% Memory, 48.64% Blended
1 waste.go:57] Expanding Node Group eks-zeus-scalemultiaz-c-20190514154009657700000005 would waste 50.00% CPU, 47.28% Memory, 48.64% Blended
1 scale_up.go:418] Best option to resize: eks-zeus-scalemultiaz-a-20190514154009657700000006
1 scale_up.go:422] Estimated 1 nodes needed in eks-zeus-scalemultiaz-a-20190514154009657700000006
1 scale_up.go:501] Final scale-up plan: [{eks-zeus-scalemultiaz-a-20190514154009657700000006 3->4 (max: 6)}]
1 scale_up.go:579] Scale-up: setting group eks-zeus-scalemultiaz-a-20190514154009657700000006 size to 4
1 auto_scaling_groups.go:203] Setting asg eks-zeus-scalemultiaz-a-20190514154009657700000006 size to 4

And my configuration looks like this:
`spec:
containers:

command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --balance-similar-node-groups
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/zeus
  image: k8s.gcr.io/cluster-autoscaler:v1.12.3`

Any help would be appreciated, Thanks.

The text was updated successfully, but these errors were encountered:

Jeffwan · 2019-05-15T22:31:33Z

/assign

Jeffwan · 2019-05-20T21:43:18Z

/sig aws

vikaschoudhary16 · 2019-05-21T05:35:02Z

by default it is "random" expander, if that is the case, scenario you described does not seem highly improbable.Does it happen each time?

danmcnulty · 2019-05-21T08:08:17Z

We're using the least-waste expander presently, but in this case the calculations for each of the ASG's is identical, so shouldn't it then choose to scale up to make the ASG's balanced?

MaciekPytel · 2019-05-21T09:40:50Z

None of existing expanders cares about zone balancing, so it shouldn't matter which one you use. CA has a separate mechanism for balancing: it finds 'similar' NodeGroups (ASGs) and splits any scale-up between them. This happens after the expander makes a decision and it shouldn't depend on expander at all.
My guess would be that your ASGs are not actually 'similar' according to the definition used by CA (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/balance_similar.md#similar-node-groups). I'd look at the set of labels the nodes in each ASG have and see if they're identical except for Kubernetes defined zone and host labels.

danmcnulty · 2019-05-21T10:45:50Z

All of the nodes have identical labels except zone and hostname - we used terraform to create and tag the ASG's they reside in.

`kubectl get nodes -l environment=playground --show-labels
NAME STATUS ROLES AGE VERSION LABELS
ip-10-0-173-7.eu-west-1.compute.internal Ready 8m6s v1.12.7 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t3.medium,beta.kubernetes.io/os=linux,cluster=zeus,environment=playground,failure-domain.beta.kubernetes.io/region=eu-west-1,failure-domain.beta.kubernetes.io/zone=eu-west-1a,kubernetes.io/hostname=ip-10-0-173-7.domain.com,nodegroup=scalemultiaz,workload=scalemultiaz

ip-10-0-187-220.eu-west-1.compute.internal Ready 7m59s v1.12.7 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t3.medium,beta.kubernetes.io/os=linux,cluster=zeus,environment=playground,failure-domain.beta.kubernetes.io/region=eu-west-1,failure-domain.beta.kubernetes.io/zone=eu-west-1b,kubernetes.io/hostname=ip-10-0-187-220.domain.com,nodegroup=scalemultiaz,workload=scalemultiaz`

Jeffwan · 2019-05-21T20:53:50Z

for least-waste, it's backed by random strategy.

autoscaler/cluster-autoscaler/expander/waste/waste.go

Lines 33 to 35 in cb4e60f

    
           func NewStrategy() expander.Strategy { 
        
           	return &leastwaste{random.NewStrategy()} 
        
           }

In your case, I think all 3 nodegroups are qualified, and random strategy will work and pick one of them.

meringu · 2019-05-28T21:27:22Z

We’re hitting this issue in our clusters too. I think I know why.

We have 3 ASGs with c5.2xlarge spread across 3 AZs. Looks like when Amazon creates an EC2 instance either 15835076Ki or 15835084Ki total memory is provisioned for the VM. This is what is reported in the node status.capacity.memory, and verified with free -k on the node.

When the cluster autoscaler attempts to discover similar node groups, it requires an exact match in memory capacity here:

autoscaler/cluster-autoscaler/processors/nodegroupset/compare_nodegroups.go

Lines 79 to 81 in 0968736

    
           // For capacity we require exact match. 
        
           // If this is ever changed, enforcing MaxCoresTotal and MaxMemoryTotal limits 
        
           // as it is now may no longer work.

Also seems like node groups that are scaled to zero, get a different memory capacity again, maybe this?

autoscaler/cluster-autoscaler/cloudprovider/aws/ec2_instance_types.go

Line 129 in 0968736

MemoryMb: 16384,

Do we need some sort of tolerance in the capacity comparison, similar to the allocatable and free comparisons?

Jeffwan · 2019-05-29T07:23:04Z

@meringu If that's the case, we have some options

Figure out why some instance have different memory provisioned. I can check with EC2 team on this. I suspect not all c5.2xlarge use exact same memory? Also, ec2_instance_type mapping has to be match with provisioned memory.
Use comparison with tolerance. My concern is this is common code and if that's only aws issues, we should not touch this.

MaciekPytel · 2019-05-29T11:03:20Z

Regarding point 2 - the comparison logic lives in processor, ie. it's hidden behind an interface specifically to allow adding custom implementations without touching common code. If required you should be able to customize the logic by adding your own implementation of https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodegroupset/nodegroup_set_processor.go. You may be able to reuse most of implementation of balancing processor (default), just change the comparison logic.

Jeffwan · 2019-05-29T19:21:04Z

Regarding point 2 - the comparison logic lives in processor, ie. it's hidden behind an interface specifically to allow adding custom implementations without touching common code. If required you should be able to customize the logic by adding your own implementation of https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodegroupset/nodegroup_set_processor.go. You may be able to reuse most of implementation of balancing processor (default), just change the comparison logic.

Thanks for guidance! @MaciekPytel

Jeffwan · 2019-05-30T05:45:11Z

@meringu I will try to reproduce this issue on my end and check with EC2 team at the same time. It will take some time and come back to you later then

meringu · 2019-06-13T04:48:53Z

Any update on the issue @Jeffwan?

As I understand, this issue would be causing --balance-similar-nodegroups to not work for all AWS users of the autoscaler who configure an AutoScaling group per instance type per Availability Zone.

Jeffwan · 2019-06-14T15:05:57Z

@meringu Sorry I was on call in past two weeks. Get some times this week to check this issue.

meringu · 2019-06-14T21:55:52Z

Thanks @Jeffwan. Let me know if there is anything I can help with.

jhohertz · 2019-06-18T15:22:04Z

I just ran into a similar thing, with a kops-built cluster, and I know exactly why it is ignoring the balance-similar-node-groups flag.

The problem, at least in my case, and it sounds similar to the above description, is that kops is adding it's own labelling to the node, IE: kops.k8s.io/instancegroup: nodes-us-east-1a, meaning that groups do not match on the labelling.

One possible fix, specific to kops clusters, would be to add the kops.k8s.io/instancegroup string to the ignoredLabels here

Not sure if other tools IE: for EKS have well-known labels like this that could also be added, or if the better solution would be to add a flag to let additional ignore labels be specified at runtime.

Thoughts?

meringu · 2019-06-18T19:51:42Z

Not the case for our EKS cluster. We do have some control over the labels. The only different node labels on our clusters are in the ignoredLabels list.

It is the capacity check in our case, as the logs show it sometimes finds a similar node group or two, depending on if the node group has any instances, and a bit random as AWS doesn't give the exact amount of memory everytime.

meringu · 2019-06-24T03:37:07Z

Hi @Jeffwan, did you get a change to look at this last week? Is there anything I can help with?

We are happy to contribute engineer time if that would be helpful.

Jeffwan · 2019-06-24T06:16:47Z

@meringu Sorry for late response. It would be helpful to help identify the problem. I am trying to see if this can be easily reproducible. In the ca log @danmcnulty provide, it failed node selector, could you share the logs and make sure it failed on mismatch of memory?

Pod nginx-scaleout-85cf87558d-z2pmd can't be scheduled on eks-zeus-autoscaler20190501200602699100000004, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector

I did some search and didn't find any clues of memory mismatch for one instance. Could you file a service ticket to AWS ec2 team? (I don't have all your details)

meringu · 2019-06-24T22:22:32Z

Unfortunately there are no logs for the comparisons, you can see in my above link for the IsNodeInfoSimilar function, that there are no log statements. See my above for the memory disparity.

Summarised:

c5.2xlarge:
Cluster Autoscaler is coded to 16384MB
Some nodes free -k 15835076Ki
Other nodes free -k 15835084Ki

I'll raise a support ticket asking them about the difference in memory from instance to instance, and the why the advertised memory is different again.

leonsodhi-lf · 2019-06-25T11:21:13Z

@meringu this sounds like a similar problem to what I'm facing. In the event it is relevant, you may wish to skim aleksandra-malinowska's comment. Here's a snippet:

That's probably because actual node's memory is slightly different from predicted memory based on machine type (due to kernel reservation, specific to a given OS/machine combination)

danmcnulty · 2019-07-01T13:19:20Z

@Jeffwan Just a quick note on the logs I provided - the ASG with prefix "eks-zeus-autoscaler" was used for something unrelated by us, and so it was expected that this group didn't match the node selector.

The question I have is about the 3 similar groups with prefixes:
eks-zeus-scalemultiaz-a
eks-zeus-scalemultiaz-b
eks-zeus-scalemultiaz-c
Which are the ones with identical labels, which scaled up un-evenly.

meringu · 2019-07-04T23:36:50Z

Hey team,

I've heard back from AWS about the memory capacity disparity.

For the difference reported on the running hosts, I was using two different AMIs with different kernel versions. After updating all the instances to the newer AMI, they all report the same memory capacity.

With regards to the difference in reported in the API vs reported on the node, I got this response:

Pricing API provides total memory that comes with the instance type (which is also displayed in our websites). However, there will be certain memory that will reserve for the use of kernel , BIOs etc.
Hence total available memory for the use will be less than listed in pricing API

This explains why two different AMIs can have different memory capacity, as they can have different kernel versions etc. This also means the capacity in the pricing API will never match the reported capacity from the node.

So now looking at this comment in the code:

autoscaler/cluster-autoscaler/processors/nodegroupset/compare_nodegroups.go

Lines 79 to 81 in 0968736

    
           // For capacity we require exact match. 
        
           // If this is ever changed, enforcing MaxCoresTotal and MaxMemoryTotal limits 
        
           // as it is now may no longer work.

A course of action could be to look at a different way to satisfy the MaxMemoryTotal requirement so we can implement a tolerance on the capacity check.

meringu · 2019-07-05T06:02:27Z

Or I could disable scaling to zero for my ASGs. This means extra overhead however.

ewoutp · 2019-09-12T19:17:16Z

Isn't it possible to simplify the "similar check" to comparing just the instance type?

jhohertz · 2019-09-12T19:26:50Z

@ewoutp You could have multiple node pools of the same instance type that you want scaled separately for one reason or another.

The main issue is where you have pools that are the same for all intents across differing zones, and some tag/label is different between the zones and getting in the way. See this PR for what I am talking about: #2207

Robert-Stam · 2019-09-12T20:02:02Z

Not the complete check, however for the CPU and especially the memory part (which seems to be a little off for now) ?

ewoutp · 2019-09-13T06:14:17Z

@jhohertz That is true. However I suspect there are many cases where comparison on instanceType alone are sufficient (and working very well).
Shouldn't that at least be an option?

sulixu · 2019-10-08T22:14:36Z

I have 3 ASGs with c5.9xlarge spread across 3 AZs, however when CA tries to make a decision on splitting scale-up nodes between the node groups, most of the times it split among 2 AZs. Why do we want to check the 5% free resources? Aren't labels, capacity and allocatable sufficient enough to compare the node group similarity?

cdmurph32 · 2019-10-16T15:21:58Z

I'm seeing this issue with m5.xlarge.
One node group reports Memory:16404668416 and another Memory:16228507648, for a difference of 176MB.

&nodeinfo.Resource{MilliCPU:4000, Memory:16228507648, EphemeralStorage:48307038948, AllowedPodNumber:58, ScalarResources:map[v1.ResourceName]int64{"attachable-volumes-aws-ebs":25, "hugepages-1Gi":0, "hugepages-2Mi":0}}

&nodeinfo.Resource{MilliCPU:4000, Memory:16404668416, EphemeralStorage:48307038948, AllowedPodNumber:58, ScalarResources:map[v1.ResourceName]int64{"attachable-volumes-aws-ebs":25, "hugepages-1Gi":0, "hugepages-2Mi":0}}

Updating MaxMemoryDifferenceInKiloBytes to 256000 fixes the issue. https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodegroupset/compare_nodegroups.go#L36

For those using the AWS CNI with custom networking, the label "k8s.amazonaws.com/eniConfig" needs to be whitelisted as well.

JacobHenner · 2019-10-16T15:27:46Z

I'm seeing this issue with m5.xlarge.
One node group reports Memory:16404668416 and another Memory:16228507648.

Seen here on m5.large

bazzargh · 2019-11-07T19:43:31Z

We're hitting this same issue; we can see from the logs that k8s decided only 2 of our 3 nodegroups were similar (by looking at the "Splitting..." log messages), despite all non-ignored labels being identical. These were m5s, so it's possible our problem is fixed by #2462, but in general it would be very nice to have some logging in the comparator that says why it decided 2 nodegroups were dissimilar, would have made it much easier to diagnose this.

fejta-bot · 2020-03-18T04:23:30Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

JacobHenner · 2020-04-08T20:53:05Z

/remove-lifecycle stale

See #1676 (comment) for list of some of the reasons this might be happening.

JoelSpeed · 2020-05-11T16:44:39Z

I was looking into this issue last week and came up with this PR which you may all be interest in #3124

In my debugging, I was seeing a difference of 172032Ki across availability zones of m5.xlarge instances in us-east-2. I will add I was using the cluster-api provider rather than the AWS provider so this bug affects multiple providers.

My finding was that the value of MaxMemoryDifferenceInKiloBytes has caused some confusion. It was initially introduced by a colleague of mine who coded the difference to tolerate 128Ki as he described it. However in this original PR MaxMemoryDifferenceInKiloBytes was set to 128000. This is because the value was actually a number of Bytes and not a number of KiloBytes. So when this was doubled, it allowed a 256Ki diff in memory and not a 256Mi diff as was described in that PR.

To avoid confusion, I've reworked this difference calculation to use Kubernetes Quantities and prevent conversions to integers where possible to reduce the likelihood of mistakes being made in the future. So now the MaxMemoryDifference comes from resource.MustParse("256Mi") and all of the maths is done as Quantities. I've also added a test case that demonstrates some real world values I got from my testing.

Before this patch I added a bunch of debug logic to see exactly the values that the code was receiving that lead me to this discovery. I was able to consistently reproduce the problem and now, with this fix, I can see that the values coming through are being compared properly to the 256Mi tolerance.

What I would like to understand is why there are differences across instances (I've seen across AZs and within AZs)? In my testing, all machines booted from the same configuration, same AMI, so there should be no difference in the kernel version as reported earlier.

Also, I would like to understand which instance types are affected by this memory difference, is it all Nitro based EC2 instances? I've seen mention of C5 and M5, are R5 and T3 instances as well?

JoelSpeed · 2020-05-12T14:19:22Z

I've been doing some more testing of this today and have come up with some more results.

Managed to find differences on the following instance types:

m5.xlarge - Biggest diff around 168Mi
r5.4xlarge - Biggest diff 16Ki
t3.large - Biggest diff 16Ki
m5.16xlarge - Biggest diff around 2688Mi
m4.2xlarge - Biggest diff 200Ki
c5.4xlarge - Biggest diff around 224Mi

The large differences have only been seen on Nitro instances, but there are some differences on older instances as well

The larger differences are about 1% difference (1.14, 1.05, 0.7%). Perhaps we should just allow a small difference as the allocatable and free are allowing? Are there any problems or implications with that, that anyone can think of?

frobware · 2020-05-15T08:46:13Z

My finding was that the value of MaxMemoryDifferenceInKiloBytes has caused some confusion. It was initially introduced by a colleague of mine who coded the difference to tolerate 128Ki as he described it.

When I first noticed this, the differences I observed were way, way (way!) smaller than 128KiB. Sometimes just 8Kb. I just scaled this up as a first pass in absence of any hard numbers and also not to dramatically perturb the existing behaviour.

JoelSpeed · 2020-05-15T09:09:02Z

When I first noticed this, the differences I observed were way, way (way!) smaller than 128KiB. Sometimes just 8Kb. I just scaled this up as a first pass in absence of any hard numbers and also not to dramatically perturb the existing behaviour.

That makes sense! Thanks for chipping in. Do you happen to remember if you ever tested with 5th generation instances?

I've just had an update from an AWS contact who suggested it may be due to having different CPUs supporting various instance types. EG you may get two instances from the same type but one may run a Skylake and the other a Cascade lake. Perhaps there are also subtle difference in memory capacity for these two generations

frobware · 2020-05-15T09:28:21Z

Perhaps there are also subtle difference in memory capacity for these two generations

My guess at the time was "video memory". And I think I concluded on that for the m4 instance types we were using by looking at the dmesg output. I'm sure BIOS revisions could also play a part here. But dmesg should help show what's what.

trondhindenes · 2020-08-04T12:12:19Z

I don't know if this is relevant here, but spot-enabled auto scaling groups typically have multiple instance types attached to them, and one of them will be selected based on spot market costs (I guess) at the time of creation.

fejta-bot · 2020-11-02T12:49:35Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-12-02T13:34:21Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2021-01-01T14:18:48Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2021-01-01T14:18:59Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

harlitad · 2024-10-16T15:14:10Z

Hi everyone,
Apologies if this isn't the right thread to post about my issue. Please feel free to direct me to a more appropriate one if needed.

Over the past few days, I’ve been analyzing the AWS EC2 nodes currently running across multiple Kubernetes clusters in different regions, with various instance types. Here are the memory capacity differences I’ve found:

Instance Type	Largest Memory Capacity (Ki)	Difference in Memory Capacity (Ki)	% Difference with AWS Documentation
t3a.xlarge	16226752	550508	3.28%
t3a.2xlarge	32587216	967248	2.88%
c5a.xlarge	8022472	366136	4.36%
c6a.large	3900312	293992	7.01%
t3a.medium	3955608	238704	5.69%
t3a.small	1976632	120552	5.75%
t3a.large	8045460	343148	4.09%
c5a.large	3943324	250980	5.98%

In summary, the differences range from 2% to 7%, which seems quite significant and inconsistent. I retrieved the memory capacity from the node describe command and cross-checked it with /proc/meminfo to confirm that MemTotal matches the memory capacity shown in the node describe output.

For additional context, we are using Karpenter with the Nitro hypervisor.

Does anyone have any insights on why these discrepancies occur, and what percentage difference should be considered acceptable when calculating general memory capacity? I need to establish a baseline for memory capacity calculations.

Thank you!

k8s-ci-robot assigned Jeffwan May 15, 2019

k8s-ci-robot added the sig/aws label May 20, 2019

k8s-ci-robot added area/provider/aws Issues or PRs related to aws provider and removed sig/aws labels Aug 6, 2019

cdmurph32 mentioned this issue Oct 16, 2019

Add AWS ignore labels and increase max memory difference. #2462

Merged

JacobHenner mentioned this issue Dec 19, 2019

AWS EKS cluster autoscaler only using ASGs that have at least one node #1676

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2020

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2020

JoelSpeed mentioned this issue May 18, 2020

Allow small tolerance on memory capacity when comparing nodegroups #3124

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 2, 2020

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 2, 2020

k8s-ci-robot closed this as completed Jan 1, 2021

Uneven scale-up of AWS ASG's #2020

Uneven scale-up of AWS ASG's #2020

Comments

danmcnulty commented May 15, 2019 • edited Loading

Jeffwan commented May 15, 2019

Jeffwan commented May 20, 2019

vikaschoudhary16 commented May 21, 2019

danmcnulty commented May 21, 2019

MaciekPytel commented May 21, 2019

danmcnulty commented May 21, 2019

Jeffwan commented May 21, 2019

meringu commented May 28, 2019

Jeffwan commented May 29, 2019

MaciekPytel commented May 29, 2019 • edited Loading

Jeffwan commented May 29, 2019

Jeffwan commented May 30, 2019

meringu commented Jun 13, 2019

Jeffwan commented Jun 14, 2019

meringu commented Jun 14, 2019

jhohertz commented Jun 18, 2019

meringu commented Jun 18, 2019

meringu commented Jun 24, 2019

Jeffwan commented Jun 24, 2019

meringu commented Jun 24, 2019

leonsodhi-lf commented Jun 25, 2019

danmcnulty commented Jul 1, 2019

meringu commented Jul 4, 2019

meringu commented Jul 5, 2019

ewoutp commented Sep 12, 2019

jhohertz commented Sep 12, 2019

Robert-Stam commented Sep 12, 2019

ewoutp commented Sep 13, 2019

sulixu commented Oct 8, 2019

cdmurph32 commented Oct 16, 2019 • edited Loading

JacobHenner commented Oct 16, 2019

bazzargh commented Nov 7, 2019

fejta-bot commented Mar 18, 2020

JacobHenner commented Apr 8, 2020

JoelSpeed commented May 11, 2020

JoelSpeed commented May 12, 2020

frobware commented May 15, 2020

JoelSpeed commented May 15, 2020

frobware commented May 15, 2020

trondhindenes commented Aug 4, 2020

fejta-bot commented Nov 2, 2020

fejta-bot commented Dec 2, 2020

fejta-bot commented Jan 1, 2021

k8s-ci-robot commented Jan 1, 2021

harlitad commented Oct 16, 2024

danmcnulty commented May 15, 2019 •

edited

Loading

MaciekPytel commented May 29, 2019 •

edited

Loading

cdmurph32 commented Oct 16, 2019 •

edited

Loading