
max-pods-calculator.sh should not be skipped in bootstrap.sh for large instance type #1419

Closed
HenryXie1 opened this issue Sep 3, 2023 · 7 comments


@HenryXie1

HenryXie1 commented Sep 3, 2023

What happened:
We are using the EKS optimized AMI with instance type c5.18xlarge.
We found that the max pods setting in the kubelet config is 737, which comes from eni-max-pods.txt rather than from max-pods-calculator.sh.
The calculator is skipped because of this line in bootstrap.sh:

if [ -z "$MAX_PODS" ] || [ -z "$INSTANCE_TYPE" ]; then
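For context, a rough sketch of how bootstrap.sh arrives at this (this is not the literal script code; the table lookup, file paths, and the CNI_VERSION variable are approximations for illustration):

MAX_PODS=$(awk -v t="$INSTANCE_TYPE" '$1 == t {print $2}' /etc/eks/eni-max-pods.txt)
if [ -z "$MAX_PODS" ] || [ -z "$INSTANCE_TYPE" ]; then
  # only reached when the static table has no entry for this instance type
  MAX_PODS=$(/etc/eks/max-pods-calculator.sh \
    --instance-type "$INSTANCE_TYPE" \
    --cni-version "$CNI_VERSION")
fi
# c5.18xlarge has an entry in the table (737), so the calculator never runs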

What you expected to happen:

Per https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html#determine-max-pods, max pods should be 250.

We expect max-pods-calculator.sh not to be skipped.
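
For reference, the calculator from the linked docs can be run standalone; a sketch (the CNI version shown is only a placeholder, substitute the version deployed in your cluster):

curl -O https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
chmod +x max-pods-calculator.sh
# 1.12.6-eksbuild.2 is an example value, not a recommendation
./max-pods-calculator.sh --instance-type c5.18xlarge --cni-version 1.12.6-eksbuild.2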

How to reproduce it (as minimally and precisely as possible):
Start an instance in EKS with instance type c5.18xlarge
and check the max pods value in the kubelet config file.

The max pods value in the kubelet config file is 737, not 250.
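
To confirm the applied value on the node (the path assumes the default kubelet config location on the EKS Optimized AL2 AMI):

# maxPods shows 737 here instead of the expected 250
grep -i maxPods /etc/kubernetes/kubelet/kubelet-config.json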

Anything else we need to know?:

Environment:
AWS Region: ap-southeast-2
Instance Type(s): c5.18xlarge
EKS Platform version: eks.11
Kubernetes version: 1.23
AMI Version: ami-018ae0f2e02aab38b
Kernel (e.g. uname -a): Linux ANL10233355 5.4.249-163.359.amzn2.x86_64
Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-018ae0f2e02aab38b"
BUILD_TIME="Fri Jul 28 04:19:03 UTC 2023"
BUILD_KERNEL="5.4.249-163.359.amzn2.x86_64"
ARCH="x86_64"

@HenryXie1 HenryXie1 changed the title max-pods-calculator.sh is skipped by bootstrap.sh max-pods-calculator.sh should not be skipped in bootstrap.sh for large instance type Sep 3, 2023
@bryantbiggs
Contributor

737 is the theoretical limit based on the available VPC networking resources (number of ENIs * number of available IPs per ENI), but running that many pods on the instance will most likely cause resource contention. The upper bounds of 110/250 are based on testing performed by the EKS team for each instance size; you can see more details here: #1368
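
For reference, the 737 figure follows from the formula used to generate eni-max-pods.txt, assuming c5.18xlarge supports 15 ENIs with 50 IPv4 addresses each (per the published EC2 instance limits):

# max pods = ENIs * (IPs per ENI - 1) + 2 host-networking pods
echo $(( 15 * (50 - 1) + 2 ))   # 737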

@cartermckinnon
Member

cartermckinnon commented Sep 6, 2023

IMO the 110/250 limits are fine guidelines but mostly arbitrary; and using the theoretical max here is a better default. To @bryantbiggs' point, the number of pods on a given node will usually be bounded by resource requests.

@cartermckinnon cartermckinnon closed this as not planned Sep 6, 2023
@HenryXie1
Author

@bryantbiggs @cartermckinnon Thanks for your input.
What is the recommended number customers should use?
Based on your input, 737 is the better default, which IMO is not quite ideal in the real world.
We filed an AWS support case, and they said we should use 110/250.

As a customer we are a bit confused.

@qoehliang

qoehliang commented Sep 6, 2023

I think the concern being raised here is the impact the max-pods value has on kubelet's memory reservation. As @cartermckinnon mentioned, the number of pods on a given node is typically bounded by resource requests; in our scenario, we haven't seen any node get close to even 110 pods. Given that 737 pods is the theoretical limit for a c5.18xlarge, it is also the value used to calculate how much memory kubelet will reserve when using the EKS Optimized AMI: https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh#L271.

I.e. for the c5.18xlarge instance, that's 255 + 11 * 737 = 8362 Mi.
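
As a sketch of that heuristic (the helper name is hypothetical; the formula matches the linked bootstrap.sh line):

# memory to reserve for kubelet, in Mi, as a function of max pods
kube_reserved_memory_mi() { echo $(( 11 * $1 + 255 )); }

kube_reserved_memory_mi 737   # 8362 Mi (~8.3 GiB) with the eni-max-pods.txt value
kube_reserved_memory_mi 250   # 3005 Mi (~3 GiB) with the calculator's capped value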

Does it make sense to reserve 8.3 GiB of memory for kubelet, knowing we will never get close to the 737 theoretical limit, or even the 250 recommended limit, for max pods on a c5.18xlarge instance type?

Capping max-pods at 250, as max-pods-calculator.sh does, means kubelet only reserves about 3 GB of memory. Hence, if we expect to run nowhere near 250 pods, is the recommendation that we reserve only about 3 GB of memory for kubelet, and therefore make customisations on top of the EKS Optimized AMI to override the way it currently calculates max-pods and, with it, the memory reservation for kubelet?
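
One way to do that, as a sketch (the cluster name and values are placeholders; --use-max-pods and --kubelet-extra-args are existing bootstrap.sh options, and --max-pods / --kube-reserved are standard kubelet flags; whether bootstrap.sh's own reservation math still applies may depend on the AMI version):

# user data sketch: bypass the AMI's max-pods handling and set explicit values
/etc/eks/bootstrap.sh my-cluster \
  --use-max-pods false \
  --kubelet-extra-args '--max-pods=250 --kube-reserved=memory=3005Mi'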

Another question: do we know whether kubelet resource utilisation correlates directly with the number of pods running on the instance and nothing else? I.e. if I have 5 pods on a c5.xlarge and 5 pods on a c5.18xlarge, will kubelet consume the same amount of memory?

@cartermckinnon
Member

I completely agree with your assessment. I think the issue is the kubeReserved formula. The max pod value is definitely related, but not a great proxy for this.

Do we know whether kubelet resource utilisation correlates directly with the number of pods running on the instance and nothing else? I.e. if I have 5 pods on a c5.xlarge and 5 pods on a c5.18xlarge, will kubelet consume the same amount of memory?

I don't have good data on this, but that's the assumption that the related code is making.

@qoehliang

Perhaps this is similar to #1141. Our main objective is having accurate reservations for kubelet, to ensure we are utilising our resources efficiently on any instance type. Right now, the max-pods value controlling the kubelet memory reservation results in less allocatable memory for pods, and as discussed, memory and CPU requests are the typical bound on how many pods we actually end up scheduling on a given node. This means fewer pods scheduled per node, slightly less efficient nodes, and an overall increase in cost.

@cartermckinnon
Member

cartermckinnon commented Sep 6, 2023

Yep, reserved memory in particular should be more of a step function than a linear one. We'll continue tracking this in #1141. I'd like to scrape metrics from our CI jobs to have more data points for this type of thing; ad hoc analysis hasn't been adequate.
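
Purely to illustrate the step-function idea (the thresholds and values below are invented for illustration, not a maintainer proposal):

# hypothetical step-function reservation, contrasted with the linear 11 * max_pods + 255
step_reserved_memory_mi() {
  local max_pods=$1
  if   (( max_pods <= 110 )); then echo 1465
  elif (( max_pods <= 250 )); then echo 3005
  else                             echo 6144
  fi
}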
