The `k8s-autoscaler-benchmarker` is a tool for administrators and developers looking to optimize the scaling behavior of their EKS clusters. It offers a streamlined process for benchmarking the performance of Karpenter and Cluster Autoscaler for EKS workloads.
By providing time metrics on EC2 instance initiation, node registration, pod readiness, node deregistration, and instance termination, it enables users to quickly test autoscaler settings such as `consolidateAfter` for Karpenter and `scale-down-unneeded-time`, `node-delete-delay-after-taint`, `scale-down-delay-after-add`, etc. for Cluster Autoscaler.
This tool also supports customization through a variety of parameters, ensuring that users can adapt the benchmarking process to their specific environment while also providing defaults for quick testing.
- Benchmarking Metrics: The tool currently tracks the following metrics for a given autoscaler (Karpenter or Cluster Autoscaler):
- Total time for EC2 instances to initiate their boot process after failed pod scheduling.
- Total time for EC2 instances to register with the k8s API after initiating their boot process.
- Total time to pod readiness for a deployment after EC2 instances register with the k8s API.
- Total time for EC2 instance deregistration from the k8s API after scaling a deployment to 0.
- Total time for EC2 instance termination after scaling a deployment to 0.
- Customizable Parameters: A wide array of input parameters allows customization of the k8s deployment used for benchmarking - supply your own by providing its name and namespace, or use an autogenerated deployment customizable via parameters.
- Clear Results Summary: Benchmark outcomes are concisely summarized to `stdout`.
- Flexible Environment Configuration: Supports optional parameters for specifying kubeconfig paths and AWS profiles, with default values for ease of use.
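Conceptually, each metric above is a wall-clock delta between two observed cluster events. A minimal shell sketch of that idea, with `sleep` standing in for the real wait (e.g. polling the k8s API until all pods report Ready) - illustrative only, not how the tool is implemented internally:

```shell
#!/bin/sh
# Illustrative sketch: each benchmark metric is the elapsed time between
# two cluster events. 'sleep 1' stands in for the real wait (e.g. polling
# until all pods of the deployment report Ready).
start=$(date +%s)
sleep 1   # stand-in for waiting on the measured event
end=$(date +%s)
echo "Pod Readiness Time: $((end - start)) seconds"
```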
* Note: Instance IDs and IP addresses have been redacted with x's in the demo videos below. The real program output will show your actual resource IDs.
The example below uses a user-provided Docker image passed in as a parameter. A new deployment is created since the user does not provide an existing one. Once the program terminates, the deployment is deleted.
./k8s-autoscaler-benchmarker --nodepool k8s-autoscaler-benchmarker --replicas 2 --container-name redis --container-image redis/redis-stack
karpenter-demo.mp4
Benchmarks Summary
--------------------------------------------
Instance Initiation Time: 3.65 seconds
Instance Registration Time: 40.22 seconds
Pod Readiness Time: 31.46 seconds
Instance Deregistration Time: 20.12 seconds
Instance Termination Time: 96.24 seconds
--------------------------------------------
The example below neither uses an existing deployment nor passes in a custom Docker image. A new deployment is created with the default "inflate" deployment. Once the program terminates, the deployment is deleted.
./k8s-autoscaler-benchmarker --node-group k8s-autoscaler-benchmarker-ng --replicas 2
cluster-autoscaler-demo.mp4
Benchmarks Summary
--------------------------------------------
Instance Initiation Time: 21.63 seconds
Instance Registration Time: 34.21 seconds
Pod Readiness Time: 3.45 seconds
Instance Deregistration Time: 133.70 seconds
Instance Termination Time: 132.05 seconds
--------------------------------------------
- An active EKS cluster
- AWS CLI configured with access to the EKS Cluster
- kubectl configured with access to the EKS Cluster
- Go 1.16 or later installed on your machine
- For Karpenter:
- Install Karpenter in the EKS Cluster
- Set up a NodePool and EC2NodeClass similar to this NodePool example (Note: the `eks.autify.com/k8s-autoscaler-benchmarker` label and taint are required with their respective values for the default settings to function correctly unless overridden via parameters)
- For Cluster Autoscaler:
- Install Cluster Autoscaler in the EKS Cluster
- Create a Managed Node Group similar to this Managed Node Group example (Note: the `eks.autify.com/k8s-autoscaler-benchmarker` label and taint are required with their respective values for the default settings to function correctly unless overridden via parameters)
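For reference, the relevant portion of a Karpenter NodePool spec carrying the expected label and taint might look like the fragment below. Everything except the `eks.autify.com/k8s-autoscaler-benchmarker` key is an illustrative placeholder - adapt the names, API version, and effect to your own NodePool (and override via the `toleration-*`/`node-selector-*` parameters if you use different values):

```yaml
# Illustrative fragment only - shows where the default label and taint live.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: k8s-autoscaler-benchmarker
spec:
  template:
    metadata:
      labels:
        eks.autify.com/k8s-autoscaler-benchmarker: "true"
    spec:
      taints:
        - key: eks.autify.com/k8s-autoscaler-benchmarker
          effect: NoSchedule
```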
To use the `k8s-autoscaler-benchmarker`, clone this repository and build the tool using Go:
git clone https://github.com/moebaca/k8s-autoscaler-benchmarker.git && cd k8s-autoscaler-benchmarker
go build -o k8s-autoscaler-benchmarker
Execute the tool with the following command, providing any desired options:
./k8s-autoscaler-benchmarker [options]
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| `nodepool` | The Karpenter node pool tag value to monitor. One of `nodepool` or `node-group` must be provided. | string | N/A | Yes* |
| `node-group` | The ASG node group name to monitor. One of `nodepool` or `node-group` must be provided. | string | N/A | Yes* |
| `kubeconfig` | Path to the kubeconfig file to use for CLI requests. | string | (uses default kubeconfig path) | No |
| `aws-profile` | The AWS profile to use for accessing EC2 services. | string | `default` | No |
| `deployment` | The name of the deployment to benchmark. If not supplied, one will be created automatically. This deployment WILL NOT be deleted upon program termination. | string | N/A | No |
| `namespace` | The namespace of the deployment. | string | `default` | No |
| `replicas` | The number of replicas to scale the deployment to. | int | `1` | No |
| `container-name` | The name of the container AND generated deployment if an existing deployment isn't supplied. This deployment WILL be deleted upon program termination. | string | `inflate` | No |
| `container-image` | The image of the container in the generated deployment if an existing deployment isn't supplied. | string | `public.ecr.aws/eks-distro/kubernetes/pause:3.7` | No |
| `cpu-request` | The CPU request for the container in the generated deployment if an existing deployment isn't supplied. | string | `1` | No |
| `toleration-key` | The toleration key for the generated deployment if an existing deployment isn't supplied. | string | `eks.autify.com/k8s-autoscaler-benchmarker` | No |
| `toleration-value` | The toleration value for the generated deployment if an existing deployment isn't supplied. | string | N/A | No |
| `node-selector-key` | The node selector key for the generated deployment if an existing deployment isn't supplied. | string | `eks.autify.com/k8s-autoscaler-benchmarker` | No |
| `node-selector-value` | The node selector value for the generated deployment if an existing deployment isn't supplied. | string | `true` | No |
* Note: Either `nodepool` (for Karpenter) or `node-group` (for Cluster Autoscaler) is required for the tool to function correctly. Both parameters should not be provided at the same time.
Running a benchmark with all default settings:
With Karpenter:
./k8s-autoscaler-benchmarker --nodepool k8s-autoscaler-benchmarker
or with Cluster Autoscaler:
./k8s-autoscaler-benchmarker --node-group k8s-autoscaler-benchmarker
Benchmarking with Karpenter using an existing deployment in a custom namespace with 3 replicas:
./k8s-autoscaler-benchmarker --nodepool k8s-autoscaler-benchmarker --deployment my-deployment --namespace my-namespace --replicas 3
Benchmarking with Karpenter using a custom Docker image with 2 replicas:
./k8s-autoscaler-benchmarker --nodepool k8s-autoscaler-benchmarker --replicas 2 --container-name redis --container-image redis/redis-stack
- If the program prompts you about a timeout during the scaling of the deployment, check for pod errors before exiting with 'no':
- There may be an issue with taints/tolerations or labels not matching between the deployment and the node group/nodepool.
- Cluster Autoscaler may not scale a node group immediately after it is created. Manually setting the min size and desired capacity to 1 and then back to 0 fixes this (only required right after initial creation).
- If the program stalls with only partial pod startup during scaling, the autoscaler may be unable to scale the entire deployment due to node group limits (e.g., the maximum size of the node group has been reached). Use fewer replicas or increase the node group max size to fix this. Always restart the benchmark after making changes to the node group.
- If the program stalls with 0 pods starting up, check that no containers are stuck in `CrashLoopBackOff`.
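A few quick `kubectl` checks can help before answering 'no' to the timeout prompt. This sketch assumes the benchmark deployment runs in the `default` namespace (adjust `-n` to match your `--namespace` value) and is guarded so it is a no-op where `kubectl` is unavailable:

```shell
#!/bin/sh
# Pre-exit diagnostics for a stalled benchmark. Assumes namespace "default";
# change -n to match your --namespace value.
command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found; skipping"; exit 0; }

# Show pods that are not Running or Completed (e.g. CrashLoopBackOff, Pending).
kubectl get pods -n default --no-headers 2>/dev/null \
  | awk '$3 != "Running" && $3 != "Completed"'

# For a stuck pod, inspect its events (replace POD_NAME):
# kubectl describe pod POD_NAME -n default
```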
Contributions to the k8s-autoscaler-benchmarker project are welcome. Please submit pull requests to the main repository.
More test coverage.
This project is licensed under the MIT License - see the LICENSE file for details.