A tool for benchmarking Karpenter and Cluster Autoscaler on EKS.

moebaca/k8s-autoscaler-benchmarker

k8s-autoscaler-benchmarker

The k8s-autoscaler-benchmarker is a tool for administrators and developers who want to optimize the scaling behavior of their EKS clusters. It offers a streamlined process for benchmarking the performance of Karpenter and Cluster Autoscaler for EKS workloads.

By timing EC2 instance initiation, node registration, pod readiness, node deregistration, and instance termination, it lets users quickly test autoscaler settings such as consolidateAfter for Karpenter and scale-down-unneeded-time, node-delete-delay-after-taint, and scale-down-delay-after-add for Cluster Autoscaler.

This tool also supports customization through a variety of parameters, ensuring that users can adapt the benchmarking process to their specific environment while also providing defaults for quick testing.

Currently Supported Features

  • Benchmarking Metrics: The tool currently tracks the following metrics for a given autoscaler (Karpenter or Cluster Autoscaler):
    1. Total time for EC2 instances to initiate their boot process after failed pod scheduling.
    2. Total time for EC2 instances to register to the k8s API after initiating their boot process.
    3. Total time for pod readiness of a deployment after EC2 instances are registered to the k8s API.
    4. Total time for EC2 instances to deregister from the k8s API after scaling a deployment to 0.
    5. Total time for EC2 instances to terminate after scaling a deployment to 0.
  • Customizable Parameters: A wide array of input parameters allows for customization of the k8s deployment used for benchmarking. Supply your own deployment by providing its name and namespace, or use an autogenerated deployment that is customizable via parameters.
  • Clear Results Summary: Benchmark outcomes are concisely summarized to stdout.
  • Flexible Environment Configuration: Supports optional parameters for specifying kubeconfig paths and AWS profiles with default values for ease of use.

Demo

* Note: Instance IDs and IP addresses have been redacted with x's in the demo videos below. The real program output will show your actual resource IDs.

Karpenter Example

The example below uses a user-provided Docker image passed in as a parameter. Because the user does not supply an existing deployment, a new one is created. Once the program terminates, that deployment is deleted.

./k8s-autoscaler-benchmarker --nodepool k8s-autoscaler-benchmarker --replicas 2 --container-name redis --container-image redis/redis-stack
[Demo video: karpenter-demo.mp4]
Benchmarks Summary
--------------------------------------------
Instance Initiation Time:     3.65 seconds
Instance Registration Time:   40.22 seconds
Pod Readiness Time:           31.46 seconds
Instance Deregistration Time: 20.12 seconds
Instance Termination Time:    96.24 seconds
--------------------------------------------

Cluster Autoscaler Example

The example below neither uses an existing deployment nor passes in a custom Docker image. A new deployment is created from the default "inflate" settings. Once the program terminates, that deployment is deleted.

./k8s-autoscaler-benchmarker --node-group k8s-autoscaler-benchmarker-ng --replicas 2
[Demo video: cluster-autoscaler-demo.mp4]
Benchmarks Summary
--------------------------------------------
Instance Initiation Time:     21.63 seconds
Instance Registration Time:   34.21 seconds
Pod Readiness Time:           3.45 seconds
Instance Deregistration Time: 133.70 seconds
Instance Termination Time:    132.05 seconds
--------------------------------------------
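
Since results are printed to stdout, comparing runs across autoscaler settings means reading the summary back in. The sketch below shows one way to do that, assuming the summary keeps the `<label>: <value> seconds` layout shown in the examples above. `ParseSummary` is a hypothetical helper and not part of the tool.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// ParseSummary extracts lines of the form "<label>: <value> seconds"
// into a map from label to seconds. Lines that do not match, such as
// the title and the dashed separators, are skipped.
func ParseSummary(out string) (map[string]float64, error) {
	result := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), ":", 2)
		if len(parts) != 2 {
			continue
		}
		fields := strings.Fields(parts[1])
		if len(fields) != 2 || fields[1] != "seconds" {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", sc.Text(), err)
		}
		result[strings.TrimSpace(parts[0])] = v
	}
	return result, sc.Err()
}

func main() {
	// Sample text copied from the Cluster Autoscaler example above.
	out := `Benchmarks Summary
--------------------------------------------
Instance Initiation Time:     21.63 seconds
Instance Registration Time:   34.21 seconds
Pod Readiness Time:           3.45 seconds
Instance Deregistration Time: 133.70 seconds
Instance Termination Time:    132.05 seconds
--------------------------------------------`

	m, err := ParseSummary(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("parsed %d metrics\n", len(m))
	fmt.Printf("registration: %.2f seconds\n", m["Instance Registration Time"])
}
```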

Prerequisites

  • An active EKS cluster
  • AWS CLI configured with access to the EKS Cluster
  • kubectl configured with access to the EKS Cluster
  • Go 1.16 or later installed on your machine
  • For Karpenter:
    • Install Karpenter in the EKS Cluster
    • Set up a NodePool and EC2NodeClass similar to this NodePool example (Note: The eks.autify.com/k8s-autoscaler-benchmarker label and taint are required with their respective values for the default values to function correctly unless overridden via parameters)
  • For Cluster Autoscaler:
    • Install Cluster Autoscaler in the EKS Cluster
    • Create a Managed Node Group similar to this Managed Node Group example (Note: The eks.autify.com/k8s-autoscaler-benchmarker label and taint are required with their respective values for the default values to function correctly unless overridden via parameters)

Installation

To use the k8s-autoscaler-benchmarker, clone this repository and build the tool using Go:

git clone https://github.com/moebaca/k8s-autoscaler-benchmarker.git && cd k8s-autoscaler-benchmarker
go build -o k8s-autoscaler-benchmarker

Usage

Execute the tool with the following command, providing any desired options:

./k8s-autoscaler-benchmarker [options]

Input Parameters

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| nodepool | The Karpenter node pool tag value to monitor. One of nodepool or node-group must be provided. | string | N/A | Yes* |
| node-group | The ASG node group name to monitor. One of nodepool or node-group must be provided. | string | N/A | Yes* |
| kubeconfig | Path to the kubeconfig file to use for CLI requests. | string | (uses default kubeconfig path) | No |
| aws-profile | The AWS profile to use for accessing EC2 services. | string | default | No |
| deployment | The name of the deployment to benchmark. If not supplied, one will be created automatically. This deployment WILL NOT be deleted upon program termination. | string | N/A | No |
| namespace | The namespace of the deployment. | string | default | No |
| replicas | The number of replicas to scale the deployment to. | int | 1 | No |
| container-name | The name of the container AND generated deployment if an existing deployment isn't supplied. This deployment WILL be deleted upon program termination. | string | inflate | No |
| container-image | The image of the container in the generated deployment if an existing deployment isn't supplied. | string | public.ecr.aws/eks-distro/kubernetes/pause:3.7 | No |
| cpu-request | The CPU request for the container in the generated deployment if an existing deployment isn't supplied. | string | 1 | No |
| toleration-key | The toleration key for the generated deployment if an existing deployment isn't supplied. | string | eks.autify.com/k8s-autoscaler-benchmarker | No |
| toleration-value | The toleration value for the generated deployment if an existing deployment isn't supplied. | string | N/A | No |
| node-selector-key | The node selector key for the generated deployment if an existing deployment isn't supplied. | string | eks.autify.com/k8s-autoscaler-benchmarker | No |
| node-selector-value | The node selector value for the generated deployment if an existing deployment isn't supplied. | string | true | No |

* Note: Exactly one of nodepool (for Karpenter) or node-group (for Cluster Autoscaler) is required for the tool to function correctly. Do not provide both parameters at the same time.
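
The nodepool/node-group rule can be expressed as a small validation step after flag parsing. This is a sketch using Go's standard `flag` package. The function names `validateTarget` and `parseTarget` and the error messages are hypothetical, and the tool's real handling of these flags may differ.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// validateTarget enforces that exactly one of the two targets is set.
func validateTarget(nodepool, nodeGroup string) error {
	switch {
	case nodepool == "" && nodeGroup == "":
		return errors.New("one of --nodepool or --node-group must be provided")
	case nodepool != "" && nodeGroup != "":
		return errors.New("--nodepool and --node-group must not be provided together")
	}
	return nil
}

// parseTarget parses only the two target flags from args and validates them.
func parseTarget(args []string) (nodepool, nodeGroup string, err error) {
	fs := flag.NewFlagSet("k8s-autoscaler-benchmarker", flag.ContinueOnError)
	fs.StringVar(&nodepool, "nodepool", "", "Karpenter node pool tag value to monitor")
	fs.StringVar(&nodeGroup, "node-group", "", "ASG node group name to monitor")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	return nodepool, nodeGroup, validateTarget(nodepool, nodeGroup)
}

func main() {
	// A fixed argument list is used here so the sketch runs without input.
	np, ng, err := parseTarget([]string{"--nodepool", "k8s-autoscaler-benchmarker"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("nodepool=%q node-group=%q\n", np, ng)
}
```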

Examples

Running a benchmark with all default settings:

With Karpenter:

./k8s-autoscaler-benchmarker --nodepool k8s-autoscaler-benchmarker

or with Cluster Autoscaler:

./k8s-autoscaler-benchmarker --node-group k8s-autoscaler-benchmarker

Benchmarking with Karpenter using an existing deployment in a custom namespace with 3 replicas:

./k8s-autoscaler-benchmarker --nodepool k8s-autoscaler-benchmarker --deployment my-deployment --namespace my-namespace --replicas 3

Benchmarking with Karpenter using a custom Docker image with 2 replicas:

./k8s-autoscaler-benchmarker --nodepool k8s-autoscaler-benchmarker --replicas 2 --container-name redis --container-image redis/redis-stack

Troubleshooting

  • If the program prompts you about a timeout during the scaling of the deployment, check for pod errors before exiting with 'no':
    1. There may be an issue with taints/tolerations or labels not matching between the deployment and the node group/nodepool.
    2. Cluster Autoscaler may not scale a node group right after the group is created. I've found that manually setting min size and desired capacity to 1 and then back to 0 fixes this (only required right after initial creation).
  • If the program stalls with only some pods starting during the scaling of the deployment, the autoscaler may be unable to scale the entire deployment due to node group limits (e.g. the maximum size of the node group has been reached). Use fewer replicas or increase the node group max size to fix this. Always restart the benchmark after making changes to the node group.
  • If the program stalls with 0 pods starting up, check that no containers are in CrashLoopBackOff.
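
For the taint/toleration mismatch case in the first bullet, it can help to reason about the matching rule directly. The sketch below is a simplified model of how a Kubernetes toleration matches a taint (key, operator, value, effect). It uses local stand-in types rather than the Kubernetes API packages, and the taint value "true" and effect "NoSchedule" in `main` are assumptions for illustration, not values confirmed by this project.

```go
package main

import "fmt"

// Taint and Toleration are local stand-ins for the Kubernetes types.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// Tolerates reports whether tol matches taint under the simplified rules:
//   - an empty toleration effect matches any taint effect
//   - operator "Exists" ignores the value (an empty key matches every key)
//   - operator "Equal" (the default when empty) requires key and value to match
func Tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return tol.Key == "" || tol.Key == taint.Key
	case "", "Equal":
		return tol.Key == taint.Key && tol.Value == taint.Value
	}
	return false
}

func main() {
	// Hypothetical taint on the benchmark nodes.
	taint := Taint{
		Key:    "eks.autify.com/k8s-autoscaler-benchmarker",
		Value:  "true",
		Effect: "NoSchedule",
	}

	byKey := Toleration{Key: taint.Key, Operator: "Exists"}
	wrongValue := Toleration{Key: taint.Key, Operator: "Equal", Value: "false"}

	fmt.Println("Exists on key tolerates:", Tolerates(byKey, taint))
	fmt.Println("Equal with wrong value tolerates:", Tolerates(wrongValue, taint))
}
```

If the toleration on the deployment does not match the taint on the node group or nodepool under these rules, the pods stay pending even after new nodes join, which would surface as the timeout described above.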

Contributing

Contributions to the k8s-autoscaler-benchmarker project are welcome. Please submit pull requests to the main repository.

TODO

More test coverage.

License

This project is licensed under the MIT License - see the LICENSE file for details.
