fix: errors when deleting and provisioning load test clusters (#593)
Updated the load test configuration to:
- Use c3d-standard-4 instead of n1-standard-2 node types
- Enable dataplane-v2 for better network performance
- Leverage a static CPU manager policy when scheduling pods on nodes,
allowing us to host multiple Locust processes on a single node
- Increase the node connection tracking table capacity to prevent
connection throttling
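
The sizing implied by the changes above can be sketched in a few lines of Python. This is only an illustration using values that appear in this commit (150 workers, 1000 users per worker, at most 75 nodes in the `locust-workers` pool); the helper name `plan_capacity` is hypothetical and not part of the repository, and the workers-per-node figure is an inference from those numbers rather than a documented setting.

```python
import math


def plan_capacity(worker_count: int, users_per_worker: int, max_nodes: int) -> dict:
    """Summarize load test sizing (hypothetical helper, not in the repo)."""
    if min(worker_count, users_per_worker, max_nodes) <= 0:
        raise ValueError("all inputs must be positive")
    return {
        # Mirrors MAX_USERS = WORKER_COUNT * USERS_PER_WORKER in load.py
        "max_users": worker_count * users_per_worker,
        # Workers each node must host when the pool is at its maximum size
        "workers_per_node": math.ceil(worker_count / max_nodes),
    }


if __name__ == "__main__":
    print(plan_capacity(worker_count=150, users_per_worker=1000, max_nodes=75))
```

With the values from this commit, each 4-vCPU node would need to host two single-core workers once the pool reaches its maximum size.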

Issue: SYNC-4082

---------

Co-authored-by: Eric Maydeck <[email protected]>
Co-authored-by: Dustin Lactin <[email protected]>
3 people authored Feb 5, 2024
1 parent dc07762 commit 925130a
Showing 6 changed files with 73 additions and 12 deletions.
2 changes: 1 addition & 1 deletion tests/load/README.md
@@ -234,7 +234,7 @@ Execute the `setup_k8s.sh` file from the root directory and select the **delete*
[13]: https://docs.locust.io/en/stable/writing-a-locustfile.html#wait-time
[14]: https://docs.locust.io/en/stable/writing-a-locustfile.html#task-decorator
[15]: https://console.cloud.google.com/home/dashboard?q=search&referrer=search&project=spheric-keel-331521&cloudshell=false
[16]: https://console.cloud.google.com/compute/instances?project=spheric-keel-331521
[16]: https://console.cloud.google.com/gcr/images/spheric-keel-331521/global/locust-autopush?project=spheric-keel-331521
[17]: https://earthangel-b40313e5.influxcloud.net/d/do4mmwcVz/autopush-gcp?orgId=1&refresh=1m


8 changes: 6 additions & 2 deletions tests/load/kubernetes-config/locust-worker-controller.yml
@@ -14,6 +14,8 @@ spec:
labels:
app: locust-worker
spec:
nodeSelector:
node-pool: locust-workers
containers:
- name: locust-worker
image: gcr.io/[PROJECT_ID]/locust-autopush:[LOCUST_IMAGE_TAG]
@@ -29,9 +31,11 @@ spec:
- name: LOCUST_LOGFILE
value:
resources:
# Force requests and limits to match to ensure pods run in the Guaranteed QoS class
# Using 1 core per worker based on recommendations from https://docs.locust.io/en/stable/running-distributed.html
limits:
cpu: 2
cpu: 1
memory: 3Gi
requests:
cpu: 1
memory: 2Gi
memory: 3Gi
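
The resource comments in this file rely on two Kubernetes rules: a pod lands in the Guaranteed QoS class only when each container's CPU and memory requests equal its limits, and the static CPU manager policy grants exclusive cores only to Guaranteed containers that request a whole number of CPUs. A minimal Python sketch of that check for a single container follows. The function names and dict layout are illustrative assumptions, multi-container pods are ignored, and CPU values are assumed to be plain core counts rather than millicore strings.

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Classify a single-container pod (simplified model of the Kubernetes rules)."""
    if not requests and not limits:
        return "BestEffort"
    # A missing request defaults to the limit, so only the limit must be present.
    guaranteed = all(
        res in limits and requests.get(res, limits[res]) == limits[res]
        for res in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


def gets_exclusive_cpus(requests: dict, limits: dict) -> bool:
    """True when the static policy would pin the container to dedicated cores."""
    if qos_class(requests, limits) != "Guaranteed":
        return False
    cpu = float(limits["cpu"])  # assumption: CPU expressed as a core count
    return cpu >= 1 and cpu.is_integer()


if __name__ == "__main__":
    old = ({"cpu": 1, "memory": "2Gi"}, {"cpu": 2, "memory": "3Gi"})
    new = ({"cpu": 1, "memory": "3Gi"}, {"cpu": 1, "memory": "3Gi"})
    print(qos_class(*old), gets_exclusive_cpus(*old))
    print(qos_class(*new), gets_exclusive_cpus(*new))
```

Under this model the previous worker settings are Burstable, while the updated settings are Guaranteed with an integer CPU count.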
49 changes: 49 additions & 0 deletions tests/load/kubernetes-config/locust-worker-daemonset.yml
@@ -0,0 +1,49 @@
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: "locust-worker-sysctl"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "locust-worker"
template:
metadata:
labels:
app.kubernetes.io/name: "locust-worker"
spec:
nodeSelector:
cloud.google.com/gke-nodepool: "locust-workers"
hostNetwork: true
initContainers:
- name: sysctl
image: alpine:3
command:
- /bin/sh
- -c
- |
sysctl net.netfilter.nf_conntrack_max=1048576
securityContext:
privileged: true
resources:
requests:
cpu: 10m
memory: 10Mi
limits:
cpu: 10m
memory: 10Mi
containers:
- name: sleep
image: alpine:3
command:
- /bin/sh
- -c
- |
while true; do sleep 60s; done
resources:
requests:
cpu: 10m
memory: 10Mi
limits:
cpu: 10m
memory: 10Mi
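
To confirm that the init container's `sysctl` call took effect on a node, the limit can be read back from procfs. The sketch below is a hypothetical Python helper, not part of this commit. It assumes the standard Linux path `/proc/sys/net/netfilter/nf_conntrack_max`, which exists only when the conntrack kernel module is loaded.

```python
from pathlib import Path

# Assumed procfs location of the connection tracking table limit
CONNTRACK_MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")
EXPECTED_CONNTRACK_MAX = 1048576  # value set by the DaemonSet init container


def conntrack_max_ok(
    path: Path = CONNTRACK_MAX_PATH,
    expected: int = EXPECTED_CONNTRACK_MAX,
) -> bool:
    """Return True if the conntrack limit at `path` is at least `expected`."""
    try:
        current = int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing (module not loaded, non-Linux host) or unparsable
        return False
    return current >= expected
```

The path and threshold are parameters so the check can be pointed at a fixture file when run off-node.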
2 changes: 2 additions & 0 deletions tests/load/kubernetes-config/worker-kubelet-config.yml
@@ -0,0 +1,2 @@
kubeletConfig:
cpuManagerPolicy: static
8 changes: 4 additions & 4 deletions tests/load/locustfiles/load.py
@@ -36,16 +36,16 @@ def calculate_users(self, run_time: int) -> int:


class AutopushLoadTestShape(LoadTestShape):
"""A load test shape class for Autopush (Duration: 10 minutes, Users: 83300).
"""A load test shape class for Autopush (Duration: 10 minutes, Users: 15000).
Note: The Shape class assumes that the workers can support the generated spawn rates. Should
the number of available Locust workers change or should the Locust worker capacity change,
the MAX_USERS should also be changed.
the WORKER_COUNT and USERS_PER_WORKER values must be changed respectively.
"""

MAX_RUN_TIME: int = 600 # 10 minutes
WORKER_COUNT: int = 300 # Must match value defined in setup_k8s.sh
USERS_PER_WORKER: int = 500 # Number of users supported on a worker running on a n1-standard-2
WORKER_COUNT: int = 150 # Must match value defined in setup_k8s.sh
USERS_PER_WORKER: int = 1000 # Number of users supported on a c3d-standard-4 hosted worker
MAX_USERS: int = WORKER_COUNT * USERS_PER_WORKER
trend: QuadraticTrend
user_classes: list[Type[User]] = [AutopushUser]
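
For illustration, a quadratic trend of the kind referenced by `trend: QuadraticTrend` can be modeled as a parabola that starts at zero users, peaks midway through the run, and returns to zero at `MAX_RUN_TIME`. The sketch below is an assumption about the curve's shape, not the repository's implementation: `ParabolicTrend` is a hypothetical name, and placing the peak at `MAX_USERS` is likewise assumed.

```python
class ParabolicTrend:
    """Hypothetical user-count curve: 0 at t=0, peak at T/2, 0 at t=T."""

    def __init__(self, max_run_time: int, max_users: int) -> None:
        if max_run_time <= 0 or max_users < 0:
            raise ValueError("max_run_time must be > 0 and max_users >= 0")
        self.max_run_time = max_run_time
        self.max_users = max_users

    def calculate_users(self, run_time: int) -> int:
        """Return the target user count at `run_time` seconds into the test."""
        if run_time < 0 or run_time > self.max_run_time:
            return 0
        total = self.max_run_time
        # u(t) = 4 * U * t * (T - t) / T^2, which equals U at t = T / 2
        return round(4 * self.max_users * run_time * (total - run_time) / total**2)


if __name__ == "__main__":
    trend = ParabolicTrend(max_run_time=600, max_users=150 * 1000)
    for second in (0, 150, 300, 450, 600):
        print(second, trend.calculate_users(second))
```

A shape class built on such a curve would call `calculate_users` on each tick and derive the spawn rate from the difference between the target and the current user count.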
16 changes: 11 additions & 5 deletions tests/load/setup_k8s.sh
@@ -10,8 +10,8 @@ CLUSTER='autopush-locust-load-test'
TARGET='https://updates-autopush.stage.mozaws.net'
SCOPE='https://www.googleapis.com/auth/cloud-platform'
REGION='us-central1'
WORKER_COUNT=300
MACHINE_TYPE='n1-standard-2' # 2 CPUs + 7.50GB Memory
WORKER_COUNT=150
MACHINE_TYPE='c3d-standard-4' # 4 CPUs + 16GB Memory
BOLD=$(tput bold)
NORM=$(tput sgr0)
DIRECTORY=$(pwd)
@@ -20,6 +20,8 @@ AUTOPUSH_DIRECTORY=$DIRECTORY/tests/load/kubernetes-config
MASTER_FILE=locust-master-controller.yml
WORKER_FILE=locust-worker-controller.yml
SERVICE_FILE=locust-master-service.yml
DAEMONSET_FILE=locust-worker-daemonset.yml
WORKER_KUBELET_CONFIG_FILE=worker-kubelet-config.yml

LOCUST_IMAGE_TAG=$(git log -1 --pretty=format:%h)
echo "Image tag for locust is set to: ${LOCUST_IMAGE_TAG}"
@@ -84,6 +86,7 @@ SetupGksCluster()
$KUBECTL apply -f $AUTOPUSH_DIRECTORY/$MASTER_FILE
$KUBECTL apply -f $AUTOPUSH_DIRECTORY/$SERVICE_FILE
$KUBECTL apply -f $AUTOPUSH_DIRECTORY/$WORKER_FILE
$KUBECTL apply -f $AUTOPUSH_DIRECTORY/$DAEMONSET_FILE

echo -e "==================== Verify the Locust deployments & Services"
$KUBECTL get pods -o wide
@@ -98,13 +101,16 @@ do
case $response in
create) #Setup Kubernetes Cluster
echo -e "==================== Creating the GKE cluster "
# The total-max-nodes = WORKER_COUNT + 1 (MASTER)
$GCLOUD container clusters create $CLUSTER --region $REGION --scopes $SCOPE --enable-autoscaling --total-min-nodes "1" --total-max-nodes "301" --scopes=logging-write,storage-ro --addons HorizontalPodAutoscaling,HttpLoadBalancing --machine-type $MACHINE_TYPE
$GCLOUD container clusters create $CLUSTER --region $REGION --scopes $SCOPE --enable-autoscaling --scopes=logging-write,storage-ro --machine-type=$MACHINE_TYPE --addons HorizontalPodAutoscaling,HttpLoadBalancing --enable-dataplane-v2
# Create the 'locust-workers' node pool to enforce the static CPU manager policy (grants Guaranteed pods with integer CPU requests access to exclusive CPUs)
# https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy
$GCLOUD container node-pools create locust-workers --cluster=$CLUSTER --region $REGION --node-labels=node-pool=locust-workers --enable-autoscaling --total-min-nodes=1 --total-max-nodes=75 --scopes=$SCOPE,logging-write,storage-ro --machine-type=$MACHINE_TYPE --system-config-from-file=$AUTOPUSH_DIRECTORY/$WORKER_KUBELET_CONFIG_FILE
SetupGksCluster
break
;;
delete)
echo -e "==================== Delete the GKE cluster "
# This should delete the 'locust-workers' node pool
$GCLOUD container clusters delete $CLUSTER --region $REGION
break
;;
@@ -118,4 +124,4 @@ do
break
;;
esac
done
done
