Create benchmarks/receive for load and auto scale testing of receive #34
Conversation
Signed-off-by: Matthias Loibl <[email protected]>
Amazing work!
sleep 300
# 25m
echo "delete thanos-receive-default-6"
kubectl delete pod -n thanos thanos-receive-default-6
This looks great. We are doing chaos engineering here!
Looking great 🚀
I guess it needs a cleanup and we can merge it.
ctx, cancel := context.WithCancel(context.Background())
ticker := time.NewTicker(time.Second / time.Duration(tenant.Reqs))
gr.Add(func() error {
nit
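For context, the excerpt above paces requests with one ticker tick per request, so each tenant sends Reqs requests per second. Here is a minimal, runnable sketch of that pattern, assuming a tenant type with a Reqs field and the oklog/run group the benchmark appears to use; the tenantConfig type and the printed "request" are illustrative stand-ins, not the benchmark's actual code:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/oklog/run"
)

// tenantConfig is an illustrative stand-in for the benchmark's tenant type.
// Reqs is the target number of requests per second and must be > 0.
type tenantConfig struct {
	Name string
	Reqs int
}

func main() {
	tenant := tenantConfig{Name: "tenant-0", Reqs: 10}

	var gr run.Group
	ctx, cancel := context.WithCancel(context.Background())

	// One tick every 1/Reqs seconds spaces requests evenly across each second.
	ticker := time.NewTicker(time.Second / time.Duration(tenant.Reqs))

	gr.Add(func() error {
		for {
			select {
			case <-ctx.Done():
				return nil
			case <-ticker.C:
				// The real benchmark would send a remote-write request here.
				fmt.Println("send request for", tenant.Name)
			}
		}
	}, func(error) {
		ticker.Stop()
		cancel()
	})

	// Second actor: stop the whole group after a short demo period.
	gr.Add(func() error {
		select {
		case <-time.After(2 * time.Second):
		case <-ctx.Done():
		}
		return nil
	}, func(error) {
		cancel()
	})

	if err := gr.Run(); err != nil {
		fmt.Println("run group:", err)
	}
}
```

When any actor in the run.Group returns, every interrupt function is called, which stops the ticker and cancels the context, so all loops shut down cleanly.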
configMap:
  name: thanos-receive-benchmark-config
---
#apiVersion: apps/v1
Shall we delete this section since it's not used?
- --query.replica-label=prometheus_replica
- --query.replica-label=rule_replica
- --store=dnssrv+_grpc._tcp.thanos-receive-default.thanos.svc.cluster.local
image: quay.io/thanos/thanos:v0.17.2
Shouldn't we use the latest stable or even the latest RC?
valueFrom:
  fieldRef:
    fieldPath: metadata.namespace
image: quay.io/observatorium/thanos-receive-controller:master-2020-06-17-a9d9169
Does this include the latest improvements of yours?
No, see description. 😉
The current manifests use the image from Yash's PR, and the thanos-receive-controller with its latest changes was run out-of-cluster. We should probably clean this up once those merge. :)
@@ -0,0 +1,139 @@
#apiVersion: apps/v1
I guess we need a cleanup.
@@ -0,0 +1,122 @@
#apiVersion: apps/v1
ditto
Yep, overall we can probably just wait and in the meantime try to get thanos-io/thanos#3845 merged :)
Co-authored-by: Yash Sharma <[email protected]>
So I ran this locally, and with some small changes it still works. Would there be interest in opening a new PR to finish this and merge it?
Yes, please go for it. I'm afraid I don't have time these days.
This benchmark was started with the idea of building something similar to Avalanche, but going a step further in that it would scale the load and do chaos engineering during the run. It turned out to be more effective to simply scale up a Deployment of benchmarks, and most likely the custom-written benchmark is now easily replaceable with Avalanche... 😁
Most important is the run.sh script, which increases load, then deletes two Pods while at full load, and in the end scales everything back down (sketched below). The goal was to stay above 99% availability for both error rate and latency. The current manifests use the image from Yash's PR, and the thanos-receive-controller with its latest changes was run out-of-cluster. We should probably clean this up once those merge. :)
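For reference, run.sh follows roughly this shape. This is only a sketch of the pattern described above: the replica counts, sleep durations, and the thanos-receive-benchmark and thanos-receive-default-3 names are illustrative, not taken from the actual script (only thanos-receive-default-6 appears in the quoted excerpt):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Ramp up load by scaling the benchmark Deployment step by step.
for replicas in 1 2 4 8; do
  kubectl scale deployment -n thanos thanos-receive-benchmark --replicas="${replicas}"
  sleep 300
done

# At full load, inject failures by deleting two receive Pods.
kubectl delete pod -n thanos thanos-receive-default-6
sleep 300
kubectl delete pod -n thanos thanos-receive-default-3
sleep 300

# Finally, scale everything back down.
kubectl scale deployment -n thanos thanos-receive-benchmark --replicas=0
```

The availability target would then be judged from the error-rate and latency metrics recorded while the script runs.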
cc @kakkoyun @yashrsharma44