-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtests: introduce admission-control/elastic-backup #89482
roachtests: introduce admission-control/elastic-backup #89482
Conversation
Informs cockroachdb#89208. This test sets up a 3-node CRDB cluster on 8vCPU machines running 1000-warehouse TPC-C with an aggressive (every 20m) full backup schedule. We've observed latency spikes during backups because of its CPU-heavy nature -- it can elevate CPU scheduling latencies which in turn translates to an increase in foreground latency. In cockroachdb#86638 we introduced admission control mechanisms to dynamically pace such work while maintaining acceptable CPU scheduling latencies (sub millisecond p99s). This roachtest exercises that machinery. In future commits we'll add libraries to the roachtest package to automatically spit out the degree to which {CPU-scheduler,foreground} latencies are protected. Release note: None
Here's a prometheus dump of an experiment run. To see it, run the contained |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @renatolabs)
pkg/cmd/roachtest/tests/admission_control_snapshot_overload.go
line 92 at r1 (raw file):
t.Status(fmt.Sprintf("setting up prometheus/grafana (<%s)", 2*time.Minute)) {
Nit: Remove this change from this PR, or look at my other PR which pulls this into admission_control.go.
pkg/cmd/roachtest/tests/util.go
line 121 at r1 (raw file):
"admission.sql_kv_response.enabled", "admission.sql_sql_response.enabled", "admission.elastic_cpu.enabled",
Its not clear that we want to enable this in all tests where admission control is enabled. Calling either true
or false
for this method does not result in the same as not setting anything. I'm concerned that users of this method won't do the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the speedy review!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @renatolabs)
pkg/cmd/roachtest/tests/admission_control_snapshot_overload.go
line 92 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Nit: Remove this change from this PR, or look at my other PR which pulls this into admission_control.go.
If you notice I'm using a different grafana dashboard for the backup roachtest vs. the snapshot one. Quickly skimming through #89324, I think instead of a shared constructor for prometheus.Config (which itself is a constructor param for setting up prometheus), perhaps we ought to add some smarts to setupPrometheusForRoachtest or create a "default prometheus.Config" that assumes: - last-node-houses-prom/workload-generator
- every-other-node-runs-crdb, and
- every-other-node-should-run-node-exporter
And then tests can use that + annotate it with their custom dashboards. I'll keep this patch as is and leave the refactoring to a future commit/#89324.
pkg/cmd/roachtest/tests/util.go
line 121 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Its not clear that we want to enable this in all tests where admission control is enabled. Calling either
true
orfalse
for this method does not result in the same as not setting anything. I'm concerned that users of this method won't do the right thing.
The only uses of this are tests we already own or shall soon enough 😛
The current uses look ok to me.
Build succeeded: |
Fixes cockroachdb#89600. This test makes use of `workload querybench`, which is only available in the `workload` binary. This test started failing after \cockroachdb#89482 where we (attempted) to replace all uses of `./workload` with `./cockroach workload`. Release note: None
89813: kvserver: remove legacy snapshot and diff code r=erikgrinaker a=pavelkalinnikov This commit removes the proto and code for `RaftDataSnapshot`. This type was used for reporting replica snapshots and computing diffs, but now this functionality has been removed in favor of storage checkpoints and offline tooling. Part of #21128 Epic: none Release note: None 90006: roachtest: Introduce a test to overwhelm nodes r=irfansharif a=andrewbaptist This test is here to check behavior of the system as the SQL load greatly exceeds what the nodes are able to handle. In the future we want to evaulate how this is handled, but today this will cause nodes to OOM. Informs #89142. Release note: None 90063: roachtest: de-flake admission-control/tpcc-olap r=irfansharif a=irfansharif Fixes #89600. This test makes use of `workload querybench`, which is only available in the `workload` binary. This test started failing after \#89482 where we (attempted) to replace all uses of `./workload` with `./cockroach workload`. Release note: None Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: Andrew Baptist <[email protected]> Co-authored-by: irfan sharif <[email protected]>
Informs #89208. This test sets up a 3-node CRDB cluster on 8vCPU machines running 1000-warehouse TPC-C with an aggressive (every 20m) full backup schedule. We've observed latency spikes during backups because of its CPU-heavy nature -- it can elevate CPU scheduling latencies which in turn translates to an increase in foreground latency. In #86638 we introduced admission control mechanisms to dynamically pace such work while maintaining acceptable CPU scheduling latencies (sub millisecond p99s). This roachtest exercises that machinery. In future commits we'll add libraries to the roachtest package to automatically spit out the degree to which {CPU-scheduler,foreground} latencies are protected.
Release note: None