-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multitenant: re-enable admission control fairness tests #89721
Conversation
0102d86
to
5f14dd4
Compare
Spoke to Tommy, going to take over this PR. |
6e45bbe
to
97aaa39
Compare
Previously these tests were disabled for being flakey. Re-enable them and increase tenant resource limits to prevent throughput collapse, not sure why this wasn't an issue originally. Also disable running the tests w/o admission control as that mode is flakey and no longer of interest. Also includes some commented out code to attempt to use prometheus graphana, I couldn't get it to work but its probably close. Fixes: cockroachdb#82033, cockroachdb#83994 Release note: None
6226993
to
530ef25
Compare
+cc @andrewbaptist, @sumeerbhola this is now ready for a look. https://www.loom.com/share/3fcbc44fd3f04650b5e5b702b6d9d21d is what the prometheus dashboard looks like. Probably a lot of the roachtest primitives for multi-tenant tests can be improved (it's a bit odd that the workload binary and the tenant SQL pods are physically colocated), but leaving things as is for now. It's easiest probably to read the file from scratch instead of the diff. |
04d4e6d
to
184c154
Compare
Zipped artifacts for all variations: artifacts.tar.gz. As usual, navigate to each folder's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 10 of 10 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @renatolabs, and @smg260)
pkg/cmd/roachtest/tests/admission_control_multitenant_fairness.go
line 40 at r2 (raw file):
// // [1]: Co-locating the SQL pod and the workload generator is a bit funky, but //
nit: remove empty line
pkg/cmd/roachtest/tests/multitenant_fairness.go
line 55 at r1 (raw file):
{ name: "concurrency-skew", concurrency: func(i int) int { return i * 250 },
btw, do we see the same throughput across the different tenants for the concurrency-skew tests (both for the CPU and store overload setups)? IIRC, we had inconclusive results earlier and it was not clear to me why -- sometimes it seemed that the cause was that the minimum concurrency was insufficient to use 1/4th of the bottleneck resource.
184c154
to
af396e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs, @smg260, and @sumeerbhola)
pkg/cmd/roachtest/tests/admission_control_multitenant_fairness.go
line 40 at r2 (raw file):
Previously, sumeerbhola wrote…
nit: remove empty line
Done.
pkg/cmd/roachtest/tests/multitenant_fairness.go
line 55 at r1 (raw file):
Previously, sumeerbhola wrote…
btw, do we see the same throughput across the different tenants for the concurrency-skew tests (both for the CPU and store overload setups)? IIRC, we had inconclusive results earlier and it was not clear to me why -- sometimes it seemed that the cause was that the minimum concurrency was insufficient to use 1/4th of the bottleneck resource.
We see the same throughput across the different tenants in read-heavy/skewed
:
The second one, write-heavy/skewed
is a bit more interesting. We see something like this. The periods where there is a skew in throughput are where we're not hitting the admission IO threshold (more apparent at the start), but as soon as we do, we do see that throughput is made identical. The oscillations between both regimes is interesting, and doesn't look desirable.
Also integrate prometheus, --skip-init. Release note: None
af396e7
to
51f27e6
Compare
Canceled. |
bors r+ |
Build succeeded: |
Previously these tests were disabled for being flakey. Re-enable them
and increase tenant resource limits to prevent throughput collapse,
not sure why this wasn't an issue originally. Also disable running the
tests w/o admission control as that mode is flakey and no longer of
interest.
Also includes some commented out code to attempt to use prometheus
graphana, I couldn't get it to work but its probably close.
Fixes: #82033, #83994
Release note: None