Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multitenant: re-enable admission control fairness tests #89721

Merged
merged 2 commits into from
Dec 6, 2022

Conversation

cucaroach
Copy link
Contributor

@cucaroach cucaroach commented Oct 11, 2022

Previously these tests were disabled for being flakey. Re-enable them
and increase tenant resource limits to prevent throughput collapse,
not sure why this wasn't an issue originally. Also disable running the
tests w/o admission control as that mode is flakey and no longer of
interest.

Also includes some commented out code to attempt to use prometheus
graphana, I couldn't get it to work but its probably close.

Fixes: #82033, #83994

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cucaroach cucaroach force-pushed the mt-fairness-2 branch 2 times, most recently from 0102d86 to 5f14dd4 Compare October 11, 2022 11:53
@cucaroach cucaroach marked this pull request as ready for review October 11, 2022 16:45
@irfansharif irfansharif requested a review from a team October 11, 2022 16:46
@irfansharif irfansharif self-assigned this Oct 17, 2022
@irfansharif
Copy link
Contributor

Spoke to Tommy, going to take over this PR.

Previously these tests were disabled for being flakey. Re-enable them
and increase tenant resource limits to prevent throughput collapse,
not sure why this wasn't an issue originally. Also disable running the
tests w/o admission control as that mode is flakey and no longer of
interest.

Also includes some commented out code to attempt to use prometheus
graphana, I couldn't get it to work but its probably close.

Fixes: cockroachdb#82033, cockroachdb#83994

Release note: None
@irfansharif irfansharif force-pushed the mt-fairness-2 branch 3 times, most recently from 6226993 to 530ef25 Compare December 3, 2022 05:13
@irfansharif irfansharif marked this pull request as ready for review December 3, 2022 05:13
@irfansharif irfansharif requested a review from a team as a code owner December 3, 2022 05:13
@irfansharif irfansharif requested review from smg260 and renatolabs and removed request for a team December 3, 2022 05:13
@irfansharif
Copy link
Contributor

irfansharif commented Dec 3, 2022

+cc @andrewbaptist, @sumeerbhola this is now ready for a look. https://www.loom.com/share/3fcbc44fd3f04650b5e5b702b6d9d21d is what the prometheus dashboard looks like. Probably a lot of the roachtest primitives for multi-tenant tests can be improved (it's a bit odd that the workload binary and the tenant SQL pods are physically colocated), but leaving things as is for now. It's easiest probably to read the file from scratch instead of the diff.

@irfansharif irfansharif force-pushed the mt-fairness-2 branch 4 times, most recently from 04d4e6d to 184c154 Compare December 5, 2022 15:10
@irfansharif
Copy link
Contributor

=== RUN   admission-control/multitenant-fairness/read-heavy/even
=== RUN   admission-control/multitenant-fairness/write-heavy/skewed
=== RUN   admission-control/multitenant-fairness/read-heavy/skewed
=== RUN   admission-control/multitenant-fairness/write-heavy/even
...
--- PASS: admission-control/multitenant-fairness/write-heavy/skewed (1490.67s)
no work remaining; runWorker is bailing out...
--- PASS: admission-control/multitenant-fairness/write-heavy/even (1482.65s)
no work remaining; runWorker is bailing out...
--- PASS: admission-control/multitenant-fairness/read-heavy/even (1941.08s)
no work remaining; runWorker is bailing out...
--- PASS: admission-control/multitenant-fairness/read-heavy/skewed (1956.95s)
no work remaining; runWorker is bailing out...
PASS

Zipped artifacts for all variations: artifacts.tar.gz. As usual, navigate to each folder's prometheus-docker-run.sh and load up your locally running grafana with the dashboard over at http://go.crdb.dev/p/multi-tenant-fairness-grafana. Looks something like this for read-heavy/skewed:

image

image

image

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, 10 of 10 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @renatolabs, and @smg260)


pkg/cmd/roachtest/tests/admission_control_multitenant_fairness.go line 40 at r2 (raw file):

//
// [1]: Co-locating the SQL pod and the workload generator is a bit funky, but
//

nit: remove empty line


pkg/cmd/roachtest/tests/multitenant_fairness.go line 55 at r1 (raw file):

		{
			name:        "concurrency-skew",
			concurrency: func(i int) int { return i * 250 },

btw, do we see the same throughput across the different tenants for the concurrency-skew tests (both for the CPU and store overload setups)? IIRC, we had inconclusive results earlier and it was not clear to me why -- sometimes it seemed that the cause was that the minimum concurrency was insufficient to use 1/4th of the bottleneck resource.

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs, @smg260, and @sumeerbhola)


pkg/cmd/roachtest/tests/admission_control_multitenant_fairness.go line 40 at r2 (raw file):

Previously, sumeerbhola wrote…

nit: remove empty line

Done.


pkg/cmd/roachtest/tests/multitenant_fairness.go line 55 at r1 (raw file):

Previously, sumeerbhola wrote…

btw, do we see the same throughput across the different tenants for the concurrency-skew tests (both for the CPU and store overload setups)? IIRC, we had inconclusive results earlier and it was not clear to me why -- sometimes it seemed that the cause was that the minimum concurrency was insufficient to use 1/4th of the bottleneck resource.

We see the same throughput across the different tenants in read-heavy/skewed:

image.png

The second one, write-heavy/skewed is a bit more interesting. We see something like this. The periods where there is a skew in throughput are where we're not hitting the admission IO threshold (more apparent at the start), but as soon as we do, we do see that throughput is made identical. The oscillations between both regimes is interesting, and doesn't look desirable.

image copy 2.png

Also integrate prometheus, --skip-init.

Release note: None
@craig
Copy link
Contributor

craig bot commented Dec 5, 2022

Canceled.

@irfansharif
Copy link
Contributor

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 6, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: re-enable admission control fairness tests
4 participants