kvserver: Use response data in the load-based splitter #89217

Merged 1 commit on Oct 28, 2022

Conversation

KaiSun314
Contributor

@KaiSun314 KaiSun314 commented Oct 3, 2022

Fixes #87279

We investigated why running YCSB Workload E results in a single hot
range. We observed that range queries of the form
SELECT * FROM table WHERE pkey >= A LIMIT B produce request spans that
all share the same end key, similar to [A, range_end), rather than end
keys that take the specified LIMIT into account. Since the majority of
request spans have the same end key, the load splitter algorithm cannot
find a split key that has few contained requests and a good balance
between left and right requests. The proposed solution is to use the
response span rather than the request span, since the response span
more accurately reflects the keys that the request actually iterated
over. We use the request span together with the response's resume span
to derive that key span. Experimentally, using response data (the
resume span) rather than just the request span allows the load-based
splitter to find a split key under range query workloads (YCSB Workload
E, KV workload with spans).
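
To illustrate the problem described above, here is a minimal sketch of how a candidate split key might be scored against recorded spans. It is not the splitter's actual code: the `span` and `counts` types and the `classify` helper are hypothetical names introduced for illustration, and spans are assumed to be half-open intervals [key, endKey).

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a half-open key interval [key, endKey). Hypothetical type.
type span struct{ key, endKey []byte }

// counts tallies how recorded spans relate to one candidate split key.
type counts struct{ left, right, contained int }

// sp builds a span from string keys, for readability in the example.
func sp(k, e string) span { return span{[]byte(k), []byte(e)} }

// classify is a hypothetical helper: it counts spans entirely to the left
// of the candidate, entirely to its right, or containing it.
func classify(candidate []byte, spans []span) counts {
	var c counts
	for _, s := range spans {
		switch {
		case bytes.Compare(s.endKey, candidate) <= 0:
			c.left++ // endKey is exclusive, so the span ends before candidate
		case bytes.Compare(s.key, candidate) >= 0:
			c.right++ // span starts at or after candidate
		default:
			c.contained++ // candidate falls strictly inside the span
		}
	}
	return c
}

func main() {
	candidate := []byte("m")
	// Request spans: every scan shares the range's end key.
	requests := []span{sp("b", "z"), sp("f", "z"), sp("n", "z"), sp("s", "z")}
	// Spans actually iterated once LIMIT is taken into account.
	iterated := []span{sp("b", "d"), sp("f", "h"), sp("n", "p"), sp("s", "u")}
	fmt.Printf("request spans:  %+v\n", classify(candidate, requests))
	fmt.Printf("iterated spans: %+v\n", classify(candidate, iterated))
}
```

With the shared end key, no span can ever fall to the left of a candidate below that end key, so every candidate is either unbalanced or contained by many requests. With the narrower spans, the same candidate separates the load evenly.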

Ops/sec for the YCSB-E workload with and without this change, for various numbers of nodes (3 / 5) and CPUs (8 / 32): https://docs.google.com/spreadsheets/d/1OcvRUkXORiGpr-f7cMAiuv9DW7qQZgconfqE4UbfQ2c/edit?usp=sharing

Release note (ops change): The load-based splitter now uses response
data rather than just the request span, which gives it more accurate
information about the keys each request iterated over. This enables
the load splitter to find a suitable split key under heavy range query
workloads.

@cockroach-teamcity
Member

This change is Reviewable

@KaiSun314 KaiSun314 force-pushed the enable_ycsb_split branch 3 times, most recently from 9966d11 to 87c2d67 Compare October 4, 2022 19:51
@KaiSun314 KaiSun314 changed the title from "Enable ycsb split" to "kvserver: Use response data rather than just the request span in the load-based splitter" Oct 4, 2022
@KaiSun314 KaiSun314 changed the title from "kvserver: Use response data rather than just the request span in the load-based splitter" to "kvserver: Use response data in the load-based splitter" Oct 4, 2022
@KaiSun314 KaiSun314 requested a review from kvoli October 5, 2022 13:52
@KaiSun314 KaiSun314 marked this pull request as ready for review October 5, 2022 14:28
@KaiSun314 KaiSun314 requested a review from a team as a code owner October 5, 2022 14:28
Collaborator

@kvoli kvoli left a comment

Nice patch. Can you post the ycsb-e results for the unsplittable metrics and workload ops/s when you get a chance?

Going to tag @aayushshah15 to also take a look.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @KaiSun314)


pkg/kv/kvserver/replica_send.go line 405 at r1 (raw file):

					// Request:    [key...............endKey]
					// ResumeSpan:          [key......endKey]
					// True span:  [key......key]

nit: for clarity, use [key, endKey) rather than []. This avoids any confusion about the endKey being inclusive when it is exclusive.

Code quote:

					// Request:    [key...............endKey]
					// ResumeSpan:          [key......endKey]
					// True span:  [key......key]

@kvoli kvoli requested review from kvoli and aayushshah15 October 5, 2022 21:05
Contributor Author

@KaiSun314 KaiSun314 left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)


pkg/kv/kvserver/replica_send.go line 405 at r1 (raw file):

Previously, kvoli (Austen) wrote…

nit: for clarity, use [key, endKey) rather than []. This avoids any confusion about the endKey being inclusive when it is exclusive.

Done. Changed to [key, endKey). I also changed the reverse scan comment below. Can you confirm that reverse scan spans are also end-exclusive, i.e. [key, endKey)?

Collaborator

@kvoli kvoli left a comment

:lgtm_strong:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/replica_send.go line 405 at r1 (raw file):

Previously, KaiSun314 (Kai Sun) wrote…

Done. Changed to [key, endKey). I also changed the reverse scan comment below. Can you confirm that reverse scan spans are also end-exclusive, i.e. [key, endKey)?

That's right, take a look at this comment. It is specific to spans:

https://github.com/cockroachdb/cockroach/blob/20e55fcca7651472a6b112900c413e67b270979f/pkg/roachpb/data.pb.go#L285-L288
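
For readers following this thread, here is a minimal sketch of deriving the span that was actually iterated from the request span and the resume span, using half-open [key, endKey) intervals. The `span` type and the `trueSpan` helper are hypothetical and are not the code in this patch. The forward case follows the diagram quoted above; the reverse case is an assumption that the resume span covers the unscanned lower part of the request.

```go
package main

import "fmt"

// span is a half-open key interval [key, endKey). Hypothetical type.
type span struct{ key, endKey []byte }

// sp builds a span from string keys, for readability in the example.
func sp(k, e string) span { return span{[]byte(k), []byte(e)} }

// trueSpan is a hypothetical helper. A nil resume span is taken to mean
// the request ran to completion and covered its whole span.
func trueSpan(req span, resume *span, reverse bool) span {
	if resume == nil {
		return req
	}
	if reverse {
		// Assumed layout for reverse scans (scanning from endKey down):
		// Request:    [key...............endKey)
		// ResumeSpan: [key......endKey)
		// True span:            [key......endKey)
		return span{key: resume.endKey, endKey: req.endKey}
	}
	// Forward scans:
	// Request:    [key...............endKey)
	// ResumeSpan:          [key......endKey)
	// True span:  [key......key)
	return span{key: req.key, endKey: resume.key}
}

func main() {
	req := sp("a", "z")

	fwdResume := sp("f", "z") // forward scan stopped before "f"
	fwd := trueSpan(req, &fwdResume, false)
	fmt.Printf("forward: [%s, %s)\n", fwd.key, fwd.endKey)

	revResume := sp("a", "p") // reverse scan stopped after reading down to "p"
	rev := trueSpan(req, &revResume, true)
	fmt.Printf("reverse: [%s, %s)\n", rev.key, rev.endKey)
}
```

Because the end key is exclusive, the forward scan's resume key can be used directly as the end of the iterated span without any adjustment.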

Contributor

@aayushshah15 aayushshah15 left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @KaiSun314)


pkg/kv/kvserver/replica_send.go line 381 at r2 (raw file):

var _ batchExecutionFn = (*Replica).executeReadOnlyBatch

func getTrueSpans(

This function re-implements the logic represented by Replica.collectSpansRead(). Let's reuse that method here. Also note the special handling for SkipLocked requests that is dealt with in collectSpansRead. Making this change should simplify this patch significantly.

Contributor Author

@KaiSun314 KaiSun314 left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)


pkg/kv/kvserver/replica_send.go line 381 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

This function re-implements the logic represented by Replica.collectSpansRead(). Let's reuse that method here. Also note the special handling for SkipLocked requests that is dealt with in collectSpansRead. Making this change should simplify this patch significantly.

Done.

For SkipLocked requests, my understanding is that the spans returned will contain one span for each key in the response. I think using Replica.collectSpansRead() in this case should still be fine, since we take the union of all the spans returned?
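
As a rough illustration of the union mentioned above, the sketch below collapses several spans, including per-key point spans, into the smallest single span that covers them. This is one plausible reading of "union", not the patch's implementation. The `span` type and the `next` and `union` helpers are hypothetical, and a point span is assumed to have a nil end key and to cover [key, key followed by a zero byte).

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a half-open key interval [key, endKey). A nil endKey marks a
// point span covering just key. Hypothetical type.
type span struct{ key, endKey []byte }

// next returns the smallest key greater than k: k with a zero byte appended.
func next(k []byte) []byte {
	return append(append([]byte(nil), k...), 0)
}

// union is a hypothetical helper that returns the smallest single span
// covering every input span. It reports false for empty input.
func union(spans []span) (span, bool) {
	if len(spans) == 0 {
		return span{}, false
	}
	var out span
	for i, s := range spans {
		end := s.endKey
		if end == nil {
			end = next(s.key) // widen a point span to [key, next(key))
		}
		if i == 0 || bytes.Compare(s.key, out.key) < 0 {
			out.key = s.key
		}
		if i == 0 || bytes.Compare(end, out.endKey) > 0 {
			out.endKey = end
		}
	}
	return out, true
}

func main() {
	// One point span per returned key, as described for SkipLocked above.
	perKey := []span{{key: []byte("c")}, {key: []byte("a")}, {key: []byte("f")}}
	if u, ok := union(perKey); ok {
		fmt.Printf("[%q, %q)\n", u.key, u.endKey)
	}
}
```

Under this reading, a response made of many single-key spans still yields one span bounded by the smallest and largest keys returned, which is what the splitter would record.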

@kvoli kvoli requested a review from aayushshah15 October 24, 2022 19:53
@KaiSun314
Contributor Author

TFTRs!

Bazel Extended CI test failures appear to be unrelated.

bors r+

@craig
Contributor

craig bot commented Oct 25, 2022

Build failed:

@KaiSun314
Contributor Author

bors r+

@craig
Contributor

craig bot commented Oct 25, 2022

Build failed:

@KaiSun314
Contributor Author

bors r+

@craig
Contributor

craig bot commented Oct 25, 2022

Build failed:

@kvoli
Collaborator

kvoli commented Oct 25, 2022

Try rebasing on master - it looks like a persistent issue with that test that is unrelated.

@knz
Contributor

knz commented Oct 25, 2022

alas master is where the problem is. That's why the bors run failed and not the original CI on this PR.

@KaiSun314 KaiSun314 force-pushed the enable_ycsb_split branch 2 times, most recently from a5103ad to 6be3d43 Compare October 26, 2022 14:57
@KaiSun314
Contributor Author

bors r+

@KaiSun314
Contributor Author

bors r+

@craig
Contributor

craig bot commented Oct 28, 2022

Build succeeded:

Development

Successfully merging this pull request may close these issues.

workload: ycsb-e creates a single hot range
5 participants