sql: reuse the slice of RequestUnion objects between fetches #82384
Conversation
Force-pushed from 67d3c3a to eb8920c
I feel like I'm still wrapping my head around these changes, but here are a few minor nits and questions.
Reviewed 3 of 7 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
-- commits, line 2 at r2:

What's the performance or memory improvement of this in benchmarking? I'm finding the use of `reqsScratch` for reuse and its memory management throughout hard to follow, and wondering what the realized benefits are. Is it worth the extra complexity?
pkg/kv/kvclient/kvstreamer/streamer.go
line 1416 at r2 (raw file):
```go
// pool.
if numIncompleteRequests > 0 {
	// Make to sure to nil out old requests that we didn't include into the
```
s/Make to sure/Make sure
pkg/roachpb/api.go
line 1806 at r2 (raw file):
```go
// RequestUnionSize is the size of the RequestUnion object.
const RequestUnionSize = int64(unsafe.Sizeof(RequestUnion{}))
```
Why did this get moved into roachpb?
pkg/sql/row/kv_batch_streamer.go
line 77 at r2 (raw file):
```go
}
keyLocking := getKeyLockingStrength(lockStrength)
reqs := spansToRequests(spans, false /* reverse */, keyLocking, reqsScratch)
```
One of the code complexities I'm struggling with is the subtle difference between the two usages of `spansToRequests` and specifically how `reqsScratch` is handled. No suggestion/ask here at the moment, just a thought.
pkg/sql/row/kv_batch_streamer.go
line 83 at r2 (raw file):
```go
// Make sure to nil out requests in order to lose references to the
// underlying Get and Scan requests which could keep large byte slices
// alive.
```
Maybe add to the comment that `reqs` is now going to be used as the (potentially new) scratch space.
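For illustration, here is a minimal sketch of the pattern this comment is about; `recycleReqs` is a hypothetical helper, not the PR's actual code:

```go
// recycleReqs zeroes the just-used requests and returns the slice for use
// as scratch by the next spansToRequests call.
func recycleReqs(reqs []roachpb.RequestUnion) []roachpb.RequestUnion {
	// Nil out the requests to lose references to the underlying Get and
	// Scan requests, which could otherwise keep large byte slices alive.
	for i := range reqs {
		reqs[i] = roachpb.RequestUnion{}
	}
	// reqs is now going to be used as the (potentially new) scratch space.
	return reqs
}
```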
Force-pushed from eb8920c to 0158631
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
Previously, rharding6373 (Rachael Harding) wrote…

> What's the performance or memory improvement of this in benchmarking? I'm finding the use of `reqsScratch` for reuse and its memory management throughout hard to follow, and wondering what the realized benefits are. Is it worth the extra complexity?
I don't really have quantitative proof to show that it's worth it, because this won't show up in microbenchmarks: the reuse of `reqs` only matters when multiple `Streamer.Enqueue` calls are performed, and in our microbenchmarks there is exactly one call. (Side note: I guess I could modify the microbenchmarks so that they do require multiple `Enqueue` calls.)

I tried some quick runs of TPCH queries on a single node on my laptop and didn't really see any performance difference (which is expected). What prompted me to work on this change is that these `[]roachpb.RequestUnion` slices are one of the largest allocations space-wise when running different TPCH queries. I took some heap profiles of running TPCH Q4 five times before:
```
623    9.34MB  9.34MB      reqs := make([]roachpb.RequestUnion, len(spans))
```

and after this change:

```
648    .       .           if cap(reqsScratch) >= len(spans) {
649    .       .               reqs = reqsScratch[:len(spans)]
650    .       .           } else {
651    5.39MB  5.39MB          reqs = make([]roachpb.RequestUnion, len(spans))
652    .       .           }
```

Not sure how representative this is, though.
I adjusted some benchmarks as well as batch sizes, and here is what I got:
```
name                               old time/op    new time/op    delta
LookupJoinNoOrdering/Cockroach-24     147ms ± 4%     147ms ± 4%     ~     (p=1.000 n=10+10)
LookupJoinOrdering/Cockroach-24       146ms ± 2%     142ms ± 4%   -2.85%  (p=0.000 n=9+10)

name                               old alloc/op   new alloc/op   delta
LookupJoinNoOrdering/Cockroach-24    26.6MB ± 5%    27.2MB ± 0%     ~     (p=0.601 n=10+7)
LookupJoinOrdering/Cockroach-24      28.4MB ± 5%    27.5MB ± 4%   -3.21%  (p=0.003 n=10+10)

name                               old allocs/op  new allocs/op  delta
LookupJoinNoOrdering/Cockroach-24      178k ± 3%      182k ± 0%     ~     (p=0.052 n=10+7)
LookupJoinOrdering/Cockroach-24        200k ± 0%      195k ± 3%   -2.93%  (p=0.006 n=8+10)
```
This doesn't look like much, so I'm ok with putting this PR on hold until I have more concrete evidence that this change would move the needle on the stability front. I'm also ok with keeping the change only in the streamer code path and leaving a TODO for the non-streamer one.
pkg/kv/kvclient/kvstreamer/streamer.go
line 1416 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…

> s/Make to sure/Make sure
Done.
pkg/roachpb/api.go
line 1806 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…

> Why did this get moved into `roachpb`?
`roachpb` is where `RequestUnion` is defined, and we want to access `RequestUnionSize` from several different packages (`kvstreamer`, `colfetcher`, `row`, `rowexec`), so it seemed like the best option.
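As an aside, a hedged sketch of how a size constant like this typically feeds the memory accounting mentioned in the commit message; the `accountForScratch` helper and its parameters are assumptions for illustration, not the PR's code:

```go
// accountForScratch resizes the memory account to track the scratch
// slice's full capacity, since the whole backing array stays reachable
// between fetches. acc is assumed to be a mon.BoundAccount bound to an
// unlimited monitor, per the commit message.
func accountForScratch(
	ctx context.Context, acc *mon.BoundAccount, oldUsage int64, reqsScratch []roachpb.RequestUnion,
) (int64, error) {
	newUsage := roachpb.RequestUnionSize * int64(cap(reqsScratch))
	if err := acc.Resize(ctx, oldUsage, newUsage); err != nil {
		return oldUsage, err
	}
	return newUsage, nil
}
```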
pkg/sql/row/kv_batch_streamer.go
line 77 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…

> One of the code complexities I'm struggling with is the subtle difference between the two usages of `spansToRequests` and specifically how `reqsScratch` is handled. No suggestion/ask here at the moment, just a thought.
You're right that the reuse of `reqs` is done differently in the two code paths. My main goal was optimizing the streamer code path, so there the same slice is reused throughout the whole lookup or index join operation; as a result, the slice is plumbed all the way up to the join reader and index joiner, and those operators are responsible for memory accounting for the slice.

I decided to add the reuse to the non-streamer code path too, but there the reuse is scoped to a single `[]roachpb.Span` slice (in other words, the requests are reused only for ResumeSpans if the initial response was partial). If we execute lookup / index joins via the old non-streamer path, then for every lookup batch we still allocate a new slice for requests (which might be reused if any of the lookups come back with resume spans). Making the reuse "full" would also require pushing out at least the reference to the request slice to the users of `txnKVFetcher`.
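To make that scoping concrete, a rough sketch of the non-streamer reuse as described above; `fetchAll`, `sendBatch`, and `extractResumeSpans` are hypothetical stand-ins for the actual fetcher machinery, and the parameter types are assumptions:

```go
// fetchAll sends one batch of spans, recycling the request slice for any
// ResumeSpans rounds, and returns the slice for use as the next scratch.
func fetchAll(
	ctx context.Context,
	spans []roachpb.Span,
	keyLocking lock.Strength,
	reqsScratch []roachpb.RequestUnion,
) ([]roachpb.RequestUnion, error) {
	reqs := spansToRequests(spans, false /* reverse */, keyLocking, reqsScratch)
	for len(reqs) > 0 {
		br, err := sendBatch(ctx, reqs)
		if err != nil {
			return reqs, err
		}
		// Partial responses carry resume spans; rebuild the requests for
		// them in the same backing array instead of allocating anew.
		reqs = spansToRequests(extractResumeSpans(br), false /* reverse */, keyLocking, reqs)
	}
	return reqs, nil
}
```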
pkg/sql/row/kv_batch_streamer.go
line 83 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…

> Maybe add to the comment that `reqs` is now going to be used as the (potentially new) scratch space.
Done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/sql/row/kv_batch_streamer.go
line 77 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…

> You're right that the reuse of `reqs` is done differently in the two code paths. My main goal was optimizing the streamer code path, so there the same slice is reused throughout the whole lookup or index join operation; as a result, the slice is plumbed all the way up to the join reader and index joiner, and those operators are responsible for memory accounting for the slice.
>
> I decided to add the reuse to the non-streamer code path too, but there the reuse is scoped to a single `[]roachpb.Span` slice (in other words, the requests are reused only for ResumeSpans if the initial response was partial). If we execute lookup / index joins via the old non-streamer path, then for every lookup batch we still allocate a new slice for requests (which might be reused if any of the lookups come back with resume spans). Making the reuse "full" would also require pushing out at least the reference to the request slice to the users of `txnKVFetcher`.
I think I'll finish the non-streamer code path too since it would benefit things other than lookup and index joins.
Force-pushed from 0158631 to 8a0150c
I implemented the reuse "fully" in the non-streamer code path and unified how the scratch space is held on to and accounted for; I think it should be cleaner now. PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
This makes sense, but I wish we didn't have to cross so many layers to make this change. It seems like `reqsScratch` and `expectMultipleCalls` tentacles are everywhere. Before this, at least on the batching side, the layers above `txnKVFetcher` didn't need to know about `RequestUnion` at all. It seems a shame to make the abstraction messier just for this small optimization.

Here's an idea: what if the caching of `reqsScratch` were pushed down as low as possible? Meaning, what if we kept `reqsScratch` in the lowest layers (I guess that would be `txnKVFetcher` and `TxnKVStreamer`) and used the lifetime of those objects as the lifetime of the cached `[]roachpb.RequestUnion`. Then in layers above, instead of throwing away those `KVBatchFetcher` objects, what if we kept them for the lifetime of the parent `Fetcher` / `cFetcher`? That seems like a more natural reflection of what we're doing here.
Reviewed 1 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
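A rough sketch of the ownership being suggested above; the `Fetcher` field layout, the interface shape, and the `SetupNextFetch` method name are illustrative assumptions, not the actual interfaces:

```go
// KVBatchFetcher is sketched here as an interface with a single
// hypothetical re-arm method; the real interface has more to it.
type KVBatchFetcher interface {
	SetupNextFetch(ctx context.Context, spans []roachpb.Span) error
}

// The parent fetcher constructs its KVBatchFetcher once and keeps it for
// its whole lifetime, so any scratch the lowest layer caches internally
// (such as the []roachpb.RequestUnion slice) survives across fetches
// without the layers above ever seeing RequestUnion.
type Fetcher struct {
	kvFetcher KVBatchFetcher // created once, re-armed for every batch
}

func (f *Fetcher) startScan(ctx context.Context, spans []roachpb.Span) error {
	// Reset the existing fetcher with the next batch of spans instead of
	// throwing it away and building a new one.
	return f.kvFetcher.SetupNextFetch(ctx, spans)
}
```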
That's a good point. I've been annoyed by the current unintuitive way that the fetchers work (meaning different structs have different lifecycles), and it seems like this is finally the time to clean this up.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
Force-pushed from 8a0150c to 27c09d7
This commit teaches `txnKVFetcher` and `txnKVStreamer` to reuse the same slice of `RequestUnion` objects between different fetches. It is now extremely easy to do given the recent refactor. We do perform memory accounting for this slice (against a memory account bound to an unlimited memory monitor). Additionally, a similar optimization is applied to how resume requests are populated by the Streamer. Release note: None
Force-pushed from 27c09d7 to 3a2d49a
@michae2 @rharding6373 RFAL
Reviewed 8 of 20 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
I believe all concerns have been addressed.
Thanks for the reviews and suggestions!

bors r+
Build failed (retrying...)
78085: storage: add `MVCCStats` for range keys r=jbowens a=erikgrinaker

**storage: export some MVCC key encoding functions**

Release note: None

**roachpb: add `Key.Prevish()` to find a previous key**

This patch adds `Key.Prevish()`, which returns a previous key in lexicographical sort order. This is needed to expand a latch span leftwards to peek at any left-adjacent range keys.

It is impossible to find the exact immediate predecessor of a key, because it can have an infinite number of `0xff` bytes at the end, so this returns the nearest previous key right-padded with `0xff` up to the given length. It is still possible for an infinite number of keys to exist between `Key` and `Key.Prevish()`, as keys have unbounded length.

Release note: None

**storage: add `MVCCStats` for range keys**

This patch adds `MVCCStats` tracking for range keys. Four new fields are added to `MVCCStats`:

* `RangeKeyCount`: the number of (fragmented) range keys, not counting historical versions.
* `RangeKeyBytes`: the logical encoded byte size of all range keys. The latest version contributes the encoded key bounds, and all versions contribute encoded timestamps. Unlike point keys, which for historical reasons use a fixed-size timestamp contribution, this uses the actual variable-length timestamp size.
* `RangeValCount`: the number of (fragmented) range key versions.
* `RangeValBytes`: the encoded size of all range key values across all versions. The same value can be stored in multiple range keys due to fragmentation, which will be counted separately. Even though all range keys are currently MVCC range tombstones with no value, the `MVCCValueHeader` contribution can be non-zero due to e.g. a local timestamp.

`ComputeStatsForRange()` has been extended to calculate the above quantities, and additionally account for range tombstones themselves in `GCBytesAge` along with their effect on point keys. All relevant call sites have been updated to surface range keys for the MVCC iterators passed to `ComputeStatsForRange()`.

Most MVCC operations have been updated to correctly account for MVCC range tombstones, e.g. during point key writes and intent resolution. KV APIs are not yet updated; this will be addressed later.

Range key stats are also adjusted during range splits and merges, which will split and merge any range keys that straddle the split key. This requires a single range key seek to the left and right of the split key during these operations.

Touches #70412.

Release note: None

82384: sql: reuse the slice of RequestUnion objects between fetches r=yuzefovich a=yuzefovich

This commit teaches `txnKVFetcher` and `txnKVStreamer` to reuse the same slice of `RequestUnion` objects between different fetches. It is now extremely easy to do given the recent refactor. We do perform memory accounting for this slice (against a memory account bound to an unlimited memory monitor). Additionally, a similar optimization is applied to how resume requests are populated by the Streamer.

Addresses: #82160.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
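To make the `Key.Prevish()` contract concrete, here is a hedged sketch consistent with the description above; it illustrates the semantics only and is not CockroachDB's actual implementation:

```go
// prevish returns a key ordering strictly before k. If k ends in 0x00 the
// trailing byte can simply be dropped (that result is the exact immediate
// predecessor); otherwise the last byte is decremented and the result is
// right-padded with 0xff up to length bytes. Because keys have unbounded
// length, infinitely many keys can still sort between the result and k.
func prevish(k []byte, length int) []byte {
	n := len(k)
	if n == 0 {
		return k // the empty key has no predecessor
	}
	if k[n-1] == 0x00 {
		return k[:n-1]
	}
	p := append([]byte(nil), k...) // copy, so the caller's key is untouched
	p[len(p)-1]--
	for len(p) < length {
		p = append(p, 0xff)
	}
	return p
}
```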
Build failed (retrying...)
Build succeeded