sql: reuse the slice of RequestUnion objects between fetches #82384
Conversation
Force-pushed from 67d3c3a to eb8920c
I feel like I'm still wrapping my head around these changes, but here are a few minor nits and questions.
Reviewed 3 of 7 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
-- commits, line 2 at r2:

What's the performance or memory improvement of this in benchmarking? I'm finding the use of `reqsScratch` for reuse and its memory management throughout hard to follow, and wondering what the realized benefits are. Is it worth the extra complexity?
pkg/kv/kvclient/kvstreamer/streamer.go
line 1416 at r2 (raw file):
```go
// pool.
if numIncompleteRequests > 0 {
	// Make to sure to nil out old requests that we didn't include into the
```
s/Make to sure/Make sure
pkg/roachpb/api.go
line 1806 at r2 (raw file):
```go
// RequestUnionSize is the size of the RequestUnion object.
const RequestUnionSize = int64(unsafe.Sizeof(RequestUnion{}))
```
Why did this get moved into roachpb?
pkg/sql/row/kv_batch_streamer.go
line 77 at r2 (raw file):
```go
}
keyLocking := getKeyLockingStrength(lockStrength)
reqs := spansToRequests(spans, false /* reverse */, keyLocking, reqsScratch)
```
One of the code complexities I'm struggling with is the subtle difference between the two usages of `spansToRequests` and specifically how `reqsScratch` is handled. No suggestion/ask here at the moment, just a thought.
pkg/sql/row/kv_batch_streamer.go
line 83 at r2 (raw file):
```go
// Make sure to nil out requests in order to lose references to the
// underlying Get and Scan requests which could keep large byte slices
// alive.
```
Maybe add to the comment that `reqs` is now going to be used as the (potentially new) scratch space.
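For illustration, here is a minimal sketch of the pattern this comment is about; `recycleReqs` is a hypothetical helper, not the PR's actual code:

```go
// recycleReqs zeroes the just-used requests and returns the slice for use
// as scratch by the next spansToRequests call.
func recycleReqs(reqs []roachpb.RequestUnion) []roachpb.RequestUnion {
	// Nil out the requests to lose references to the underlying Get and
	// Scan requests, which could otherwise keep large byte slices alive.
	for i := range reqs {
		reqs[i] = roachpb.RequestUnion{}
	}
	// reqs is now going to be used as the (potentially new) scratch space.
	return reqs
}
```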
Force-pushed from eb8920c to 0158631
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
Previously, rharding6373 (Rachael Harding) wrote…

> What's the performance or memory improvement of this in benchmarking? I'm finding the use of `reqsScratch` for reuse and its memory management throughout hard to follow, and wondering what the realized benefits are. Is it worth the extra complexity?
I don't really have quantitative proof to show that it's worth it, because this won't show up in microbenchmarks: the reuse of `reqs` only matters when multiple `Streamer.Enqueue` calls are performed, and in our microbenchmarks there is exactly one call. (Side note: I guess I could modify the microbenchmarks so that they do require multiple `Enqueue` calls.)

I tried some quick runs of TPCH queries on a single node on my laptop and didn't really see any performance difference (which is expected). What prompted me to work on this change is that these `[]roachpb.RequestUnion` slices are one of the largest allocations space-wise when running different TPCH queries. I took some heap profiles of running TPCH Q4 five times before:
```
623    9.34MB  9.34MB      reqs := make([]roachpb.RequestUnion, len(spans))
```

and after this change:

```
648    .       .           if cap(reqsScratch) >= len(spans) {
649    .       .               reqs = reqsScratch[:len(spans)]
650    .       .           } else {
651    5.39MB  5.39MB          reqs = make([]roachpb.RequestUnion, len(spans))
652    .       .           }
```

Not sure how representative this is, though.
I adjusted some benchmarks as well as batch sizes, and here is what I got:
```
name                               old time/op    new time/op    delta
LookupJoinNoOrdering/Cockroach-24     147ms ± 4%     147ms ± 4%     ~     (p=1.000 n=10+10)
LookupJoinOrdering/Cockroach-24       146ms ± 2%     142ms ± 4%   -2.85%  (p=0.000 n=9+10)

name                               old alloc/op   new alloc/op   delta
LookupJoinNoOrdering/Cockroach-24    26.6MB ± 5%    27.2MB ± 0%     ~     (p=0.601 n=10+7)
LookupJoinOrdering/Cockroach-24      28.4MB ± 5%    27.5MB ± 4%   -3.21%  (p=0.003 n=10+10)

name                               old allocs/op  new allocs/op  delta
LookupJoinNoOrdering/Cockroach-24      178k ± 3%      182k ± 0%     ~     (p=0.052 n=10+7)
LookupJoinOrdering/Cockroach-24        200k ± 0%      195k ± 3%   -2.93%  (p=0.006 n=8+10)
```
This doesn't look like much, so I'm ok with putting this PR on hold until I have more concrete evidence that this change would move the needle on the stability front. I'm also ok with keeping the change only in the streamer code path and leaving a TODO for the non-streamer one.
pkg/kv/kvclient/kvstreamer/streamer.go
line 1416 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…

> s/Make to sure/Make sure
Done.
pkg/roachpb/api.go
line 1806 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…

> Why did this get moved into `roachpb`?
`roachpb` is where `RequestUnion` is defined, and we want to access `RequestUnionSize` from several different packages (`kvstreamer`, `colfetcher`, `row`, `rowexec`), so it seemed like the best option.
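As an aside, a hedged sketch of how a size constant like this typically feeds the memory accounting mentioned in the commit message; the `accountForScratch` helper and its parameters are assumptions for illustration, not the PR's code:

```go
// accountForScratch resizes the memory account to track the scratch
// slice's full capacity, since the whole backing array stays reachable
// between fetches. acc is assumed to be a mon.BoundAccount bound to an
// unlimited monitor, per the commit message.
func accountForScratch(
	ctx context.Context, acc *mon.BoundAccount, oldUsage int64, reqsScratch []roachpb.RequestUnion,
) (int64, error) {
	newUsage := roachpb.RequestUnionSize * int64(cap(reqsScratch))
	if err := acc.Resize(ctx, oldUsage, newUsage); err != nil {
		return oldUsage, err
	}
	return newUsage, nil
}
```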
pkg/sql/row/kv_batch_streamer.go
line 77 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…

> One of the code complexities I'm struggling with is the subtle difference between the two usages of `spansToRequests` and specifically how `reqsScratch` is handled. No suggestion/ask here at the moment, just a thought.
You're right that the reuse of `reqs` is done differently in the two code paths. My main goal was optimizing the streamer code path, so there the same slice is reused throughout the whole lookup or index join operation; as a result, the slice is plumbed all the way up to the join reader and index joiner, and those operators are responsible for memory accounting for the slice.

I decided to add the reuse to the non-streamer code path too, but there the reuse is scoped to a single `[]roachpb.Span` slice (in other words, the requests are reused only for ResumeSpans if the initial response was partial). If we execute lookup / index joins via the old non-streamer path, then for every lookup batch we still allocate a new slice for requests (which might be reused if any of the lookups come back with resume spans). Making the reuse "full" would also require pushing out at least the reference to the request slice to the users of `txnKVFetcher`.
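To make that scoping concrete, a rough sketch of the non-streamer reuse as described above; `fetchAll`, `sendBatch`, and `extractResumeSpans` are hypothetical stand-ins for the actual fetcher machinery, and the parameter types are assumptions:

```go
// fetchAll sends one batch of spans, recycling the request slice for any
// ResumeSpans rounds, and returns the slice for use as the next scratch.
func fetchAll(
	ctx context.Context,
	spans []roachpb.Span,
	keyLocking lock.Strength,
	reqsScratch []roachpb.RequestUnion,
) ([]roachpb.RequestUnion, error) {
	reqs := spansToRequests(spans, false /* reverse */, keyLocking, reqsScratch)
	for len(reqs) > 0 {
		br, err := sendBatch(ctx, reqs)
		if err != nil {
			return reqs, err
		}
		// Partial responses carry resume spans; rebuild the requests for
		// them in the same backing array instead of allocating anew.
		reqs = spansToRequests(extractResumeSpans(br), false /* reverse */, keyLocking, reqs)
	}
	return reqs, nil
}
```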
pkg/sql/row/kv_batch_streamer.go
line 83 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…

> Maybe add to the comment that `reqs` is now going to be used as the (potentially new) scratch space.
Done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/sql/row/kv_batch_streamer.go
line 77 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…

> You're right that the reuse of `reqs` is done differently in the two code paths. My main goal was optimizing the streamer code path, so there the same slice is reused throughout the whole lookup or index join operation; as a result, the slice is plumbed all the way up to the join reader and index joiner, and those operators are responsible for memory accounting for the slice.
>
> I decided to add the reuse to the non-streamer code path too, but there the reuse is scoped to a single `[]roachpb.Span` slice (in other words, the requests are reused only for ResumeSpans if the initial response was partial). If we execute lookup / index joins via the old non-streamer path, then for every lookup batch we still allocate a new slice for requests (which might be reused if any of the lookups come back with resume spans). Making the reuse "full" would also require pushing out at least the reference to the request slice to the users of `txnKVFetcher`.
I think I'll finish the non-streamer code path too since it would benefit things other than lookup and index joins.
Force-pushed from 0158631 to 8a0150c
I implemented the reuse "fully" in the non-streamer code path and unified how the scratch space is held on to and accounted for; I think it should be cleaner now. PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
This makes sense, but I wish we didn't have to cross so many layers to make this change. It seems like `reqsScratch` and `expectMultipleCalls` tentacles are everywhere. Before this, at least on the batching side, the layers above `txnKVFetcher` didn't need to know about `RequestUnion` at all. It seems a shame to make the abstraction messier just for this small optimization.

Here's an idea: what if the caching of `reqsScratch` were pushed down as low as possible? Meaning, what if we kept `reqsScratch` in the lowest layers (I guess that would be `txnKVFetcher` and `TxnKVStreamer`) and used the lifetime of those objects as the lifetime of the cached `[]roachpb.RequestUnion`. Then in layers above, instead of throwing away those `KVBatchFetcher` objects, what if we kept them for the lifetime of the parent `Fetcher` / `cFetcher`? That seems like a more natural reflection of what we're doing here.
Reviewed 1 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
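A rough sketch of the ownership being suggested above; the `Fetcher` field layout, the interface shape, and the `SetupNextFetch` method name are illustrative assumptions, not the actual interfaces:

```go
// KVBatchFetcher is sketched here as an interface with a single
// hypothetical re-arm method; the real interface has more to it.
type KVBatchFetcher interface {
	SetupNextFetch(ctx context.Context, spans []roachpb.Span) error
}

// The parent fetcher constructs its KVBatchFetcher once and keeps it for
// its whole lifetime, so any scratch the lowest layer caches internally
// (such as the []roachpb.RequestUnion slice) survives across fetches
// without the layers above ever seeing RequestUnion.
type Fetcher struct {
	kvFetcher KVBatchFetcher // created once, re-armed for every batch
}

func (f *Fetcher) startScan(ctx context.Context, spans []roachpb.Span) error {
	// Reset the existing fetcher with the next batch of spans instead of
	// throwing it away and building a new one.
	return f.kvFetcher.SetupNextFetch(ctx, spans)
}
```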
That's a good point. I've been annoyed by the current unintuitive way that the fetchers work (meaning different structs have different lifecycles), and it seems like this is finally the time to clean this up.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
Force-pushed from 8a0150c to 27c09d7
This commit teaches `txnKVFetcher` and `txnKVStreamer` to reuse the same slice of `RequestUnion` objects between different fetches. It is now extremely easy to do given the recent refactor. We do perform memory accounting for this slice (against a memory account bound to an unlimited memory monitor). Additionally, a similar optimization is applied to how resume requests are populated by the Streamer. Release note: None
Force-pushed from 27c09d7 to 3a2d49a
@michae2 @rharding6373 RFAL
Reviewed 8 of 20 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
I believe all concerns have been addressed.
Thanks for the reviews and suggestions!

bors r+
Build failed (retrying...)
78085: storage: add `MVCCStats` for range keys r=jbowens a=erikgrinaker

**storage: export some MVCC key encoding functions**

Release note: None

**roachpb: add `Key.Prevish()` to find a previous key**

This patch adds `Key.Prevish()`, which returns a previous key in lexicographical sort order. This is needed to expand a latch span leftwards to peek at any left-adjacent range keys.

It is impossible to find the exact immediate predecessor of a key, because it can have an infinite number of `0xff` bytes at the end, so this returns the nearest previous key right-padded with `0xff` up to the given length. It is still possible for an infinite number of keys to exist between `Key` and `Key.Prevish()`, as keys have unbounded length.

Release note: None

**storage: add `MVCCStats` for range keys**

This patch adds `MVCCStats` tracking for range keys. Four new fields are added to `MVCCStats`:

* `RangeKeyCount`: the number of (fragmented) range keys, not counting historical versions.
* `RangeKeyBytes`: the logical encoded byte size of all range keys. The latest version contributes the encoded key bounds, and all versions contribute encoded timestamps. Unlike point keys, which for historical reasons use a fixed-size timestamp contribution, this uses the actual variable-length timestamp size.
* `RangeValCount`: the number of (fragmented) range key versions.
* `RangeValBytes`: the encoded size of all range key values across all versions. The same value can be stored in multiple range keys due to fragmentation, which will be counted separately. Even though all range keys are currently MVCC range tombstones with no value, the `MVCCValueHeader` contribution can be non-zero due to e.g. a local timestamp.

`ComputeStatsForRange()` has been extended to calculate the above quantities, and additionally account for range tombstones themselves in `GCBytesAge` along with their effect on point keys. All relevant call sites have been updated to surface range keys for the MVCC iterators passed to `ComputeStatsForRange()`.

Most MVCC operations have been updated to correctly account for MVCC range tombstones, e.g. during point key writes and intent resolution. KV APIs are not yet updated; this will be addressed later.

Range key stats are also adjusted during range splits and merges, which will split and merge any range keys that straddle the split key. This requires a single range key seek to the left and right of the split key during these operations.

Touches #70412.

Release note: None

82384: sql: reuse the slice of RequestUnion objects between fetches r=yuzefovich a=yuzefovich

This commit teaches `txnKVFetcher` and `txnKVStreamer` to reuse the same slice of `RequestUnion` objects between different fetches. It is now extremely easy to do given the recent refactor. We do perform memory accounting for this slice (against a memory account bound to an unlimited memory monitor). Additionally, a similar optimization is applied to how resume requests are populated by the Streamer.

Addresses: #82160.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
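To make the `Key.Prevish()` contract concrete, here is a hedged sketch consistent with the description above; it illustrates the semantics only and is not CockroachDB's actual implementation:

```go
// prevish returns a key ordering strictly before k. If k ends in 0x00 the
// trailing byte can simply be dropped (that result is the exact immediate
// predecessor); otherwise the last byte is decremented and the result is
// right-padded with 0xff up to length bytes. Because keys have unbounded
// length, infinitely many keys can still sort between the result and k.
func prevish(k []byte, length int) []byte {
	n := len(k)
	if n == 0 {
		return k // the empty key has no predecessor
	}
	if k[n-1] == 0x00 {
		return k[:n-1]
	}
	p := append([]byte(nil), k...) // copy, so the caller's key is untouched
	p[len(p)-1]--
	for len(p) < length {
		p = append(p, 0xff)
	}
	return p
}
```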
Build failed (retrying...)
Build succeeded