
sql: reuse the slice of RequestUnion objects between fetches #82384

Merged — 1 commit merged into cockroachdb:master from the streamer-reuse-reqs branch on Jun 28, 2022

Conversation

@yuzefovich (Member) commented Jun 3, 2022

This commit teaches txnKVFetcher and txnKVStreamer to reuse the same
slice of RequestUnion objects between different fetches. It is now
extremely easy to do given the recent refactor. We do perform memory
accounting for this slice (against a memory account bound to an
unlimited memory monitor). Additionally, a similar optimization is
applied to how resume requests are populated by the Streamer.

Addresses: #82160.

Release note: None
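
For orientation, here is a minimal, self-contained sketch of the reuse pattern the commit describes. The type and helper names (fetcher, memAccount, resize, and so on) are illustrative stand-ins for this sketch, not the identifiers used in the PR:

    package fetchreuse

    import "unsafe"

    // RequestUnion stands in for roachpb.RequestUnion; only its size matters here.
    type RequestUnion struct{ _ [64]byte }

    // requestUnionSize mirrors the size constant the PR uses for memory accounting.
    const requestUnionSize = int64(unsafe.Sizeof(RequestUnion{}))

    // span is a simplified stand-in for roachpb.Span.
    type span struct{ key, endKey string }

    // memAccount is a stand-in for a memory account bound to an unlimited monitor.
    type memAccount struct{ used int64 }

    func (a *memAccount) resize(oldSz, newSz int64) { a.used += newSz - oldSz }

    // fetcher keeps the scratch slice alive between fetches so that repeated
    // fetches can reuse the same backing array instead of reallocating it.
    type fetcher struct {
    	reqsScratch []RequestUnion
    	acc         memAccount
    }

    // spansToRequests reuses reqsScratch when it is large enough and otherwise
    // allocates a fresh slice, which then becomes the new scratch space.
    func (f *fetcher) spansToRequests(spans []span) []RequestUnion {
    	var reqs []RequestUnion
    	if cap(f.reqsScratch) >= len(spans) {
    		reqs = f.reqsScratch[:len(spans)]
    	} else {
    		reqs = make([]RequestUnion, len(spans))
    	}
    	// ... populate reqs from spans (Get or Scan requests) ...
    	return reqs
    }

    // fetch issues one batch and then stashes reqs as the scratch for next time.
    func (f *fetcher) fetch(spans []span) {
    	reqs := f.spansToRequests(spans)
    	// ... send the batch and process the responses ...

    	// Zero out the requests so the retained scratch doesn't keep the
    	// underlying Get/Scan requests (and their key byte slices) alive.
    	for i := range reqs {
    		reqs[i] = RequestUnion{}
    	}
    	// Update the memory account to reflect the retained capacity.
    	oldSz := requestUnionSize * int64(cap(f.reqsScratch))
    	newSz := requestUnionSize * int64(cap(reqs))
    	f.acc.resize(oldSz, newSz)
    	f.reqsScratch = reqs
    }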

@yuzefovich requested review from rharding6373, michae2 and a team June 3, 2022 00:51
@yuzefovich requested a review from a team as a code owner June 3, 2022 00:51
@cockroach-teamcity (Member) commented

This change is Reviewable

@rharding6373 (Collaborator) left a comment

I feel like I'm still wrapping my head around these changes, but here are a few minor nits and questions.

Reviewed 3 of 7 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)


-- commits line 2 at r2:
What's the performance or memory improvement of this in benchmarking? I'm finding the use of reqsScratch for reuse and its memory management throughout hard to follow, and wondering what the realized benefits are. Is it worth the extra complexity?


pkg/kv/kvclient/kvstreamer/streamer.go line 1416 at r2 (raw file):

	// pool.
	if numIncompleteRequests > 0 {
		// Make to sure to nil out old requests that we didn't include into the

s/Make to sure/Make sure


pkg/roachpb/api.go line 1806 at r2 (raw file):

// RequestUnionSize is the size of the RequestUnion object.
const RequestUnionSize = int64(unsafe.Sizeof(RequestUnion{}))

Why did this get moved into roachpb?


pkg/sql/row/kv_batch_streamer.go line 77 at r2 (raw file):

	}
	keyLocking := getKeyLockingStrength(lockStrength)
	reqs := spansToRequests(spans, false /* reverse */, keyLocking, reqsScratch)

One of the code complexities I'm struggling with is the subtle difference between the two usages of spansToRequests and specifically how reqsScratch is handled. No suggestion/ask here at the moment, just a thought.


pkg/sql/row/kv_batch_streamer.go line 83 at r2 (raw file):

	// Make sure to nil out requests in order to lose references to the
	// underlying Get and Scan requests which could keep large byte slices
	// alive.

Maybe add to the comment that reqs is now going to be used as the (potentially new) scratch space.

@yuzefovich force-pushed the streamer-reuse-reqs branch from eb8920c to 0158631 on June 9, 2022 22:51
@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)


-- commits line 2 at r2:

Previously, rharding6373 (Rachael Harding) wrote…

What's the performance or memory improvement of this in benchmarking? I'm finding the use of reqsScratch for reuse and its memory management throughout hard to follow, and wondering what the realized benefits are. Is it worth the extra complexity?

I don't really have quantitative proof that it's worth it: the reuse of reqs only matters when multiple Streamer.Enqueue calls are performed, and in our microbenchmarks there is exactly one call, so it won't show up there. (Side note: I guess I could modify the microbenchmarks so that they do require multiple Enqueue calls.)

I tried to do some quick runs of TPCH queries on a single node on my laptop and didn't really see any performance difference (which is expected). What prompted me to work on this change is that these []roachpb.RequestUnion slices are one of the largest allocations space-wise when running different TPCH queries. I took some heap profiles of running TPCH Q4 five times before:

    623       9.34MB     9.34MB           	reqs := make([]roachpb.RequestUnion, len(spans)) 

and after this change:

    648            .          .           	if cap(reqsScratch) >= len(spans) { 
    649            .          .           		reqs = reqsScratch[:len(spans)] 
    650            .          .           	} else { 
    651       5.39MB     5.39MB           		reqs = make([]roachpb.RequestUnion, len(spans)) 
    652            .          .           	} 

Not sure how representative this is, though.


I adjusted some benchmarks as well as batch sizes, and here is what I got:

name                               old time/op    new time/op    delta
LookupJoinNoOrdering/Cockroach-24     147ms ± 4%     147ms ± 4%    ~     (p=1.000 n=10+10)
LookupJoinOrdering/Cockroach-24       146ms ± 2%     142ms ± 4%  -2.85%  (p=0.000 n=9+10)

name                               old alloc/op   new alloc/op   delta
LookupJoinNoOrdering/Cockroach-24    26.6MB ± 5%    27.2MB ± 0%    ~     (p=0.601 n=10+7)
LookupJoinOrdering/Cockroach-24      28.4MB ± 5%    27.5MB ± 4%  -3.21%  (p=0.003 n=10+10)

name                               old allocs/op  new allocs/op  delta
LookupJoinNoOrdering/Cockroach-24      178k ± 3%      182k ± 0%    ~     (p=0.052 n=10+7)
LookupJoinOrdering/Cockroach-24        200k ± 0%      195k ± 3%  -2.93%  (p=0.006 n=8+10)

This doesn't look like much, so I'm ok with putting this PR on hold until I have more concrete evidence that this change would move the needle on the stability front. I'm also ok with keeping the change only in the streamer code path and leaving a TODO for the non-streamer one.


pkg/kv/kvclient/kvstreamer/streamer.go line 1416 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

s/Make to sure/Make sure

Done.


pkg/roachpb/api.go line 1806 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Why did this get moved into roachpb?

roachpb is where RequestUnion is defined, and we want to access RequestUnionSize from several different packages (kvstreamer, colfetcher, row, rowexec), so it seemed like the best option.
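
A hedged illustration of that layering choice (the package name and helper below are made up for this sketch; only the constant mirrors the roachpb.RequestUnionSize idea quoted above): defining the size constant next to the type lets every consumer turn a retained slice capacity into bytes for accounting without importing unsafe itself.

    package sizing

    import "unsafe"

    // RequestUnion stands in for roachpb.RequestUnion.
    type RequestUnion struct{ _ [64]byte }

    // RequestUnionSize sits next to the type definition (roachpb in the PR) so
    // that every consumer package (kvstreamer, colfetcher, row, rowexec) shares
    // one constant and none of them has to import unsafe itself.
    const RequestUnionSize = int64(unsafe.Sizeof(RequestUnion{}))

    // scratchFootprint converts the retained capacity of a scratch slice into
    // the byte count to register with a memory account.
    func scratchFootprint(reqsScratch []RequestUnion) int64 {
    	// Use cap, not len: the whole backing array stays reachable.
    	return RequestUnionSize * int64(cap(reqsScratch))
    }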


pkg/sql/row/kv_batch_streamer.go line 77 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

One of the code complexities I'm struggling with is the subtle difference between the two usages of spansToRequests and specifically how reqsScratch is handled. No suggestion/ask here at the moment, just a thought.

You're right that the reuse of reqs is done differently in the two code paths. My main goal was optimizing the streamer code path, so there the same slice is reused throughout the whole lookup or index join operation; as a result, the slice is plumbed all the way up to the join reader and index joiner, and those operators are responsible for the memory accounting for the slice.

I decided to add the reuse to the non-streamer code path too, but there the reuse is scoped to a single []roachpb.Span slice (in other words, the requests are reused only for ResumeSpans if the initial response was partial). If we execute lookup / index joins via the old non-streamer path, then for every lookup batch we still allocate a new slice for requests (which might be reused if any of the lookups come back with resume spans). Making the reuse "full" would also require pushing out at least the reference to the request slice to the users of txnKVFetcher.
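
A rough sketch of that narrower reuse in the non-streamer path: requests are only recycled when some lookups return resume spans. All types and helpers here are simplified stand-ins for this illustration, not the PR's actual code.

    package resumereuse

    // span and getRequest are simplified stand-ins for the roachpb types.
    type span struct{ key, endKey string }

    type getRequest struct{ span span }

    // populateResumeRequests overwrites a prefix of the existing reqs slice with
    // the requests that still need to be issued (those whose responses carried
    // a resume span) instead of allocating a fresh slice for the retry.
    func populateResumeRequests(reqs []getRequest, resumeSpans []span) []getRequest {
    	// reqs was sized for the original batch, so it is always large enough to
    	// hold the (smaller or equal) set of resume requests.
    	resume := reqs[:len(resumeSpans)]
    	for i, sp := range resumeSpans {
    		resume[i] = getRequest{span: sp}
    	}
    	// Zero out the tail so stale requests don't keep key buffers alive.
    	for i := len(resumeSpans); i < len(reqs); i++ {
    		reqs[i] = getRequest{}
    	}
    	return resume
    }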


pkg/sql/row/kv_batch_streamer.go line 83 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Maybe add to the comment that reqs is now going to be used as the (potentially new) scratch space.

Done.

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)


pkg/sql/row/kv_batch_streamer.go line 77 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

You're right that the reuse of reqs is done differently in the two code paths. My main goal was optimizing the streamer code path, so there the same slice is reused throughout the whole lookup or index join operation; as a result, the slice is plumbed all the way up to the join reader and index joiner, and those operators are responsible for the memory accounting for the slice.

I decided to add the reuse to the non-streamer code path too, but there the reuse is scoped to a single []roachpb.Span slice (in other words, the requests are reused only for ResumeSpans if the initial response was partial). If we execute lookup / index joins via the old non-streamer path, then for every lookup batch we still allocate a new slice for requests (which might be reused if any of the lookups come back with resume spans). Making the reuse "full" would also require pushing out at least the reference to the request slice to the users of txnKVFetcher.

I think I'll finish the non-streamer code path too since it would benefit things other than lookup and index joins.

@yuzefovich force-pushed the streamer-reuse-reqs branch from 0158631 to 8a0150c on June 10, 2022 17:17
@yuzefovich (Member, Author) left a comment

I implemented the reuse "fully" in the non-streamer code path and unified how the scratch space is held on to and accounted for; I think it should be cleaner now. PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)

@michae2 (Collaborator) left a comment

This makes sense, but I wish we didn't have to cross so many layers to make this change. It seems like reqsScratch and expectMultipleCalls tentacles are everywhere. Before this, at least on the batching side, the layers above txnKVFetcher didn't need to know about RequestUnion at all. It seems a shame to make the abstraction messier just for this small optimization.

Here's an idea: what if the caching of reqsScratch were pushed down as low as possible? Meaning, what if we kept reqsScratch in the lowest layers (I guess that would be txnKVFetcher and TxnKVStreamer) and used the lifetime of those objects as the lifetime of the cached []roachpb.RequestUnion? Then in layers above, instead of throwing away those KVBatchFetcher objects, what if we kept them for the lifetime of the parent Fetcher / cFetcher? That seems like a more natural reflection of what we're doing here.
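
A lifetime sketch of this suggestion, using simplified stand-ins for the real txnKVFetcher and Fetcher / cFetcher types (none of this is the PR's actual code): the scratch lives in the lowest layer and gets reused simply because the parent keeps the same fetcher object around.

    package ownership

    // requestUnion stands in for roachpb.RequestUnion.
    type requestUnion struct{ _ [64]byte }

    // txnKVFetcher is the lowest layer: it owns the scratch slice, so the
    // layers above it never have to see []requestUnion at all.
    type txnKVFetcher struct {
    	reqsScratch []requestUnion
    }

    func (f *txnKVFetcher) fetch(numSpans int) {
    	if cap(f.reqsScratch) < numSpans {
    		f.reqsScratch = make([]requestUnion, numSpans)
    	}
    	// ... build and send the batch using f.reqsScratch[:numSpans] ...
    }

    // rowFetcher plays the role of the parent Fetcher / cFetcher: by holding
    // on to one txnKVFetcher for its own lifetime instead of constructing a
    // new one per scan, the cached slice naturally lives exactly as long as
    // the fetcher does.
    type rowFetcher struct {
    	kv txnKVFetcher
    }

    func (rf *rowFetcher) startScan(numSpans int) {
    	rf.kv.fetch(numSpans)
    }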

Reviewed 1 of 7 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)

@yuzefovich (Member, Author) left a comment

That's a good point. I've been annoyed by the current unintuitive way that the fetchers work (meaning different structs have different lifecycles), and it seems like this is finally the time to clean this up.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)

@yuzefovich (Member, Author) commented

I opened up #83010 to perform the refactor, and with it reusing the slice becomes extremely simple. Putting this PR into draft mode for now until #83010 merges.

@yuzefovich marked this pull request as draft June 17, 2022 02:24
@yuzefovich force-pushed the streamer-reuse-reqs branch from 8a0150c to 27c09d7 on June 17, 2022 02:25
@yuzefovich added the do-not-merge label (bors won't merge a PR with this label) Jun 17, 2022
This commit teaches `txnKVFetcher` and `txnKVStreamer` to reuse the same
slice of `RequestUnion` objects between different fetches. It is now
extremely easy to do given the recent refactor. We do perform memory
accounting for this slice (against a memory account bound to an
unlimited memory monitor). Additionally, a similar optimization is
applied to how resume requests are populated by the Streamer.

Release note: None
@yuzefovich force-pushed the streamer-reuse-reqs branch from 27c09d7 to 3a2d49a on June 24, 2022 18:07
@yuzefovich removed the do-not-merge label (bors won't merge a PR with this label) Jun 24, 2022
@yuzefovich marked this pull request as ready for review June 24, 2022 18:08
@yuzefovich (Member, Author) commented

@michae2 @rharding6373 RFAL

@michae2 (Collaborator) left a comment

:lgtm: Much simpler! Thank you!

Reviewed 8 of 20 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)

@yuzefovich dismissed rharding6373’s stale review June 27, 2022 23:13

I believe all concerns have been addressed.

@yuzefovich (Member, Author) commented

Thanks for the reviews and suggestions!

bors r+

@craig (craig bot, Contributor) commented Jun 27, 2022

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Jun 28, 2022
78085: storage: add `MVCCStats` for range keys r=jbowens a=erikgrinaker

**storage: export some MVCC key encoding functions**

Release note: None

**roachpb: add `Key.Prevish()` to find a previous key**

This patch adds `Key.Prevish()`, which returns a previous key in
lexicographical sort order. This is needed to expand a latch span
leftwards to peek at any left-adjacent range keys.

It is impossible to find the exact immediate predecessor of a key,
because it can have an infinite number of `0xff` bytes at the end, so
this returns the nearest previous key right-padded with `0xff` up to the
given length. It is still possible for an infinite number of keys to
exist between `Key` and `Key.Prevish()`, as keys have unbounded length.

Release note: None

**storage: add `MVCCStats` for range keys**

This patch adds `MVCCStats` tracking for range keys. Four new fields are
added to `MVCCStats`:

* `RangeKeyCount`: the number of (fragmented) range keys, not counting
  historical versions.

* `RangeKeyBytes`: the logical encoded byte size of all range keys.
  The latest version contributes the encoded key bounds, and all
  versions contribute encoded timestamps. Unlike point keys, which for
  historical reasons use a fixed-size timestamp contribution, this uses
  the actual variable-length timestamp size.

* `RangeValCount`: the number of (fragmented) range key versions.

* `RangeValBytes`: the encoded size of all range key values across
  all versions. The same value can be stored in multiple range keys
  due to fragmentation, which will be counted separately. Even though
  all range keys are currently MVCC range tombstones with no value, the
  `MVCCValueHeader` contribution can be non-zero due to e.g. a local
  timestamp.

`ComputeStatsForRange()` has been extended to calculate the above
quantities, and additionally account for range tombstones themselves in
`GCBytesAge` along with their effect on point keys. All relevant call
sites have been updated to surface range keys for the MVCC iterators
passed to `ComputeStatsForRange()`.

Most MVCC operations have been updated to correctly account for MVCC
range tombstones, e.g. during point key writes and intent resolution. KV
APIs are not yet updated; this will be addressed later.

Range key stats are also adjusted during range splits and merges, which
will split and merge any range keys that straddle the split key. This
requires a single range key seek to the left and right of the split key
during these operations.

Touches #70412.

Release note: None

82384: sql: reuse the slice of RequestUnion objects between fetches r=yuzefovich a=yuzefovich

This commit teaches `txnKVFetcher` and `txnKVStreamer` to reuse the same
slice of `RequestUnion` objects between different fetches. It is now
extremely easy to do given the recent refactor. We do perform memory
accounting for this slice (against a memory account bound to an
unlimited memory monitor). Additionally, a similar optimization is
applied to how resume requests are populated by the Streamer.

Addresses: #82160.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig (craig bot, Contributor) commented Jun 28, 2022

Build failed (retrying...):

@craig (craig bot, Contributor) commented Jun 28, 2022

Build succeeded:

@craig (craig bot) merged commit 5dc6228 into cockroachdb:master on Jun 28, 2022
@yuzefovich deleted the streamer-reuse-reqs branch June 28, 2022 14:43