Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: issue GetRequests from SQL when possible #61583

Merged
merged 3 commits into from
Apr 14, 2021

Conversation

jordanlewis
Copy link
Member

Closes #46758.

Previously, SQL unconditionally emitted ScanRequests for every read that
it produced, even if we were completely sure that the scan could contain
only a single key that was fully known at request time.

This was less efficient than necessary, because ScanRequests cause the
KV layer to do more work than GetRequests.

Here are the reasons why GetRequests are more efficient:

  • MVCCGet, unlike MVCCScan, is able to use a prefix iterator, which
    allows it to make use of RocksDB/Pebble bloom filters. There may also
    be other wins at the storage layer, like avoiding prefetching
  • various data structures have optimizations and fast-paths for
    single-key operations (e.g. timestamp cache, latch manager, refresh
    span tracking)
  • various code paths are simpler in the point operation case
  • a point operation implies only needing a single key allocation instead
    of a start and end key allocation
  • a point operation implies only needing to return up to a single
    result, which means that we can avoid some indirection in various
    places
  • A scan over a span that has at most 1 key, but does not know it, needs
    to iterate over all the versions (or eventually do a second expensive
    seek), looking for the non-existent next key in the span. A Get can do
    a single seek, read the key, and be done.

Release note (performance improvement): SQL will now emit GetRequests
when possible to KV instead of always emitting ScanRequests. This
manifests as a modest performance improvement for some workloads.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis
Copy link
Member Author

So far, this looks to make almost no difference, but I haven't spent time doing a careful benchmark, and I don't know the best/worst cases to test against.

I ran KV95 on a single gceworker node for 60 seconds on empty data directories before and after the patch.

After the patch:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0        2039148        33985.5      0.9      0.7      1.8      5.0     35.7  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         106947         1782.4     10.2     10.0     17.8     21.0     41.9  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0        2146095        35767.9      1.3      0.7      6.3     13.6     41.9

Before the patch:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0        2024099        33734.7      0.9      0.7      2.0      5.0     35.7  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         106675         1777.9      9.9      9.4     16.8     21.0     39.8  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0        2130774        35512.6      1.3      0.7      6.0     13.1     39.8

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty cool stuff! It's the first time I looked into this code, but I still have some comments :)

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, and @solongordon)


pkg/kv/kvclient/kvcoord/dist_sender.go, line 617 at r1 (raw file):

			case *roachpb.QueryIntentRequest, *roachpb.EndTxnRequest, *roachpb.GetRequest:
			// Accepted point requests that can be in batches with limit.

super nit: it'd be nice to keep indentation the same here and a few lines above.


pkg/sql/row/kv_batch_fetcher.go, line 266 at r1 (raw file):

		// spans.
		for i := 1; i < len(spans); i++ {
			prevKey := spans[i-1].EndKey

nit: I'd do s/prevKey/prevEndKey/g and s/curKey/curEndKey/g.


pkg/sql/row/kv_batch_fetcher.go, line 272 at r1 (raw file):

			curKey := spans[i].EndKey
			if curKey == nil {
				curKey = spans[i-1].Key

I think this should be spans[i].Key, no?

@jordanlewis jordanlewis force-pushed the gets-in-fetcher branch 3 times, most recently from fd32a0c to b183da7 Compare March 13, 2021 00:27
Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @solongordon, and @yuzefovich)


pkg/kv/kvclient/kvcoord/dist_sender.go, line 617 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: it'd be nice to keep indentation the same here and a few lines above.

Done.


pkg/sql/row/kv_batch_fetcher.go, line 272 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think this should be spans[i].Key, no?

Done. Thanks for noticing, this was broken.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 35 of 36 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, @solongordon, and @yuzefovich)


pkg/sql/distsql_physical_planner.go, line 883 at r2 (raw file):

		span := spans[i]
		if len(span.EndKey) == 0 {
			span = roachpb.Span{

Can you add a comment why we need to do this?


pkg/sql/span_builder_test.go, line 67 at r2 (raw file):

			prefixLen:         2,
			numNeededFamilies: 1,
			canSplit:          true,

Why is there a change here? Do we now split in more cases than before?


pkg/sql/physicalplan/fake_span_resolver.go, line 82 at r2 (raw file):

	}

	/*

Leftover?

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @solongordon, and @yuzefovich)


pkg/sql/distsql_physical_planner.go, line 883 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Can you add a comment why we need to do this?

Done.


pkg/sql/span_builder_test.go, line 67 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Why is there a change here? Do we now split in more cases than before?

Yes, the idea here is that we now "split" in all cases where we can produce a single-key lookup. This example has just a single column family to look up, so we "split" the span into a single GetRequest for that column family.


pkg/sql/physicalplan/fake_span_resolver.go, line 82 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Leftover?

Done.

@jordanlewis jordanlewis force-pushed the gets-in-fetcher branch 2 times, most recently from 28232e8 to d33e395 Compare March 26, 2021 23:53
@jordanlewis jordanlewis changed the title [DNM] sql: issue GetRequests from SQL when possible sql: issue GetRequests from SQL when possible Mar 27, 2021
@jordanlewis jordanlewis marked this pull request as ready for review March 27, 2021 00:27
@jordanlewis jordanlewis requested a review from a team as a code owner March 27, 2021 00:27
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 11 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, @solongordon, and @yuzefovich)


pkg/sql/colfetcher/colbatch_scan.go, line 278 at r3 (raw file):

	spans := s.spans[:0]
	specSpans := spec.Spans
	// We need to check that, if this scan is for a single row, and this scan

This seems like a leftover comment?


pkg/sql/opt/exec/execbuilder/testdata/select, line 1771 at r3 (raw file):

query T
SELECT message FROM [SHOW TRACE FOR SESSION]
WHERE message LIKE 'querying next range% at /Table/74/1%' OR

nit: is the first % needed?

@jordanlewis jordanlewis force-pushed the gets-in-fetcher branch 2 times, most recently from 6b7ad1b to 78d76b6 Compare March 27, 2021 03:11
Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @solongordon, and @yuzefovich)


pkg/sql/colfetcher/colbatch_scan.go, line 278 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This seems like a leftover comment?

Done.


pkg/sql/opt/exec/execbuilder/testdata/select, line 1771 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: is the first % needed?

No, fixed

@jordanlewis jordanlewis requested a review from tbg March 27, 2021 03:12
@jordanlewis
Copy link
Member Author

@tbg or @nvanbenschoten, PTAL at the first commit, which teaches cmd_get to leave early if it runs out of bytes or key quota. It's entirely possible that this is undesired behavior, I have no idea :) The reason that I added it is that, now that SQL can emit mixed scan and get requests, it will receive out of order results unless something like this change is added.

Consider a 4 column family table and a query that skips an inner column family.

CREATE TABLE a (a INT PRIMARY KEY, b INT, c INT, d INT, FAMILY(a), FAMILY(b), FAMILY(c), FAMILY(d)
SELECT a, b, d FROM a WHERE a = 10

This PR handles this request by creating a ScanRequest for a,b, and a GetRequest for d. If the ScanRequest runs out of quota during a,b, and returns a ResumeSpan for just b, but the GetRequest ignores the key/bytes limit and returns a result unconditionally, then SQL ends up seeing the keys in order a, d, b, which is invalid and causes various things to break.

I suppose we could fix this in another way, like teaching the distsender or fetcher to be smarter somehow, but that seems inelegant - to me it feels natural that GetRequest would obey the bytes/keys limits the same way that ScanRequest does.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! Thanks for pushing this change along and apologies for the delayed review.

Reviewed 4 of 36 files at r2, 2 of 11 files at r3, 38 of 38 files at r4, 36 of 36 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @solongordon, @tbg, and @yuzefovich)


pkg/kv/kvserver/replica_evaluate_test.go, line 130 at r4 (raw file):

			},
		}, {
			// A batch limited to return only one key. Throw in a Get which will come

Nice! This looks like a good change. Can we add some cases for when the Get request comes first and exceeds the byte/key limit?


pkg/kv/kvserver/batcheval/cmd_get.go, line 38 at r4 (raw file):

	var intent *roachpb.Intent
	var err error
	if h.MaxSpanRequestKeys < 0 || h.TargetBytes < 0 {

Let's leave a comment here about how the batch must have already exhausted its key/byte limit and how this mirrors the behavior of MVCCScan with a negative MaxKeys or TargetBytes.


pkg/kv/kvserver/batcheval/cmd_get.go, line 38 at r4 (raw file):

	var intent *roachpb.Intent
	var err error
	if h.MaxSpanRequestKeys < 0 || h.TargetBytes < 0 {

I think it would be cleaner to structure this as an early return, since nothing that follows is needed in this case.


pkg/kv/kvserver/batcheval/cmd_get_test.go, line 28 at r4 (raw file):

	key := roachpb.Key([]byte{'a'})
	_, err := Get(ctx, nil, CommandArgs{
		Header: roachpb.Header{TargetBytes: -1},

Can we split this into three cases, one with a negative TargetBytes, one with a negative MaxSpanRequestKeys, and one with neither set?


pkg/sql/row/kv_batch_fetcher.go, line 361 at r5 (raw file):

		}
	} else {
		scans := make([]struct {

It's unfortunate that this allocation is now wasted in some cases and also that it can't be used for the Get case. I wonder what we can do for this. Would it be crazy to loop through the spans ahead of time (outside of this reverse block), count the point operations, allocate an appropriately sized gets slice, and then do some math to accurately compute the size of these allocations?


pkg/sql/row/kv_batch_fetcher.go, line 517 at r5 (raw file):

		case *roachpb.GetResponse:
			// TODO(jordan): do we have to worry about the IntentValue field on this
			// response?

No, we shouldn't need to. The value is only included for lower consistency-level reads.

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nathan, it's RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @solongordon, @tbg, and @yuzefovich)


pkg/sql/row/kv_batch_fetcher.go, line 361 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It's unfortunate that this allocation is now wasted in some cases and also that it can't be used for the Get case. I wonder what we can do for this. Would it be crazy to loop through the spans ahead of time (outside of this reverse block), count the point operations, allocate an appropriately sized gets slice, and then do some math to accurately compute the size of these allocations?

Done.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, @solongordon, and @tbg)


pkg/ccl/logictestccl/testdata/logic_test/multi_region, line 1477 at r10 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Fixed. I added this change as a separate commit.

Thanks, looks great!


pkg/sql/rowenc/index_encoding.go, line 387 at r13 (raw file):

// input span is for a table that has just a single column family, so that the
// caller can have a precise key to send via a GetRequest if desired.
//

[nit] worth mentioning that the returned Spans that represent a single key would not have EndKey=nil.


pkg/sql/rowenc/index_encoding.go, line 389 at r13 (raw file):

//
// The function accepts a slice of spans to append to.
func SplitSpanIntoFamilySpans(

[nit] It would be a bit more precise if we passed just the Key here instead of a Span.

Previously, bytes/key limits were ignored in GetRequest. Now, they are
respected.

Release note: None
Previously, SQL unconditionally emitted ScanRequests for every read that
it produced, even if we were completely sure that the scan could contain
only a single key that was fully known at request time.

This was less efficient than necessary, because ScanRequests cause the
KV layer to do more work than GetRequests.

Here are the reasons why GetRequests are more efficient:

- MVCCGet, unlike MVCCScan, is able to use a prefix iterator, which
  allows it to make use of RocksDB/Pebble bloom filters. There may also
  be other wins at the storage layer, like avoiding prefetching
- various data structures have optimizations and fast-paths for
  single-key operations (e.g. timestamp cache, latch manager, refresh
  span tracking)
- various code paths are simpler in the point operation case
- a point operation implies only needing a single key allocation instead
  of a start and end key allocation
- a point operation implies only needing to return up to a single
  result, which means that we can avoid some indirection in various
  places
- A scan over a span that has at most 1 key, but does not know it, needs
  to iterate over all the versions (or eventually do a second expensive
  seek), looking for the non-existent next key in the span. A Get can do
  a single seek, read the key, and be done.

Release note (performance improvement): SQL will now emit GetRequests
when possible to KV instead of always emitting ScanRequests. This
manifests as a modest performance improvement for some workloads.
Previously, a single-key span (a roachpb.Span with only a StartKey and
without an EndKey) was displayed like a normal span, without anything
after a dash. For example, a single-key span starting at
`/Table/11/1/foo` would display as `/Table/11/1/foo-`.

Now, a single-key span is shown without a dash.

Release note (sql change): single-key spans in explain and
explain(distsql) are now shown without a misleading dash after them.
Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, @RaduBerinde, @solongordon, and @tbg)


pkg/sql/rowenc/index_encoding.go, line 387 at r13 (raw file):

Previously, RaduBerinde wrote…

[nit] worth mentioning that the returned Spans that represent a single key would not have EndKey=nil.

Actually, returned spans that represent a single key will have nil EndKeys! I've clarified in the comment.


pkg/sql/rowenc/index_encoding.go, line 389 at r13 (raw file):

Previously, RaduBerinde wrote…

[nit] It would be a bit more precise if we passed just the Key here instead of a Span.

Done.

@jordanlewis
Copy link
Member Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 14, 2021

Build succeeded:

@craig craig bot merged commit 2e60515 into cockroachdb:master Apr 14, 2021
@jordanlewis jordanlewis deleted the gets-in-fetcher branch April 14, 2021 16:29
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained


pkg/sql/rowenc/index_encoding.go, line 387 at r13 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Actually, returned spans that represent a single key will have nil EndKeys! I've clarified in the comment.

Yeah, I don't know what that not was doing in there, I gotta stop editing my comments haha

jordanlewis added a commit to jordanlewis/cockroach that referenced this pull request May 1, 2021
After cockroachdb#61583, we emit GetRequests from SQL when possible. We forgot to
update the stats collection code that tracks the number of bytes read
from KV when GetRequests were emitted; this is now corrected.

Release note: None
jordanlewis added a commit to jordanlewis/cockroach that referenced this pull request Jun 4, 2021
After cockroachdb#61583, we emit GetRequests from SQL when possible. We forgot to
update the stats collection code that tracks the number of bytes read
from KV when GetRequests were emitted; this is now corrected.

Release note: None
jordanlewis added a commit to jordanlewis/cockroach that referenced this pull request Jun 11, 2021
After cockroachdb#61583, we emit GetRequests from SQL when possible. We forgot to
update the stats collection code that tracks the number of bytes read
from KV when GetRequests were emitted; this is now corrected.

Release note: None
jordanlewis added a commit to jordanlewis/cockroach that referenced this pull request Jul 9, 2021
After cockroachdb#61583, we emit GetRequests from SQL when possible. We forgot to
update the stats collection code that tracks the number of bytes read
from KV when GetRequests were emitted; this is now corrected.

Release note: None
jordanlewis added a commit to jordanlewis/cockroach that referenced this pull request Jul 16, 2021
After cockroachdb#61583, we emit GetRequests from SQL when possible. We forgot to
update the stats collection code that tracks the number of bytes read
from KV when GetRequests were emitted; this is now corrected.

Release note: None
jordanlewis added a commit to jordanlewis/cockroach that referenced this pull request Sep 10, 2021
After cockroachdb#61583, we emit GetRequests from SQL when possible. We forgot to
update the stats collection code that tracks the number of bytes read
from KV when GetRequests were emitted; this is now corrected.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: perform point key-value reads where possible
5 participants