
sql: hint scan batch size by expected row count #62282

Merged

Conversation

@jordanlewis (Member)

Might close #62198 (YCSB/E performance regression on December 4th).

Previously, the "dynamic batch size" strategy for the vectorized
engine's batch allocator worked the same in every situation: batches
would start at size 1, then double on each re-allocation, until they hit
their maximum size of 1024.

Now, to improve performance for scans that return a number of rows
somewhere in between 1 and 1024, we pass in the optimizer's best guess
of the number of rows that the scan will produce all the way down into
the TableReader. That guess is used as the initial size of the batch if
it's less than 1024.

Release note (performance improvement): improved the performance of the
vectorized engine when scanning fewer than 1024 rows at a time.
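To make the strategy concrete, here is a minimal sketch in Go of the sizing logic described above. This is an editor's illustration, not the actual allocator code: the names are assumptions, and the no-estimate default of 1024 shown here was later changed back to 1 in #62534 (see further down in this thread).

```go
package main

import "fmt"

// maxBatchSize stands in for the engine's maximum batch size of 1024 rows.
const maxBatchSize = 1024

// nextBatchSize sketches the "dynamic batch size" strategy after this PR.
// Previously, the first batch always had size 1 and doubled on each
// reallocation. Now the optimizer's estimated row count, when it is below
// the maximum, seeds the initial batch size instead.
func nextBatchSize(currentSize, estimatedRowCount int) int {
	if currentSize == 0 {
		// First allocation: start from the estimate if it is useful.
		if estimatedRowCount > 0 && estimatedRowCount < maxBatchSize {
			return estimatedRowCount
		}
		// No estimate: this PR starts at the maximum (reverted in #62534).
		return maxBatchSize
	}
	// Subsequent reallocations: double, capped at the maximum.
	if doubled := currentSize * 2; doubled < maxBatchSize {
		return doubled
	}
	return maxBatchSize
}

func main() {
	// A scan estimated at 500 rows starts with a 500-row batch instead of
	// ramping up 1 -> 2 -> 4 -> ... across roughly ten reallocations.
	fmt.Println(nextBatchSize(0, 500))   // 500
	fmt.Println(nextBatchSize(500, 500)) // 1000
}
```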


@jordanlewis requested a review from a team on March 19, 2021 23:00
@yuzefovich (Member) left a comment

Very nice find! I just have a few nits as usual :)

Reviewed 8 of 8 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/colexec/colbuilder/execplan.go, line 741 at r1 (raw file):

			}
			scanOp, err := colfetcher.NewColBatchScan(
				ctx, streamingAllocator, flowCtx, evalCtx, core.TableReader, post, estimatedRowCount)

super nit: it's nicer to put the closing parenthesis on a new line.


pkg/sql/colfetcher/cfetcher.go, line 322 at r1 (raw file):

	}
	rf.machine.batch, reallocated = rf.allocator.ResetMaybeReallocate(
		rf.typs, rf.machine.batch, estimatedRowCount, rf.memoryLimit,

nit: you can just pass rf.estimatedRowCount directly, since ResetMaybeReallocate will truncate it itself.
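For context, the truncation referred to here might look like the following simplified sketch; this is a hypothetical stand-in, not the real ResetMaybeReallocate signature. The point is that the allocator clamps whatever capacity the caller requests, so pre-truncating at the call site is redundant.

```go
// clampCapacity is a hypothetical simplification of the capping that
// ResetMaybeReallocate performs internally: the requested capacity is
// truncated to the maximum batch size, so callers can pass the raw
// estimated row count directly.
func clampCapacity(requested, maxBatchSize int) int {
	if requested > maxBatchSize {
		return maxBatchSize
	}
	return requested
}
```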


pkg/sql/execinfrapb/processors.proto, line 84 at r1 (raw file):

  repeated sql.sem.types.T result_types = 7;

  optional uint64 estimated_row_count = 8 [(gogoproto.nullable) = false];

nit: I think this deserves at least a quick comment.

@jordanlewis force-pushed the estimated-row-count-in-allocator branch from f66b666 to 11ab70c on March 20, 2021 00:14
@jordanlewis requested a review from a team as a code owner on March 20, 2021 00:14
@jordanlewis force-pushed the estimated-row-count-in-allocator branch 3 times, most recently from be14785 to 7f2ce8d on March 20, 2021 00:58
@yuzefovich (Member) left a comment

Reviewed 1 of 8 files at r2, 13 of 13 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/colexec/colbuilder/execplan.go, line 684 at r3 (raw file):

	post := &spec.Post

	estimatedRowCount := spec.EstimatedRowCount

nit: it would probably be nice to either move this into the TableReader case or leave a comment here noting that this information is currently set only for table readers.


pkg/sql/logictest/testdata/logic_test/explain_analyze_plans_nonmetamorphic, line 1 at r3 (raw file):

# LogicTest: !metamorphic local

We probably don't need this file, right?


pkg/sql/tests/vectorized_batch_size_test.go, line 11 at r3 (raw file):

// licenses/APL.txt.

package tests

I'd move it to the colfetcher package.


pkg/sql/tests/vectorized_batch_size_test.go, line 28 at r3 (raw file):

)

func TestScanBatchSize(t *testing.T) {

nit: a quick comment describing the goal of the test would be beneficial.

@jordanlewis force-pushed the estimated-row-count-in-allocator branch from 7f2ce8d to ff09f45 on March 20, 2021 01:22
@jordanlewis (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/logictest/testdata/logic_test/explain_analyze_plans_nonmetamorphic, line 1 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We probably don't need this file, right?

Done.


pkg/sql/tests/vectorized_batch_size_test.go, line 11 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I'd move it to colfetcher package.

Done.

@jordanlewis force-pushed the estimated-row-count-in-allocator branch 5 times, most recently from 8bfae2e to 3d7eecc on March 20, 2021 23:57
@erikgrinaker (Contributor)

I did some ycsb/E/nodes=3 GCE runs to try this: 5 runs at 24e76d3 (without this PR) and 5 runs at 3d7eecc (with this PR). Happy to say that the median results were 792 ops/s before and 1829 ops/s after, which is even higher than the original ~1600 ops/s baseline back in November! Great work @jordanlewis!

However, I also did some kv95/enc=false/nodes=1/cpu=32 GCE runs to see if this might address #62156. Unfortunately, this seems to have regressed from 60589 ops/s without this PR to 56142 ops/s with it. The results were fairly noisy, so I did 10 runs each, and they should probably still be taken with a grain of salt. I don't think this should block the PR, but if you have any ideas for why this change might have caused kv95 to regress, that'd be worth looking into.

@jordanlewis (Member, Author)

Interesting. @erikgrinaker, thank you for working on these benchmark results.

I'm currently baffled as to why this PR would have made kv95 worse; I would have expected it to be a no-op there. I will look into this as well.

@jordanlewis (Member, Author)

I did a couple of brief tests and couldn't reproduce the behavior you saw, but I'm not ruling out the possibility that I've missed something just yet.

Also, I wouldn't expect this fix to help with the slow regression you're looking at. The behavior for single-key primary-key lookup workloads (like the ones in KV) should be exactly the same before and after this patch. 🤔

@erikgrinaker (Contributor) commented Mar 22, 2021

Yeah, it's weird. I'm fine with putting it down to noise, but I just did another run and got similar results. The thing that jumps out at me is that ramp-up is always slower with this change -- have a look at the instantaneous rates below:

Without this PR:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
    1.0s        0        40747.2        45317.0      0.9      3.5      7.3     15.2 read
    1.0s        0         2148.8         2389.8      3.3      8.9     12.6     17.8 write
    2.0s        0        47194.0        46255.5      0.9      3.1      7.3     16.3 read
    2.0s        0         2516.2         2453.0      3.1      9.4     13.1     18.9 write
    3.0s        0        49113.7        47206.9      0.8      2.9      7.1     18.9 read
    3.0s        0         2600.1         2502.0      3.1      8.9     12.1     19.9 write
    4.0s        0        49923.5        47886.7      0.8      2.9      6.8     16.8 read
    4.0s        0         2656.9         2540.7      3.4      9.4     12.1     21.0 write
    5.0s        0        51812.2        48671.6      0.8      2.4      6.6     17.8 read
    5.0s        0         2646.0         2561.8      2.9     10.0     15.2     21.0 write

With this PR:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
    1.0s        0        29460.3        33190.0      1.1      5.2      8.9     27.3 read
    1.0s        0         1538.0         1732.7      3.8     10.0     13.6     26.2 write
    2.0s        0        33780.4        33484.5      1.0      5.0      8.9     24.1 read
    2.0s        0         1736.3         1734.5      4.5     11.0     15.7     25.2 write
    3.0s        0        32794.3        33254.4      1.0      5.5     10.0     21.0 read
    3.0s        0         1746.0         1738.3      4.5     12.1     17.8     28.3 write
    4.0s        0        35227.5        33747.8      1.0      4.7      9.4     22.0 read
    4.0s        0         1867.9         1770.7      3.8      9.4     14.7     29.4 write
    5.0s        0        34809.9        33960.6      1.0      5.2     10.5     19.9 read
    5.0s        0         1874.7         1791.5      3.8     11.0     14.7     26.2 write

This pattern holds across all the runs I've done. But I don't have a good explanation for why this change should cause this.

@jordanlewis (Member, Author)

@erikgrinaker, does the pattern recur in your test setup with read-percent=100?

@jordanlewis (Member, Author)

Could you share the exact invocations you used to do your tests?

@erikgrinaker (Contributor) commented Mar 22, 2021

Sure:

$ roachtest -u grinaker --cockroach cockroachshort-linux-jordanlewis-pre run --count 5 --cpu-quota 320 '^kv95/enc=false/nodes=1/cpu=32$'

$ roachtest -u grinaker --cockroach cockroachshort-linux-jordanlewis-post run --count 5 --cpu-quota 320 '^kv95/enc=false/nodes=1/cpu=32$'

I'll do a run with a patched roachtest using read-percent=100 soon; I'm running some other benchmarks at the moment.

@erikgrinaker (Contributor) commented Mar 22, 2021

Seeing the same with read-percent=100: the median is 66267 ops/s vs. 61842 ops/s, with the same slow ramp-up:

Without PR:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
    1.0s        0        45943.9        52634.9      0.9      3.5      7.6     16.8 read
    2.0s        0        53887.3        53256.5      0.9      3.4      6.8     15.7 read
    3.0s        0        54726.6        53746.9      0.9      3.4      6.8     16.8 read
    4.0s        0        55622.4        54215.3      0.9      3.3      7.1     15.7 read
    5.0s        0        56558.9        54684.7      0.9      3.1      7.1     17.8 read
    6.0s        0        58279.0        55282.8      0.9      2.9      6.6     15.2 read
    7.0s        0        57671.5        55624.7      0.9      3.0      6.6     14.7 read
    8.0s        0        58504.5        55984.0      0.9      2.9      6.6     15.7 read
    9.0s        0        57625.4        56166.6      0.9      3.0      7.1     15.2 read
   10.0s        0        58540.9        56404.1      0.9      2.9      6.8     18.9 read

With PR:

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
    1.0s        0        31237.3        35882.4      1.2      5.0      9.4     22.0 read
    2.0s        0        37625.5        36753.0      1.2      4.7      8.4     23.1 read
    3.0s        0        37586.4        37031.0      1.1      5.0      9.4     23.1 read
    4.0s        0        38858.7        37488.1      1.2      4.5      7.9     18.9 read
    5.0s        0        38599.1        37710.2      1.1      4.5      8.9     18.9 read
    6.0s        0        38557.1        37851.5      1.1      4.7      8.4     19.9 read
    7.0s        0        38632.2        37963.4      1.2      4.5      8.9     18.9 read
    8.0s        0        40179.1        38239.8      1.1      4.5      8.9     18.9 read
    9.0s        0        39869.8        38421.0      1.0      4.7      8.9     23.1 read
   10.0s        0        40219.6        38600.9      1.1      4.5      8.9     16.3 read

@jordanlewis (Member, Author)

I am having a great deal of trouble reproducing this in my environment. Perhaps it's noise, perhaps it's not. I think I'd like to merge and backport this soon, unless people have objections - my bandwidth for investigating this will also decrease quickly this week.

@erikgrinaker (Contributor)

OK -- I think this is a pretty clear win for YCSB/E, and if the kv95 regression turns out not to be noise, we can optimize that separately. Thanks!

@yuzefovich (Member) left a comment

:lgtm:

Reviewed 6 of 6 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)

@jordanlewis (Member, Author)

bors r+

@yuzefovich force-pushed the estimated-row-count-in-allocator branch from 3d7eecc to 957bb40 on March 22, 2021 23:48
@yuzefovich (Member)

I hope you don't mind - I've just rebased and updated the explain_analyze logic test.

bors r+

@jordanlewis (Member, Author)

On the contrary, I very much appreciate it - thanks so much!

@craig (bot) commented Mar 23, 2021

Build failed (retrying...):

@yuzefovich (Member)

"vectorized batch count" doesn't show up for some of the logic test configs :/

bors r-

@craig (bot) commented Mar 23, 2021

Canceled.

@yuzefovich (Member)

Hm, I'm not sure what to do here. These flakes highlight that introducing a purely vectorized-engine-related field into the EXPLAIN output is unfortunate. I wonder whether we should merge this PR without the changes to EXPLAIN (i.e. without a regression test), backport the fix, and then come up with a testing strategy.

Alternatively, we could either run the explain_analyze logic test only with vectorized configs (which would reduce our test coverage, though it was even worse until recently; see #62255) or duplicate the file for the non-vectorized configs.

@yuzefovich (Member)

I'll pick this up and move it over the finish line. I'll update the explain_analyze logic test to run only with vec-on configs so that it is no longer flaky.

Previously, the colfetcher ignored limit hints: it always fetched data
from KV until its batch was full. This produces bad behavior if the
batch size is larger than the limit hint. For example, if the expected
row count was 500, causing us to create a 500-sized batch, but the limit
hint for whatever reason was only 20, we would still go ahead and fetch
500 rows.

This, in practice, does not appear to show up too easily - if the
optimizer is doing its job, the batch size should always be equal to the
limit hint for limited scans.

Release note: None
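As an illustration of the fix this commit message describes (hypothetical helper name; the real logic lives in the colfetcher), the initial batch size should respect the limit hint as well as the row count estimate:

```go
// initialBatchSize is a hypothetical sketch of the limit-hint fix: even
// when the optimizer estimates 500 rows, a limit hint of 20 means the
// fetcher should neither allocate a 500-row batch nor fetch 500 rows.
func initialBatchSize(estimatedRowCount, limitHint, maxBatchSize int) int {
	size := maxBatchSize
	if estimatedRowCount > 0 && estimatedRowCount < size {
		size = estimatedRowCount
	}
	if limitHint > 0 && limitHint < size {
		size = limitHint
	}
	return size
}
```

With estimatedRowCount = 500 and limitHint = 20, this returns 20, matching the example in the commit message.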
Previously, the "dynamic batch size" strategy for the vectorized
engine's batch allocator worked the same in every situation: batches
would start at size 1, then double on each re-allocation, until they hit
their maximum size of 1024.

Now, to improve performance for scans that return a number of rows
somewhere in between 1 and 1024, we pass in the optimizer's best guess
of the number of rows that the scan will produce all the way down into
the TableReader. That guess is used as the initial size of the batch if
it's less than 1024.

Release note (performance improvement): improve the performance for the
vectorized engine when scanning fewer than 1024 rows at a time.
@yuzefovich force-pushed the estimated-row-count-in-allocator branch from 957bb40 to 5041b22 on March 23, 2021 23:46
@yuzefovich (Member)

bors r+

@craig (bot) commented Mar 24, 2021

Build failed (retrying...):

@craig (bot) commented Mar 24, 2021

Build succeeded:

@craig (bot) merged commit a56498f into cockroachdb:master on Mar 24, 2021
@erikgrinaker added a commit to erikgrinaker/cockroach that referenced this pull request on Mar 24, 2021:
In cockroachdb#62282, the estimated row count was passed into the scan batch
allocator to avoid growing the batch from 1. However, this also changed
the default batch size from 1 to 1024 when no row count estimate was
available, giving significant overhead when fetching small result sets.
On `kv95/enc=false/nodes=1/cpu=32` this reduced performance from 66304
ops/s to 58862 ops/s (median of 5 runs), since these are single-row
reads without estimates.

This patch reverts the default batch size to 1 when no row count
estimate is available. This fully fixes the `kv95` performance
regression. YCSB/E takes a small hit going from 1895 ops/s to 1786
ops/s, but this only seems to happen because it takes a while for the
statistics to update: sometime within the first minute of the test
(after the 1-minute ramp-up period), throughput abruptly changes from
~700 ops/s to ~1800 ops/s, so using a 2-minute ramp-up period in
roachtest would mostly eliminate any difference.

Release note: None
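A sketch of the revert this commit describes, under the same hypothetical names as the earlier sketches: without an estimate, the batch starts at 1 again and relies on doubling, so single-row reads (as in kv95) no longer pay for a 1024-row batch.

```go
// seedBatchSize sketches the fix in #62534: seed the batch size from the
// estimated row count when one is available, and otherwise fall back to
// the pre-#62282 starting size of 1, letting the doubling strategy grow
// the batch as needed.
func seedBatchSize(estimatedRowCount, maxBatchSize int) int {
	if estimatedRowCount == 0 {
		return 1 // no estimate: revert to the old dynamic ramp-up
	}
	if estimatedRowCount < maxBatchSize {
		return estimatedRowCount
	}
	return maxBatchSize
}
```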
@craig (bot) pushed a commit that referenced this pull request on Mar 24, 2021:
62175: sql: add crdb_internal.reset_sql_stats() builtin r=asubiotto a=Azhng

Previously, there was no mechanism to immediately clear SQL statistics;
users would have to wait until the reset interval expired. This commit
creates a builtin that immediately clears SQL stats.

Release note (sql change): SQL stats can now be cleared using
crdb_internal.reset_sql_stats()

Addresses #33315 

62492: roachtest: add hibernate-spatial test r=rafiss a=otan

* Bump hibernate to 5.4.30
* Add hibernate-spatial test, which tests cockroachdb against hibernate's
  spatial suite. A separate suite is used because the directory magic of
  copying may not work, since the sets of running tests overlap a bit.

Release note: None

62511: geo/geomfn: fix st_linelocatepoint to work with ZM coords r=otan a=andyyang890

Previously, st_linelocatepoint would panic when the
line had Z and/or M coordinates.

Release note: None

62534: sql: default to batch size 1 in allocator r=yuzefovich a=erikgrinaker

In #62282, the estimated row count was passed into the scan batch
allocator to avoid growing the batch from 1. However, this also changed
the default batch size from 1 to 1024 when no row count estimate was
available, giving significant overhead when fetching small result sets.
On `kv95/enc=false/nodes=1/cpu=32` this reduced performance from 66304
ops/s to 58862 ops/s (median of 5 runs), since these are single-row
reads without estimates.

This patch reverts the default batch size to 1 when no row count
estimate is available. This fully fixes the `kv95` performance
regression. YCSB/E takes a small hit going from 1895 ops/s to 1786
ops/s, but this only seems to happen because it takes a while for the
statistics to update: sometime within the first minute of the test
(after the 1-minute ramp-up period), throughput abruptly changes from
~700 ops/s to ~1800 ops/s, so using a 2-minute ramp-up period in
roachtest would mostly eliminate any difference.

Resolves #62524.

Release note: None

62535: roachtest: use 2-minute ramp times for YCSB workloads r=yuzefovich a=erikgrinaker

In #62534 it was shown that it takes up to two minutes before we have
good enough statistics to allocate appropriately sized batches. However,
the YCSB workloads only had a 1-minute ramp time, which would skew the
results as throughput would abruptly change when the statistics were
updated during the test.

This patch changes the ramp time for YCSB workloads to 2 minutes to make
sure we have appropriate statistics before starting the actual test.

Release note: None

62548: bazel: mark logictest as working in bazel r=rickystewart a=rickystewart

Release note: None

62549: workload/schemachange: temporarily disable ADD REGION r=ajstorm,postamar a=otan

This is causing flakes in CI.
Resolves #62503 

Release note: None

Co-authored-by: Azhng <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@yuzefovich pushed a commit to yuzefovich/cockroach that referenced this pull request on Mar 26, 2021
@jordanlewis deleted the estimated-row-count-in-allocator branch on March 26, 2021 19:02