Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release: 21.1 microbenchmark regression investigation and sign-off #62212

Closed
30 of 35 tasks
nvanbenschoten opened this issue Mar 18, 2021 · 6 comments
Closed
30 of 35 tasks
Labels
A-benchmarking C-performance Perf of queries or internals. Solution not expected to change functional behavior. GA-blocker

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Mar 18, 2021

This time, split out roughly by team! Of the 568 Go microbenchmarks in the cockroachdb/cockroach repository, a large handful have statistically significant regressions between the release-20.2 branch and the master branch.

We should determine the cause of each of these regressions and decide on whether they are deliberate or accidental and warrant further investigation.

Most of this process was automated with benchdiff. This tool can also be helpful to explore specific regressions here using a command like:

go get github.com/nvanbenschoten/benchdiff
benchdiff --new=master --old=release-20.2 --count=10 --run=<your_benchmark> <your_package>

Or, to automate the process of bisecting a regression, the tool can be used with a command like:

export test_name='BenchmarkEntryCacheClearTo'
export pkg_name='./pkg/kv/kvserver/raftentry'
export threshold=0.1
export base_sha=18b6937745c339264314413de61500e58253d9ee # near release-20.2, but easier to compile

# careful with this! It's an aggressive clean.
echo 'git clean -e benchdiff -e post-checkout.sh -dxff' >> post-checkout.sh
echo 'make buildshort' >> post-checkout.sh

git bisect start master ${base_sha}
git bisect run benchdiff --old=${base_sha} --post-checkout='bash post-checkout.sh' --run=${test_name} --count=20 --threshold=${threshold} ${pkg_name}

SQL

Packages:

./pkg/bench/...
./pkg/ccl/logictestccl
./pkg/ccl/oidcccl
./pkg/ccl/partitionccl
./pkg/ccl/sqlproxyccl
./pkg/col/...
./pkg/config/...
./pkg/sql/...
./pkg/sqlmigrations/...

Spreadsheet

Regressions (check to sign off):

  • KV/Update/SQL, @nvanbenschoten, bisected to 5910ab4, test changed, unclear why allocs/op changed, but 👍
  • Distinct/Ordered/..., @yuzefovich, bisected to 76968ff, the benchmark changed slightly (the source incurs more allocations now), yet the throughput has actually improved
  • Distinct/Unordered/..., @yuzefovich signed off because the regression is expected and is only on a small subset of cases; bisected on the first try to 76968ff, on the second try to d9f7bcb. The true cause is actually 11e6312.
  • TableReader/rows=16, @postamar + @ajwerner, bisected to 15ee387, this one looks real and serious, over a 25% reduction in throughput. After 15ee387 is addressed, there still remains 10% hit or so because of cee3bcf which is discussed elsewhere. @yuzefovich signed off
  • Router/MIRROR, not an actual regression, the difference is explained by TAGS=crdb_test being added if the benchmark binary built via make test
  • Append/bytes/AppendWithSel/NullProbability=0.2, can't reproduce
  • MergeJoiner, not reproducible
  • JoinReader/reqOrdering=false @yuzefovich signed off, same as TableReader/rows=16 above.
  • LogicalProjOp/OR,useSel=true,hasNulls=true, @yuzefovich signed off since this benchmark is known to be a bit flaky. The absolute speed is still very high, not worth digging in.
  • RowChannelPipeline/length=4, no regression, same as Router/MIRROR above
  • SortAll/rows=16, can't reproduce
  • AllSpooler, can't reproduce
  • SQL/MultinodeCockroach/InterleavedSelect/count=10, @yuzefovich saw 20% hit, out of which 6% or so can be attributed to using the vectorized engine, and the remaining difference likely has the same root cause as TableReader/rows=16 the significant increase in allocs is due to the vectorized engine and is expected, but the usage of vec engine doesn't actually reduce the throughput. Still there is about 6-8% hit when comparing against 20.2. @yuzefovich signed off since it doesn't seem worth looking into.
  • Select2/Cockroach @yuzefovich is investigating. The vectorized engine is responsible for 29% hit in throughput. The perf hit is reduced to 13% by colexec: optimize CASE, AND, and OR projection operators #62925, the remaining difference is caused by the fact the CASE, AND/OR logical operations have higher overhead in the vectorized engine, and this benchmark is basically the worst case for it. @yuzefovich signed off.
  • VirtualTableQueries/select_crdb_internal.tables_with_1_fk @rafiss investigated and found no culprit. since this test is new in 21.1 (but got backported to 20.2) it's not that easy to bisect. also, the roundtrips stat did decrease, which is more important since this is I/O bound not CPU/mem bound.
  • EndToEnd/tpcc-new-order/EndToEnd @rytaft bisected to sql: decrease vectorize_row_count_threshold to 0 #55713, "decrease vectorize_row_count_threshold to 0".
  • IsDatumEqual @rytaft investigated, and this is not a regression. If anything, looks like a small improvement.
  • WideTableIgnoreColumns/Cockroach, actually an improvement, likely same as Router/MIRROR above
  • Scan/Cockroach/count=1000/limit=10 @yuzefovich signed off. The 2x or so hit is explained by the switch to the vectorized engine by default for everything, and it is mitigated substantially by sql: hint scan batch size by expected row count #62282. The remaining difference is likely explained by TableReader/rows=16 above. mitigated by colfetcher: use the limit hint to size the batch #63020
  • Planning @rytaft investigated, at least half of the regression is due to using the vectorized engine. Couldn't pin the remainder of the regression to a single commit.
  • VecSkipScan/Bench_scan_with_skip, actually an improvement, likely same as Router/MIRROR above
  • AlterTableAddCheckConstraint/alter_table_add_3_check_constraints
  • NameResolution/MultinodeCockroach @rafiss investigated; bisected to sql: decrease vectorize_row_count_threshold to 0 #55713 (vectorize_row_count_threshold=0). seems expected, and also doesn't matter since the test is supposed to be testing name resolution.
  • SQL/Cockroach/InsertFK/count=1000/nFks=1 @RaduBerinde bisected to *: configure the use of separated intents in a cluster #59829, more info in there.
  • Phases/kv-read-const @mgartner bisected to opt: use safer struct initialization pattern #58386, mitigated partially by opt: speed up new init pattern #62767
  • IndexConstraints @mgartner bisected to opt/idxconstraint: recognize stored column expressions #55867 and opt: use safer struct initialization pattern #58386, regression is minimal and unavoidable
  • many more across time/op, alloc/op, allocs/op, and speed. roundtrips looks good though!

Bulk IO

Packages:

./pkg/blobs
./pkg/ccl/backupccl # not run, https://github.com/cockroachdb/cockroach/issues/62083
./pkg/ccl/baseccl
./pkg/ccl/cliccl
./pkg/ccl/importccl # not run, https://github.com/cockroachdb/cockroach/issues/62085
./pkg/jobs/...

Spreadsheet

Regressions (check to sign off):
None, though the two most important packages here are skipped.

KV

Packages:

./pkg/ccl/kvccl/...
./pkg/ccl/serverccl
./pkg/gossip/...
./pkg/kv/...
./pkg/kv/kvserver/gc # not run, too slow
./pkg/roachpb
./pkg/rpc/...
./pkg/server/...

Spreadsheet

Regressions (check to sign off):

Storage

Packages:

./pkg/ccl/storageccl/...
./pkg/storage/...

Spreadsheet

Regressions (check to sign off):

  • PebbleMapIteration
  • ScanDecodeKeyValue/getTs=true: @nvanbenschoten, bisected to 11abdda, 1ns difference due to new switch case in DecodeKey, 👍

Misc

Packages:

./pkg/base
./pkg/ccl/utilccl/...
./pkg/ccl/workloadccl/...
./pkg/cli/...
./pkg/clusterversion
./pkg/cmd/...
./pkg/internal/...
./pkg/security
./pkg/settings
./pkg/testutils/...
./pkg/ts/...
./pkg/util/...
./pkg/workload/...

Spreadsheet

Regressions (check to sign off):

@nvanbenschoten nvanbenschoten added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-benchmarking GA-blocker labels Mar 18, 2021
@nvanbenschoten nvanbenschoten modified the milestone: 20.2 Mar 18, 2021
nvanbenschoten added a commit to nvanbenschoten/etcd that referenced this issue Mar 20, 2021
…he fields in protos

This commit removes the `XXX_NoUnkeyedLiteral`, `XXX_unrecognized`, and
`XXX_sizecache` auto-generated fields from generated protobuf structs in
the raft package. This was done for all of the same reasons CockroachDB
removed the generation of these fields in cockroachdb/cockroach#38404.
They come with very limited advantages but moderate disadvantages.

`XXX_NoUnkeyedLiteral` and `XXX_sizecache` were only enabled recently in
cc7b4fa, and this appears to have been unintentional. Meanwhile,
`XXX_unrecognized` has been around for longer and has arguably more
reason to stay because it can assist with forwards compatibility.
However, any real mixed-version upgrade story for this package is mostly
untold at this point, and keeping this field seems just as likely to
cause unexpected bugs (e.g. a field was propagated but not updated
correctly when passed through an old version) as it seems to fix real
issues, so it also doesn't warrant its cost.

This reduces the in-memory representation size of all Raft protos.
Notably, it reduces the memory size of an `Entry` proto from *80 bytes*
to *48 bytes* and the memory size of a `Message` proto from *392 bytes*
to *264 bytes*. Both of these structs are used frequently, and often in
slices, where this wasted space really starts to add up.

This was motivated by a regression in microbenchmarks in CockroachDB due
to cc7b4fa, which was caught in cockroachdb/cockroach#62212.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 20, 2021
The benchmark was not using the subtest's `b.N`, so it was getting bogus
results. This confused cockroachdb#62212, which saw a regression from 0.000008611111111
nanoseconds to 0.0000101 nanoseconds.

With this fix, the benchmark no longer shows any movement over the past release.
nvanbenschoten added a commit to nvanbenschoten/etcd that referenced this issue Mar 20, 2021
…he fields in protos

This commit removes the `XXX_NoUnkeyedLiteral`, `XXX_unrecognized`, and
`XXX_sizecache` auto-generated fields from generated protobuf structs in
the raft package. This was done for all of the same reasons CockroachDB
removed the generation of these fields in cockroachdb/cockroach#38404.
They come with very limited advantages but moderate disadvantages.

`XXX_NoUnkeyedLiteral` and `XXX_sizecache` were only enabled recently in
cc7b4fa, and this appears to have been unintentional. Meanwhile,
`XXX_unrecognized` has been around for longer and has arguably more
reason to stay because it can assist with forwards compatibility.
However, any real mixed-version upgrade story for this package is mostly
untold at this point, and keeping this field seems just as likely to
cause unexpected bugs (e.g. a field was propagated but not updated
correctly when passed through an old version) as it seems to fix real
issues, so it also doesn't warrant its cost.

This reduces the in-memory representation size of all Raft protos.
Notably, it reduces the memory size of an `Entry` proto from *80 bytes*
to *48 bytes* and the memory size of a `Message` proto from *392 bytes*
to *264 bytes*. Both of these structs are used frequently, and often in
slices, where this wasted space really starts to add up.

This was motivated by a regression in microbenchmarks in CockroachDB due
to cc7b4fa, which was caught in cockroachdb/cockroach#62212.
postamar pushed a commit to postamar/cockroach that referenced this issue Mar 22, 2021
Recent benchmark results revealed a performance regression in
TableReader caused by the recent introduction of the descriptor builder
pattern which deep-copies descpb messages using protoutil.Clone.
See issue cockroachdb#62212 and commit 15ee387.

This commit adds an unsafe table descriptor builder which is used with
table descriptors in execinfrapb messages. This unsafe builder does not
perform a deep copy, however in this particular context this isn't
a problem, because the descpb.TableDescriptor only exists as a means to
serialize table descriptors across nodes in a read-only fashion and is
known to not be used in any other capacity.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Mar 23, 2021
Recent benchmark results revealed a performance regression in
TableReader caused by the recent introduction of the descriptor builder
pattern which deep-copies descpb messages using protoutil.Clone.
See issue cockroachdb#62212 and commit 15ee387.

This commit adds an unsafe table descriptor constructor which is used
with table descriptors in execinfrapb messages. This unsafe constructor
does not perform a deep copy. In this particular context this isn't a
problem, because the table descriptor proto only exists as a means to
serialize table descriptors across nodes in a read-only fashion and is
known to not be used in any other capacity.

Release note: None
@yuzefovich
Copy link
Member

PSA comment: the bisect might be incorrect (not sure on the conditions though) - for example, for Distinct/Unordered/... in pkg/sql/colexec, Nathan bisected to one commit, I then bisected to another, yet the true cause is the third commit.

@yuzefovich
Copy link
Member

PSA: for folks looking into the benchmark regressions, I’d recommend explicitly specifying TAGS=crdb_test_off when building/running the benchmarks in order for metamorphic randomizations not influence the results. The thing is if the binary is built via make test …, we will add TAGS=crdb_test by default automatically during the build, and then it will be somewhat random whether the metamorphic values are actually used. I imagine that this should mostly affect the SQL execution benchmarks, but it’s a good rule of thumb for everyone

craig bot pushed a commit that referenced this issue Mar 24, 2021
62350: tracing: elide logtag creation for noop span r=erikgrinaker a=tbg

Release note: None


62381: cli: update locality help to refer to region/zone r=rmloveland a=rmloveland

Fixes #62375.

Previously, the help output for `--locality` and `--locality-addr`
referred to "data centers" as part of their usage explanations.

This needed to change because we expect the majority of users to be
running CockroachDB on a cloud provider that primarily uses terms like
'region' and 'availability zone'.

To address this, we updated the help strings to use those terms, rather
than 'datacenter', etc.

Release note (cli): Updated the output of `--locality` and
`--locality-addr` flags to use terms that match cloud provider names for
things such as 'region' and 'zone'.

62497: server: refactor StartTenant() to pkg/server/tenant.go r=tbg a=Azhng

Previously, StartTenant() function resided in pkg/server/testserver.go.
However, since this function is not solely used for testing purposes,
it is moved to its own file.

Release note: None

62513: colexecjoin: optimize the initialization of the merge joiner r=yuzefovich a=yuzefovich

Whenever the merge joiner doesn't know whether the group continues into
the next batch, it appends all tuples belonging to the group currently
being built to the "buffered group". Internally, that append requires us
that we perform a deep copy of the relevant tuples from the input batch
into a scratch space since `Enqueue` method only supports full-batch
operation.

Previously, we would always allocate the scratch batch in `Init` with
maximum possible size; however, this is too much when the input data
sets are small. This commit switches the allocation to be lazy and
utilizes our `ResetMaybeReallocate` method of dynamic batch sizing. This
shows up somewhat noticeably, for example, on
`SQL/MultinodeCockroach/InterleavedSelect` benchmark (which is using the
merge join since the interleaved reader joiner has been removed).

Addresses: #62212.

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Rich Loveland <[email protected]>
Co-authored-by: Azhng <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot pushed a commit that referenced this issue Mar 24, 2021
62388: sql: add tabledesc.NewUnsafeImmutable constructor r=postamar a=postamar

    sql: add tabledesc.NewUnsafeImmutable constructor
    
    Recent benchmark results revealed a performance regression in
    TableReader caused by the recent introduction of the descriptor builder
    pattern which deep-copies descpb messages using protoutil.Clone.
    See issue #62212 and commit 15ee387.
    
    This commit adds an unsafe table descriptor constructor which is used
    with table descriptors in execinfrapb messages. This unsafe constructor
    does not perform a deep copy. In this particular context this isn't a
    problem, because the table descriptor proto only exists as a means to
    serialize table descriptors across nodes in a read-only fashion and is
    known to not be used in any other capacity.
    
    Release note: None


62500: teamcity: copy roachtest weekly script to build/ r=rail a=rickystewart

This new script is just the content of the "custom script" of the
Cockroach > Nightlies > Roachtest Weekly build configuration; but with
a reference to `make` replaced by `mkrelease`. (See
https://github.com/cockroachdb/dev-inf/issues/347.)

Release note: None

Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
postamar pushed a commit to postamar/cockroach that referenced this issue Mar 25, 2021
Recent benchmark results revealed a performance regression in
TableReader caused by the recent introduction of the descriptor builder
pattern which deep-copies descpb messages using protoutil.Clone.
See issue cockroachdb#62212 and commit 15ee387.

This commit adds an unsafe table descriptor constructor which is used
with table descriptors in execinfrapb messages. This unsafe constructor
does not perform a deep copy. In this particular context this isn't a
problem, because the table descriptor proto only exists as a means to
serialize table descriptors across nodes in a read-only fashion and is
known to not be used in any other capacity.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Mar 25, 2021
Recent benchmark results revealed a performance regression in
TableReader caused by the recent introduction of the descriptor builder
pattern which deep-copies descpb messages using protoutil.Clone.
See issue cockroachdb#62212 and commit 15ee387.

This commit adds an unsafe table descriptor constructor which is used
with table descriptors in execinfrapb messages. This unsafe constructor
does not perform a deep copy. In this particular context this isn't a
problem, because the table descriptor proto only exists as a means to
serialize table descriptors across nodes in a read-only fashion and is
known to not be used in any other capacity.

Release note: None
craig bot pushed a commit that referenced this issue Mar 25, 2021
62294: util/interval: fix BenchmarkBTreeDeleteInsertCloneEachTime r=nvanbenschoten a=nvanbenschoten

The benchmark was not using the subtest's `b.N`, so it was getting bogus
results. This confused #62212, which saw a regression from 0.000008611111111
nanoseconds to 0.0000101 nanoseconds.

With this fix, the benchmark no longer shows any movement over the past release
and produces much more realistic results:
```
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkBTreeDeleteInsertCloneEachTime/count=16/reset=false-16         	 2367748	       492.7 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=16/reset=true-16          	 6090852	       191.7 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=128/reset=false-16        	  991534	      1102 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=128/reset=true-16         	 2328866	       508.7 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=1024/reset=false-16       	  550702	      2088 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=1024/reset=true-16        	 1209684	      1054 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=8192/reset=false-16       	  463558	      2176 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=8192/reset=true-16        	  724923	      1542 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=65536/reset=false-16      	  264100	      4833 ns/op
BenchmarkBTreeDeleteInsertCloneEachTime/count=65536/reset=true-16       	  429841	      2467 ns/op
```

Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 30, 2021
This commit picks up etcd-io/etcd#12790, which
resolves a regression observed in cockroachdb#62212.

Picks up the following four commits:
- etcd-io/etcd@87258ef
- etcd-io/etcd@2c2456b
- etcd-io/etcd@ebc0174
- etcd-io/etcd@e51c697

```
name                  old time/op    new time/op    delta
EntryCache-16            114µs ± 7%      96µs ±12%  -15.29%  (p=0.000 n=19+19)
EntryCacheClearTo-16    37.7µs ± 4%    34.9µs ± 4%   -7.55%  (p=0.000 n=17+18)

name                  old alloc/op   new alloc/op   delta
EntryCacheClearTo-16    1.28kB ± 0%    0.77kB ± 0%  -40.00%  (p=0.000 n=20+20)
EntryCache-16           83.3kB ± 0%    50.0kB ± 0%  -39.95%  (p=0.000 n=19+20)

name                  old allocs/op  new allocs/op  delta
EntryCache-16             3.00 ± 0%      3.00 ± 0%     ~     (all equal)
EntryCacheClearTo-16      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```
craig bot pushed a commit that referenced this issue Mar 31, 2021
62819: go.mod: bump etcd/raft to pick up proto size improvements r=tbg a=nvanbenschoten

This commit picks up etcd-io/etcd#12790, which resolves a regression observed in #62212.

Picks up the following four commits, each of which I've reviewed for safety:
- etcd-io/etcd@87258ef
- etcd-io/etcd@2c2456b
- etcd-io/etcd@ebc0174
- etcd-io/etcd@e51c697

```
name                  old time/op    new time/op    delta
EntryCache-16            114µs ± 7%      96µs ±12%  -15.29%  (p=0.000 n=19+19)
EntryCacheClearTo-16    37.7µs ± 4%    34.9µs ± 4%   -7.55%  (p=0.000 n=17+18)

name                  old alloc/op   new alloc/op   delta
EntryCacheClearTo-16    1.28kB ± 0%    0.77kB ± 0%  -40.00%  (p=0.000 n=20+20)
EntryCache-16           83.3kB ± 0%    50.0kB ± 0%  -39.95%  (p=0.000 n=19+20)

name                  old allocs/op  new allocs/op  delta
EntryCache-16             3.00 ± 0%      3.00 ± 0%     ~     (all equal)
EntryCacheClearTo-16      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```

Co-authored-by: Nathan VanBenschoten <[email protected]>
@yuzefovich
Copy link
Member

@sumeerbhola I want to bring to your attention that cee3bcf is the cause of the several regressions in the microbenchmarks of SQL package. For example, here is the impact of that commit on JoinReader microbenchmark:

name                                                                                    old time/op    new time/op    delta
JoinReader/reqOrdering=false/matchratio=onetosixteen/lookuprows=32768/mem=unlimited-24     2.03s ± 1%     2.37s ± 0%   +17.16%  (p=0.000 n=9+8)

name                                                                                    old speed      new speed      delta
JoinReader/reqOrdering=false/matchratio=onetosixteen/lookuprows=32768/mem=unlimited-24  2.20MB/s ± 1%  1.88MB/s ± 1%   -14.46%  (p=0.000 n=8+8)

name                                                                                    old alloc/op   new alloc/op   delta
JoinReader/reqOrdering=false/matchratio=onetosixteen/lookuprows=32768/mem=unlimited-24     333MB ± 4%     362MB ± 3%    +8.61%  (p=0.000 n=10+10)

name                                                                                    old allocs/op  new allocs/op  delta
JoinReader/reqOrdering=false/matchratio=onetosixteen/lookuprows=32768/mem=unlimited-24     1.03M ± 0%     2.84M ± 0%  +174.55%  (p=0.000 n=8+8)

I know that you have been looking into addressing the impact of that change, and I want to call out that it has impact outside of simple reads like KV95 workload.

@yuzefovich
Copy link
Member

yuzefovich commented Apr 1, 2021

I looked into Planning benchmark from pkg/bench. The current diff between 20.2 and 21.1 is pretty large (up to 25% slowdown on some queries). The difference is partially explained by the switch to the vectorized engine (here is the comparison of release-21.1 with vectorize=off against vectorize=on). On average I'd say a third of regression is explained by it.

As @rytaft pointed out, bisecting this regression is actually quite hard because the benchmarks are affected by changes in multiple layers. I arrived at a couple (but I'm sure there are others) commits that had some impact:

craig bot pushed a commit that referenced this issue Apr 1, 2021
61948: roachtest/cdc: update status when watching changefeed latency r=andreimatei a=andreimatei

When watching the latency, we were logging every second. This patch
reduces the frequency to 1/10s and makes the test also update its status
so that the current latency high-water mark is visible in the web UI.

Release note: None

62166: sql: add roach test / workload for testing connection latencies r=rafiss a=RichardJCai

Add roach test and workload test for testing connection latencies.

Release note: None

Making it a private test right now (not available through `cockroach workload`)

Need to also add it to teamcity and make sure the results are reported in roachperf, will open PR there.

Resolves #59394


62287: cdc: Improve avro encoder performance.  r=miretskiy a=miretskiy

Add avro encoder benchmarks.

Avoid expensive allocations (maps) when encoding datums.
Improve encoder performance by ~40%, and significantly reduce
memory allocations per op.

Release Notes: None

62883: sqlsmith: add support for ALTER TYPE ... DROP VALUE r=rafiss a=arulajmani

Now that `ALTER TYPE ... DROP VALUE` is a thing, let's add it to SQL
Smith.

Closes #62812

Release note: None

62922: roachtest/tpccbench: don't require workload binary r=andreimatei a=andreimatei

tpccbench was copying around the `workload` binary, but it then doesn't
use it; it uses `cockroach workload ...`. This patch removes the
copying.

Release note: None

62925: colexec: optimize CASE, AND, and OR projection operators r=yuzefovich a=yuzefovich

Previously, in CASE, AND, and OR projection operators we would always
allocate internal selection vectors of maximum capacity in the
constructor. This is inefficient when the query processes only small
data sets. This commit makes the allocations lazy and of tight sizes
based on the batch lengths.

Addresses: #62212

Release note: None

62926: bench,sql: add logScope to some benchmarks r=yuzefovich a=yuzefovich

This commits adds `defer log.Scope(b).Close(b)` to some benchmarks in
`bench`, `colexec`, and `rowexec` packages. Note that in some cases this
might not matter, but I didn't think through each carefully since it
shouldn't do any harm.

Release note: None

62962: sql: fix bug in close of ieSyncResultChannel r=yuzefovich a=ajwerner

When finishing sending on the ieSyncResultChannel we do not need to and should
not close the channel used to unblock the sender. Doing so can result in a
panic if the reader is concurrently trying to unblock the sender. We could
close that channel but there's no obvious reason to.

Fixes #62939

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: richardjcai <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: arulajmani <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@sumeerbhola
Copy link
Collaborator

@sumeerbhola I want to bring to your attention that cee3bcf is the cause of the several regressions in the microbenchmarks of SQL package. For example, here is the impact of that commit on JoinReader microbenchmark:

@yuzefovich thanks for bringing this up. I ran this benchmark on master with some instrumentation and looked at profiles. There are a few things to note:

  • as expected, there is significant batching of scans, so we are reusing the iterators across the batch, which is meant to help.
  • the benchmark is heavy on SeekGE and Next, with an iterator with lower and upper bound. There are Pebble optimizations for 21.1, when data is in sstables, that are meant to help with this. But they are not playing much of a role since the benchmark has only 7MB in 2 files in L6 and the rest in the memtable. This LSM is not very representative of real systems. This is also why SetBounds is showing up significantly in the profile, even though we do it once followed by 1 SeekGE and 16 Next calls -- there isn't much work to be done in those SeekGE and Next calls. I expect that the matchratio=onetoone and matchratio=onetofour would show even more of a regression for these reasons.
  • there are memory allocations in the SetBounds path that can be optimized. I'll make those changes.

craig bot pushed a commit that referenced this issue Apr 3, 2021
63020: colfetcher: use the limit hint to size the batch r=yuzefovich a=yuzefovich

We recently merged a couple of commits to take advantage of the
optimizer-driven estimated row count that will be output by the cfetcher
and of the limit hint when available. However, one scenario was
overlooked - if there is no estimate but the limit hint is available, we
don't use the latter to size the original batch, and now we do.

As an additional optimization this commit refactors the initial state of
the state machine so that we didn't allocate the output batch in the
constructor of the ColBatchScan but do it only on the very first call
to `NextBatch`. This was prompted by the fact that the limit hint is
currently only set in `StartScan` which is called in
`ColBatchScan.Init`, so without some changes, in the case when the limit
hint can be used to size the batch we would allocate a redundant batch
of size one in `NewColBatchScan`. I also think that the new initial
state of the state machine makes a bit more sense to me.

Additionally, this commit generalizes the logic to keep track of the
limit hint as "remaining rows to be fetched."

Addresses: #62212.

Release note: None (this is a follow-up to a recently merged commit that
had a release note)

Co-authored-by: Yahor Yuzefovich <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 9, 2021
This commit picks up etcd-io/etcd#12790, which
resolves a regression observed in cockroachdb#62212.

Picks up the following four commits:
- etcd-io/etcd@87258ef
- etcd-io/etcd@2c2456b
- etcd-io/etcd@ebc0174
- etcd-io/etcd@e51c697

```
name                  old time/op    new time/op    delta
EntryCache-16            114µs ± 7%      96µs ±12%  -15.29%  (p=0.000 n=19+19)
EntryCacheClearTo-16    37.7µs ± 4%    34.9µs ± 4%   -7.55%  (p=0.000 n=17+18)

name                  old alloc/op   new alloc/op   delta
EntryCacheClearTo-16    1.28kB ± 0%    0.77kB ± 0%  -40.00%  (p=0.000 n=20+20)
EntryCache-16           83.3kB ± 0%    50.0kB ± 0%  -39.95%  (p=0.000 n=19+20)

name                  old allocs/op  new allocs/op  delta
EntryCache-16             3.00 ± 0%      3.00 ± 0%     ~     (all equal)
EntryCacheClearTo-16      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```
@yuzefovich
Copy link
Member

Pretty much all major regressions have been investigated, most have been fixed, and I don't think anyone is planning on looking more into them, so I'm closing this issue. Thanks everyone who joined in on the fun!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-benchmarking C-performance Perf of queries or internals. Solution not expected to change functional behavior. GA-blocker
Projects
None yet
Development

No branches or pull requests

3 participants