release: 22.2 microbenchmark regression investigation and sign-off #87685

herkolategan · 2022-09-09T09:42:47Z

This is a tracking bug for issues we want to investigate related to performance for 22.1 similar to what's been done for prior releases (#78592).

SQL

benchdiff --new==5d7bbe7 --old=212fc1a --count=10 --post-checkout='make buildshort' --sheets ./pkg/sql/...

spread sheets

NOTE: importer was skipped due to a panic and other errors.

pkg/bench

benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/bench

spread sheets

pkg/col...

benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/col...

spread sheets

KV

./pkg/kv/kvserver/concurrency
./pkg/kv/kvserver/gc
./pkg/kv/kvserver/raftentry
./pkg/kv/kvserver/rangefeed
./pkg/kv/kvserver/spanlatch
./pkg/kv/kvserver/tscache

benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/kv/kvserver/concurrency ./pkg/kv/kvserver/gc ./pkg/kv/kvserver/raftentry ./pkg/kv/kvserver/rangefeed ./pkg/kv/kvserver/spanlatch ./pkg/kv/kvserver/tscache

spread sheets

./pkg/gossip/...
./pkg/roachpb
./pkg/rpc/...
./pkg/server/...

benchdiff --new=release-22.2 --old=release-22.1 --count=10 --post-checkout='make buildshort' --sheets ./pkg/gossip/... ./pkg/roachpb ./pkg/rpc/... ./pkg/server/...

spread sheets
NOTE: BenchmarkGRPCDial failed with context_test.go:1936: initial connection heartbeat failed: rpc error: code = Unimplemented desc = unknown service cockroach.rpc.Heartbeat

Bulk I/O

benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/blobs

spread sheets

NOTE: The results are empty because the benches don't suppress stdout causing results to be on subsequent lines
NOTE: Same issue as in #78592.

Update

spread sheets
NOTE: Managed to get output from pkg/blobs by temporarily applying Stan's fix (release: 22.1 microbenchmark regression investigation and sign-off #78592 (comment))

Storage

pkg/ccl/storageccl/...

benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/ccl/storageccl/...

spread sheets

pkg/storage/...

benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/storage/...

spread sheets
NOTE: Various names for these Benchmarks have changed between revisions resulting in omission from the spread sheet.

Misc

./pkg/base ./pkg/ccl/utilccl/... ./pkg/ccl/workloadccl/... ./pkg/cli/... ./pkg/clusterversion ./pkg/cmd/... ./pkg/internal/... ./pkg/security ./pkg/settings ./pkg/testutils/... ./pkg/ts/... ./pkg/util/... ./pkg/workload/...

benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/base ./pkg/ccl/utilccl/... ./pkg/ccl/workloadccl/... ./pkg/cli/... ./pkg/clusterversion ./pkg/cmd/... ./pkg/internal/... ./pkg/security ./pkg/settings ./pkg/testutils/... ./pkg/ts/... ./pkg/workload/...

spread sheets

NOTE: pkg/cli had to be skipped because it failed to build in the old version,

building benchmark binaries for '212fc1a'  14/177 \
fatal: building test binary: exit status 2: # github.com/cockroachdb/cockroach/pkg/cli/clisqlclient
pkg/cli/clisqlclient/conn.go:389:6: fmt.Fprintln arg list ends with redundant newline

Benchmark notes:

Even though dev supports setting up the correct go version for a revision, benchdiff does not and still needs to build the benchmark binaries. Thus a post checkout script is required to switch to the appropriate go version for each revision (only applies if the revisions require different go versions).
pkg/storage is more resource intensive than the other benchmarks and requires at least 200 GB of available disk space as well as 64 GB memory.
pkg/storage also exceed the default 1024 file limit per user, which needs to be adjusted for the benchmark to complete.

Issues:

SQL Importer testdata path issues when running from benchdiff (BenchmarkOCFImport, BenchmarkBinaryJSONImport)
Segfault in BenchmarkConvertToKVs
Regressions in pkg/col/…
Regressions in pkg/ccl/storageccl/…
Regression in CSVRowsReader-24

Jira issue: CRDB-19475

The text was updated successfully, but these errors were encountered:

erikgrinaker · 2022-09-12T15:47:55Z

Don't see anything for ./pkg/storage/ here, is that an oversight?

herkolategan · 2022-09-13T11:15:28Z

Don't see anything for ./pkg/storage/ here, is that an oversight?

Could be, I will check back on Thursday and run it, if I missed it.

mgartner · 2022-09-26T15:35:27Z

Did pkg/sql/opt benchmarks get run? I don't see them here.

herkolategan · 2022-09-26T16:55:05Z

Did pkg/sql/opt benchmarks get run? I don't see them here.

I did a quick grep but didn't see it in the output. I'll see what happened to it and update accordingly.

@mgartner It seem the panic I mentioned in the Issues list truncated the run, thus more might be missing from pkg/sql. I'll rerun without the package causing the panic and produce a new sheet with the results. Thanks for pointing it out.

herkolategan · 2022-09-27T21:43:53Z

@mgartner pkg/sql/opt should now be available in the updated description.

knz · 2022-09-27T22:47:00Z

@herkolategan for the kv stuff
./pkg/gossip/...
./pkg/roachpb
./pkg/rpc/...
./pkg/server/...

I don't see them in the spreadsheet. It looks like the command you used does not include them? I see this command:

benchdiff ... --sheets ./pkg/kv/kvserver/concurrency 
./pkg/kv/kvserver/gc ./pkg/kv/kvserver/raftentry 
./pkg/kv/kvserver/rangefeed ./pkg/kv/kvserver/spanlatch ./pkg/kv/kvserver/tscache

which is the one that contains pkg/gossip, pkg/roachpb, pkg/rpc and pkg/server?

herkolategan · 2022-09-28T11:00:54Z

@knz Right, that doesn't make sense, I should have verified the template I used more thoroughly.
Added a command for those and added the spread sheets.
I did see a failure and the spread sheet does look a bit light, so let me know if anything is up or missing.

cat out.2022-09-28T10_28_26Z | grep pkg/ | uniq
pkg: github.com/cockroachdb/cockroach/pkg/gossip
pkg: github.com/cockroachdb/cockroach/pkg/roachpb
pkg: github.com/cockroachdb/cockroach/pkg/server

yuzefovich · 2022-10-04T00:01:41Z

Signing off on pkg/col regressions - I couldn't reproduce them manually, and no relevant changes that would affect those come to mind.

Signing off on pkg/bench regressions - most of them aren't reproducible (or less severe) when run with longer benchtime parameter. Some regressions have been confirmed (1, 2, 3), and there have been some fixes (#88614, #89317).

Signing off on pkg/sql/... regressions. The most severe regressions are a problem with the benchmark (fixed by #88944), GetZoneConfig regression is tracked in #89410, join reader regressions are expected. There is a fix in #89461 to some of the row-by-row processor regressions, WriteTextColumnarFloat is due to a fix in b43ed2b. Regressions in BenchmarkKV/Scan/Native are tracked in #89545, in BenchmarkEndToEnd in #89547. Other regressions don't seem concerning to me.

#89410 and #89545 are marked as GA blockers until further triage.

yuzefovich · 2022-10-04T23:43:31Z

What do people think about running the microbenchmarks with longer -test.benchtime parameter? It looks like benchdiff tool doesn't specify that, so the default of 1s is used. That can be problematic when a single run of the benchmark takes more than a second since it could introduce a lot of variance to the running times.

For example, I have been looking at a possible regression in one of the benchmarks from the sheet in this issue (from pkg/bench):

WideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-24     2.21s ±18%     2.94s ±25%   +33.10%  (p=0.002 n=10+10)

Here a single run takes on the order of 2-3 seconds, so there is quite a bit of variance, and there appear to be a significant regression. Once I gave 10s of bench time to each run (I was using scripts/bench with 10 runs each, with each run taking at least 10s):

WideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-24     2.21s ± 8%     2.15s ± 7%     ~     (p=0.136 n=9+9)

We see that the confidence in these numbers have increased and there is no noticeable regressions. I used the same SHAs for both runs with the only difference that the latter had -benchtime=10s addition to scripts/bench.

I'm proposing the following adjustment for the next time we do this:

first, run all of the benchmarks with -test.benchtime=1x to make sure that everything compiles and that all expected benchmarks are in fact run
then, during the second actual run do -test.benchtime=5s (or possibly longer) to eliminate some sources of variance in the benchmarks.

One might argue that it's a poor benchmark choice if a single iteration takes more than a second to run, but we have what we have. Thoughts? cc @nvanbenschoten @srosenberg

blathers-crl · 2022-10-05T03:18:20Z

cc @cockroachdb/test-eng

srosenberg · 2022-10-10T22:41:34Z

One might argue that it's a poor benchmark choice if a single iteration takes more than a second to run, but we have what we have. Thoughts? cc @nvanbenschoten @srosenberg

Right. This particular benchmark is hardly a micro one. It definitely makes sense to run more iterations, but maybe we should increase the number of iterations instead of duration? or both? This way we could at least ensure there is a lower bound on the number of iterations. Note that even increasing the number of iterations yields a couple of different options, e.g.,

Run for 1 second (default)

./dev bench pkg/bench --filter=BenchmarkWideTable --bench-mem --ignore-cache -v --timeout 5m

BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1415921500 ns/op	6867795496 B/op	  110127 allocs/op

Run for 5 seconds (i.e., run as many iterations as possible)

./dev bench pkg/bench --filter=BenchmarkWideTable --bench-mem --ignore-cache -v --bench-time=5s --timeout 5m

BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       5	1274804042 ns/op	4475226016 B/op	  513871 allocs/op

Run 5 times (i.e., each run is independent and uses the default duration of 1 second),

 ./dev bench pkg/bench --filter=BenchmarkWideTable --bench-mem --ignore-cache -v --count=5 --timeout 5m

BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1524094833 ns/op	5113291336 B/op	  554734 allocs/op
BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1385707083 ns/op	5433097440 B/op	  653375 allocs/op
BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1450949708 ns/op	5397546736 B/op	  620576 allocs/op
BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1393761625 ns/op	5985737768 B/op	  771592 allocs/op
BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1008088250 ns/op	4883512936 B/op	  106154 allocs/op

Run 5 iterations (i.e., benchmark function is executed exactly `b.N == 5` times)

./dev bench pkg/bench --filter=BenchmarkWideTable --bench-mem --ignore-cache -v --bench-time=5x --timeout 5m

BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       5	1430362492 ns/op	5136016025 B/op	  199160 allocs/op

srosenberg · 2022-10-10T22:57:05Z

Clearly, there isn't a "magic" option which will result in lower variance. Microbenchmarks are notorious for having high variance. E.g., in [1], the authors surveyed a large number of microbenchmarks executed in the cloud; their conclusion was to use at least 20 instances (i.e., VMs) to find slowdowns with high confidence,

In 77–83% of the time, a slowdown below 10% is reliably detectable when using trial-based sampling and 20 instances.

So, we should aim to increase the number of independent runs as well as iterations. @herkolategan Let's capture these notes in the umbrella issue for automating bench runs in CI.

[1] https://www.ifi.uzh.ch/dam/jcr:326f9543-d719-4e8c-a27e-d5a5cace1abf/emse_smb_cloud.pdf

nvanbenschoten · 2022-10-18T01:21:31Z

Signing off on all KV regressions. The interesting, reproducible regressions identified here were resolved by a combination of #89841, #89545, and #88911.

nicktrav · 2022-10-24T16:42:44Z

Signing off on the pkg/storage regressions. The last remaining issue on #88723 is under review with folks from KV / KV-repl.

mgartner · 2022-10-24T21:12:05Z

Signing off on all pkg/sql regressions. The only unsolved regression is #89547 which is a relatively minor regression for a very specific edge case. We will explore the regression further in the future.

89461: execinfra: remove possible logging for each output row of processors r=yuzefovich a=yuzefovich This commit removes possible logging (hidden behind level 3 verbosity) for each row that flows through `ProcessRowHelper` (which effectively all processors use). The verbosity check itself has non-trivial performance cost in this case. I think it's ok to remove it given that this logging was commented out until ea559df, and I don't recall ever having a desire to see all of the rows. ``` name old time/op new time/op delta Noop/cols=1-24 907µs ± 0% 749µs ± 0% -17.43% (p=0.000 n=9+9) Noop/cols=2-24 906µs ± 0% 748µs ± 0% -17.44% (p=0.000 n=9+10) Noop/cols=4-24 908µs ± 0% 748µs ± 0% -17.64% (p=0.000 n=10+9) Noop/cols=16-24 908µs ± 0% 749µs ± 0% -17.57% (p=0.000 n=10+10) Noop/cols=256-24 911µs ± 0% 751µs ± 0% -17.50% (p=0.000 n=10+10) name old speed new speed delta Noop/cols=1-24 578MB/s ± 0% 700MB/s ± 0% +21.12% (p=0.000 n=9+9) Noop/cols=2-24 1.16GB/s ± 0% 1.40GB/s ± 0% +21.13% (p=0.000 n=9+10) Noop/cols=4-24 2.31GB/s ± 0% 2.80GB/s ± 0% +21.41% (p=0.000 n=10+9) Noop/cols=16-24 9.24GB/s ± 0% 11.20GB/s ± 0% +21.32% (p=0.000 n=10+10) Noop/cols=256-24 147GB/s ± 0% 179GB/s ± 0% +21.22% (p=0.000 n=10+10) name old alloc/op new alloc/op delta Noop/cols=1-24 1.45kB ± 0% 1.45kB ± 0% -0.07% (p=0.000 n=10+10) Noop/cols=2-24 1.46kB ± 0% 1.46kB ± 0% -0.07% (p=0.000 n=10+9) Noop/cols=4-24 1.47kB ± 0% 1.47kB ± 0% -0.07% (p=0.000 n=10+10) Noop/cols=16-24 1.57kB ± 0% 1.57kB ± 0% -0.06% (p=0.000 n=10+10) Noop/cols=256-24 3.49kB ± 0% 3.49kB ± 0% -0.03% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Noop/cols=1-24 5.00 ± 0% 5.00 ± 0% ~ (all equal) Noop/cols=2-24 5.00 ± 0% 5.00 ± 0% ~ (all equal) Noop/cols=4-24 5.00 ± 0% 5.00 ± 0% ~ (all equal) Noop/cols=16-24 5.00 ± 0% 5.00 ± 0% ~ (all equal) Noop/cols=256-24 5.00 ± 0% 5.00 ± 0% ~ (all equal) ``` Addresses: #87685. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>

erikgrinaker · 2022-12-02T14:19:16Z

Should we close this out now that 22.2 has been cut?

herkolategan added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Sep 9, 2022

nvanbenschoten added GA-blocker branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 labels Sep 26, 2022

cockroachdb deleted a comment from blathers-crl bot Sep 26, 2022

erikgrinaker mentioned this issue Sep 26, 2022

storage: review 22.2 microbenchmark regressions #88723

Closed

18 tasks

yuzefovich mentioned this issue Oct 4, 2022

catalog/descs: remove an allocation due to logging #89260

Closed

kenliu-crl added the T-testeng TestEng Team label Oct 5, 2022

This was referenced Oct 6, 2022

execinfra: remove possible logging for each output row of processors #89461

Merged

kv: regression on BenchmarkKV/Scan/Native #89545

Closed

sql: regression on EndToEnd/ored-preds #89547

Closed

celiala added the blocks-22.2.0-rc.1 label Oct 17, 2022

celiala added the branch-release-22.2.0 label Oct 20, 2022

srosenberg removed GA-blocker blocks-22.2.0-rc.1 labels Oct 26, 2022

blathers-crl bot mentioned this issue Oct 27, 2022

release-22.2: execinfra: remove possible logging for each output row of processors #90753

Merged

srosenberg closed this as completed Dec 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: 22.2 microbenchmark regression investigation and sign-off #87685

release: 22.2 microbenchmark regression investigation and sign-off #87685

herkolategan commented Sep 9, 2022 •

edited by nicktrav

Loading

erikgrinaker commented Sep 12, 2022

herkolategan commented Sep 13, 2022

mgartner commented Sep 26, 2022

herkolategan commented Sep 26, 2022 •

edited

Loading

herkolategan commented Sep 27, 2022

knz commented Sep 27, 2022

herkolategan commented Sep 28, 2022

yuzefovich commented Oct 4, 2022 •

edited

Loading

yuzefovich commented Oct 4, 2022 •

edited

Loading

blathers-crl bot commented Oct 5, 2022

srosenberg commented Oct 10, 2022

srosenberg commented Oct 10, 2022

nvanbenschoten commented Oct 18, 2022

nicktrav commented Oct 24, 2022

mgartner commented Oct 24, 2022

erikgrinaker commented Dec 2, 2022

release: 22.2 microbenchmark regression investigation and sign-off #87685

release: 22.2 microbenchmark regression investigation and sign-off #87685

Comments

herkolategan commented Sep 9, 2022 • edited by nicktrav Loading

SQL

KV

Bulk I/O

Update

Storage

Misc

Benchmark notes:

Issues:

erikgrinaker commented Sep 12, 2022

herkolategan commented Sep 13, 2022

mgartner commented Sep 26, 2022

herkolategan commented Sep 26, 2022 • edited Loading

herkolategan commented Sep 27, 2022

knz commented Sep 27, 2022

herkolategan commented Sep 28, 2022

yuzefovich commented Oct 4, 2022 • edited Loading

yuzefovich commented Oct 4, 2022 • edited Loading

blathers-crl bot commented Oct 5, 2022

srosenberg commented Oct 10, 2022

Run for 1 second (default)

Run for 5 seconds (i.e., run as many iterations as possible)

Run 5 times (i.e., each run is independent and uses the default duration of 1 second),

Run 5 iterations (i.e., benchmark function is executed exactly b.N == 5 times)

srosenberg commented Oct 10, 2022

nvanbenschoten commented Oct 18, 2022

nicktrav commented Oct 24, 2022

mgartner commented Oct 24, 2022

erikgrinaker commented Dec 2, 2022

herkolategan commented Sep 9, 2022 •

edited by nicktrav

Loading

herkolategan commented Sep 26, 2022 •

edited

Loading

yuzefovich commented Oct 4, 2022 •

edited

Loading

yuzefovich commented Oct 4, 2022 •

edited

Loading

Run 5 iterations (i.e., benchmark function is executed exactly `b.N == 5` times)