Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release: 22.2 microbenchmark regression investigation and sign-off #87685

Closed
13 of 16 tasks
herkolategan opened this issue Sep 9, 2022 · 16 comments
Closed
13 of 16 tasks
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-testeng TestEng Team

Comments

@herkolategan
Copy link
Collaborator

herkolategan commented Sep 9, 2022

This is a tracking bug for issues we want to investigate related to performance for 22.1 similar to what's been done for prior releases (#78592).

SQL

benchdiff --new==5d7bbe7 --old=212fc1a --count=10 --post-checkout='make buildshort' --sheets ./pkg/sql/...

spread sheets

NOTE: importer was skipped due to a panic and other errors.

  • pkg/bench
benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/bench

spread sheets

  • pkg/col...
benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/col...

spread sheets

KV

  • ./pkg/kv/kvserver/concurrency
    ./pkg/kv/kvserver/gc
    ./pkg/kv/kvserver/raftentry
    ./pkg/kv/kvserver/rangefeed
    ./pkg/kv/kvserver/spanlatch
    ./pkg/kv/kvserver/tscache
benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/kv/kvserver/concurrency ./pkg/kv/kvserver/gc ./pkg/kv/kvserver/raftentry ./pkg/kv/kvserver/rangefeed ./pkg/kv/kvserver/spanlatch ./pkg/kv/kvserver/tscache

spread sheets

  • ./pkg/gossip/...
    ./pkg/roachpb
    ./pkg/rpc/...
    ./pkg/server/...
benchdiff --new=release-22.2 --old=release-22.1 --count=10 --post-checkout='make buildshort' --sheets ./pkg/gossip/... ./pkg/roachpb ./pkg/rpc/... ./pkg/server/...

spread sheets
NOTE: BenchmarkGRPCDial failed with context_test.go:1936: initial connection heartbeat failed: rpc error: code = Unimplemented desc = unknown service cockroach.rpc.Heartbeat

Bulk I/O

benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/blobs

spread sheets

NOTE: The results are empty because the benches don't suppress stdout causing results to be on subsequent lines
NOTE: Same issue as in #78592.

Update

Storage

  • pkg/ccl/storageccl/...
benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/ccl/storageccl/...

spread sheets

  • pkg/storage/...
benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/storage/...

spread sheets
NOTE: Various names for these Benchmarks have changed between revisions resulting in omission from the spread sheet.

Misc

  • ./pkg/base ./pkg/ccl/utilccl/... ./pkg/ccl/workloadccl/... ./pkg/cli/... ./pkg/clusterversion ./pkg/cmd/... ./pkg/internal/... ./pkg/security ./pkg/settings ./pkg/testutils/... ./pkg/ts/... ./pkg/util/... ./pkg/workload/...
benchdiff --new=5d7bbe7 --old=212fc1a --count=10  --post-checkout='make buildshort' --sheets ./pkg/base ./pkg/ccl/utilccl/... ./pkg/ccl/workloadccl/... ./pkg/cli/... ./pkg/clusterversion ./pkg/cmd/... ./pkg/internal/... ./pkg/security ./pkg/settings ./pkg/testutils/... ./pkg/ts/... ./pkg/workload/...

spread sheets

NOTE: pkg/cli had to be skipped because it failed to build in the old version,

building benchmark binaries for '212fc1a'  14/177 \
fatal: building test binary: exit status 2: # github.com/cockroachdb/cockroach/pkg/cli/clisqlclient
pkg/cli/clisqlclient/conn.go:389:6: fmt.Fprintln arg list ends with redundant newline

Benchmark notes:

  • Even though dev supports setting up the correct go version for a revision, benchdiff does not and still needs to build the benchmark binaries. Thus a post checkout script is required to switch to the appropriate go version for each revision (only applies if the revisions require different go versions).
  • pkg/storage is more resource intensive than the other benchmarks and requires at least 200 GB of available disk space as well as 64 GB memory.
  • pkg/storage also exceed the default 1024 file limit per user, which needs to be adjusted for the benchmark to complete.

Issues:

  • SQL Importer testdata path issues when running from benchdiff (BenchmarkOCFImport, BenchmarkBinaryJSONImport)
  • Segfault in BenchmarkConvertToKVs
  • Regressions in pkg/col/…
  • Regressions in pkg/ccl/storageccl/…
  • Regression in CSVRowsReader-24

Jira issue: CRDB-19475

@herkolategan herkolategan added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Sep 9, 2022
@erikgrinaker
Copy link
Contributor

Don't see anything for ./pkg/storage/ here, is that an oversight?

@herkolategan
Copy link
Collaborator Author

Don't see anything for ./pkg/storage/ here, is that an oversight?

Could be, I will check back on Thursday and run it, if I missed it.

@nvanbenschoten nvanbenschoten added GA-blocker branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 labels Sep 26, 2022
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Sep 26, 2022
@mgartner
Copy link
Collaborator

Did pkg/sql/opt benchmarks get run? I don't see them here.

@herkolategan
Copy link
Collaborator Author

herkolategan commented Sep 26, 2022

Did pkg/sql/opt benchmarks get run? I don't see them here.

I did a quick grep but didn't see it in the output. I'll see what happened to it and update accordingly.

@mgartner It seem the panic I mentioned in the Issues list truncated the run, thus more might be missing from pkg/sql. I'll rerun without the package causing the panic and produce a new sheet with the results. Thanks for pointing it out.

@herkolategan
Copy link
Collaborator Author

@mgartner pkg/sql/opt should now be available in the updated description.

@knz
Copy link
Contributor

knz commented Sep 27, 2022

@herkolategan for the kv stuff
./pkg/gossip/...
./pkg/roachpb
./pkg/rpc/...
./pkg/server/...

I don't see them in the spreadsheet. It looks like the command you used does not include them? I see this command:

benchdiff ... --sheets ./pkg/kv/kvserver/concurrency 
./pkg/kv/kvserver/gc ./pkg/kv/kvserver/raftentry 
./pkg/kv/kvserver/rangefeed ./pkg/kv/kvserver/spanlatch ./pkg/kv/kvserver/tscache

which is the one that contains pkg/gossip, pkg/roachpb, pkg/rpc and pkg/server?

@herkolategan
Copy link
Collaborator Author

@knz Right, that doesn't make sense, I should have verified the template I used more thoroughly.
Added a command for those and added the spread sheets.
I did see a failure and the spread sheet does look a bit light, so let me know if anything is up or missing.

cat out.2022-09-28T10_28_26Z | grep pkg/ | uniq
pkg: github.com/cockroachdb/cockroach/pkg/gossip
pkg: github.com/cockroachdb/cockroach/pkg/roachpb
pkg: github.com/cockroachdb/cockroach/pkg/server

@yuzefovich
Copy link
Member

yuzefovich commented Oct 4, 2022

Signing off on pkg/col regressions - I couldn't reproduce them manually, and no relevant changes that would affect those come to mind.

Signing off on pkg/bench regressions - most of them aren't reproducible (or less severe) when run with longer benchtime parameter. Some regressions have been confirmed (1, 2, 3), and there have been some fixes (#88614, #89317).

Signing off on pkg/sql/... regressions. The most severe regressions are a problem with the benchmark (fixed by #88944), GetZoneConfig regression is tracked in #89410, join reader regressions are expected. There is a fix in #89461 to some of the row-by-row processor regressions, WriteTextColumnarFloat is due to a fix in b43ed2b. Regressions in BenchmarkKV/Scan/Native are tracked in #89545, in BenchmarkEndToEnd in #89547. Other regressions don't seem concerning to me.

#89410 and #89545 are marked as GA blockers until further triage.

@yuzefovich
Copy link
Member

yuzefovich commented Oct 4, 2022

What do people think about running the microbenchmarks with longer -test.benchtime parameter? It looks like benchdiff tool doesn't specify that, so the default of 1s is used. That can be problematic when a single run of the benchmark takes more than a second since it could introduce a lot of variance to the running times.

For example, I have been looking at a possible regression in one of the benchmarks from the sheet in this issue (from pkg/bench):

WideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-24     2.21s ±18%     2.94s ±25%   +33.10%  (p=0.002 n=10+10)

Here a single run takes on the order of 2-3 seconds, so there is quite a bit of variance, and there appear to be a significant regression. Once I gave 10s of bench time to each run (I was using scripts/bench with 10 runs each, with each run taking at least 10s):

WideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-24     2.21s ± 8%     2.15s ± 7%     ~     (p=0.136 n=9+9)

We see that the confidence in these numbers have increased and there is no noticeable regressions. I used the same SHAs for both runs with the only difference that the latter had -benchtime=10s addition to scripts/bench.

I'm proposing the following adjustment for the next time we do this:

  • first, run all of the benchmarks with -test.benchtime=1x to make sure that everything compiles and that all expected benchmarks are in fact run
  • then, during the second actual run do -test.benchtime=5s (or possibly longer) to eliminate some sources of variance in the benchmarks.

One might argue that it's a poor benchmark choice if a single iteration takes more than a second to run, but we have what we have. Thoughts? cc @nvanbenschoten @srosenberg

@kenliu-crl kenliu-crl added the T-testeng TestEng Team label Oct 5, 2022
@blathers-crl
Copy link

blathers-crl bot commented Oct 5, 2022

cc @cockroachdb/test-eng

@srosenberg
Copy link
Member

One might argue that it's a poor benchmark choice if a single iteration takes more than a second to run, but we have what we have. Thoughts? cc @nvanbenschoten @srosenberg

Right. This particular benchmark is hardly a micro one. It definitely makes sense to run more iterations, but maybe we should increase the number of iterations instead of duration? or both? This way we could at least ensure there is a lower bound on the number of iterations. Note that even increasing the number of iterations yields a couple of different options, e.g.,

Run for 1 second (default)

./dev bench pkg/bench --filter=BenchmarkWideTable --bench-mem --ignore-cache -v --timeout 5m

BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1415921500 ns/op	6867795496 B/op	  110127 allocs/op

Run for 5 seconds (i.e., run as many iterations as possible)

./dev bench pkg/bench --filter=BenchmarkWideTable --bench-mem --ignore-cache -v --bench-time=5s --timeout 5m

BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       5	1274804042 ns/op	4475226016 B/op	  513871 allocs/op

Run 5 times (i.e., each run is independent and uses the default duration of 1 second),

 ./dev bench pkg/bench --filter=BenchmarkWideTable --bench-mem --ignore-cache -v --count=5 --timeout 5m

BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1524094833 ns/op	5113291336 B/op	  554734 allocs/op
BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1385707083 ns/op	5433097440 B/op	  653375 allocs/op
BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1450949708 ns/op	5397546736 B/op	  620576 allocs/op
BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1393761625 ns/op	5985737768 B/op	  771592 allocs/op
BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       1	1008088250 ns/op	4883512936 B/op	  106154 allocs/op

Run 5 iterations (i.e., benchmark function is executed exactly b.N == 5 times)

./dev bench pkg/bench --filter=BenchmarkWideTable --bench-mem --ignore-cache -v --bench-time=5x --timeout 5m

BenchmarkWideTable/MultinodeCockroach/count=10/bigColumnBytes=1000000-10    	       5	1430362492 ns/op	5136016025 B/op	  199160 allocs/op

@srosenberg
Copy link
Member

Clearly, there isn't a "magic" option which will result in lower variance. Microbenchmarks are notorious for having high variance. E.g., in [1], the authors surveyed a large number of microbenchmarks executed in the cloud; their conclusion was to use at least 20 instances (i.e., VMs) to find slowdowns with high confidence,

In 77–83% of the time, a slowdown below 10% is reliably detectable when using trial-based sampling and 20 instances.

So, we should aim to increase the number of independent runs as well as iterations. @herkolategan Let's capture these notes in the umbrella issue for automating bench runs in CI.

[1] https://www.ifi.uzh.ch/dam/jcr:326f9543-d719-4e8c-a27e-d5a5cace1abf/emse_smb_cloud.pdf

@nvanbenschoten
Copy link
Member

Signing off on all KV regressions. The interesting, reproducible regressions identified here were resolved by a combination of #89841, #89545, and #88911.

@nicktrav
Copy link
Collaborator

Signing off on the pkg/storage regressions. The last remaining issue on #88723 is under review with folks from KV / KV-repl.

@mgartner
Copy link
Collaborator

Signing off on all pkg/sql regressions. The only unsolved regression is #89547 which is a relatively minor regression for a very specific edge case. We will explore the regression further in the future.

craig bot pushed a commit that referenced this issue Oct 27, 2022
89461: execinfra: remove possible logging for each output row of processors r=yuzefovich a=yuzefovich

This commit removes possible logging (hidden behind level 3 verbosity) for each row that flows through `ProcessRowHelper` (which effectively all processors use). The verbosity check itself has non-trivial performance cost in this case. I think it's ok to remove it given that this logging was commented out until ea559df, and I don't recall ever having a desire to see all of the rows.

```
name              old time/op    new time/op     delta
Noop/cols=1-24       907µs ± 0%      749µs ± 0%  -17.43%  (p=0.000 n=9+9)
Noop/cols=2-24       906µs ± 0%      748µs ± 0%  -17.44%  (p=0.000 n=9+10)
Noop/cols=4-24       908µs ± 0%      748µs ± 0%  -17.64%  (p=0.000 n=10+9)
Noop/cols=16-24      908µs ± 0%      749µs ± 0%  -17.57%  (p=0.000 n=10+10)
Noop/cols=256-24     911µs ± 0%      751µs ± 0%  -17.50%  (p=0.000 n=10+10)

name              old speed      new speed       delta
Noop/cols=1-24     578MB/s ± 0%    700MB/s ± 0%  +21.12%  (p=0.000 n=9+9)
Noop/cols=2-24    1.16GB/s ± 0%   1.40GB/s ± 0%  +21.13%  (p=0.000 n=9+10)
Noop/cols=4-24    2.31GB/s ± 0%   2.80GB/s ± 0%  +21.41%  (p=0.000 n=10+9)
Noop/cols=16-24   9.24GB/s ± 0%  11.20GB/s ± 0%  +21.32%  (p=0.000 n=10+10)
Noop/cols=256-24   147GB/s ± 0%    179GB/s ± 0%  +21.22%  (p=0.000 n=10+10)

name              old alloc/op   new alloc/op    delta
Noop/cols=1-24      1.45kB ± 0%     1.45kB ± 0%   -0.07%  (p=0.000 n=10+10)
Noop/cols=2-24      1.46kB ± 0%     1.46kB ± 0%   -0.07%  (p=0.000 n=10+9)
Noop/cols=4-24      1.47kB ± 0%     1.47kB ± 0%   -0.07%  (p=0.000 n=10+10)
Noop/cols=16-24     1.57kB ± 0%     1.57kB ± 0%   -0.06%  (p=0.000 n=10+10)
Noop/cols=256-24    3.49kB ± 0%     3.49kB ± 0%   -0.03%  (p=0.000 n=10+10)

name              old allocs/op  new allocs/op   delta
Noop/cols=1-24        5.00 ± 0%       5.00 ± 0%     ~     (all equal)
Noop/cols=2-24        5.00 ± 0%       5.00 ± 0%     ~     (all equal)
Noop/cols=4-24        5.00 ± 0%       5.00 ± 0%     ~     (all equal)
Noop/cols=16-24       5.00 ± 0%       5.00 ± 0%     ~     (all equal)
Noop/cols=256-24      5.00 ± 0%       5.00 ± 0%     ~     (all equal)
```

Addresses: #87685.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@erikgrinaker
Copy link
Contributor

Should we close this out now that 22.2 has been cut?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-testeng TestEng Team
Projects
None yet
Development

No branches or pull requests

10 participants