Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release: 22.1 microbenchmark regression investigation and sign-off #78592

Closed
6 of 7 tasks
srosenberg opened this issue Mar 28, 2022 · 22 comments
Closed
6 of 7 tasks

release: 22.1 microbenchmark regression investigation and sign-off #78592

srosenberg opened this issue Mar 28, 2022 · 22 comments
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@srosenberg
Copy link
Member

srosenberg commented Mar 28, 2022

This is a tracking bug for issues we want to investigate related to performance for 21.2 similar to what's been done for prior releases (#71820).

SQL

pkg/sql/...

benchdiff --new==574b6ac --old=89530d8 --count=10  --post-checkout='make buildshort' --sheets ./pkg/sql...

spread sheets

pkg/bench

benchdiff --new=d54f5b78be --old=e179238221 --count=10  --post-checkout='make buildshort' --sheets ./pkg/bench

spread sheets

NOTE: BenchmarkSQL//InsertFK has been skipped since it occasionally gets stuck on the old sha.

pkg/col...

benchdiff --new=e179238 --old=d54f5b7 --count=10  --post-checkout='make buildshort' --sheets ./pkg/col...

spread sheets

KV

./pkg/gossip/...
./pkg/roachpb
./pkg/rpc/...
./pkg/server/...

benchdiff --new=99972ef --old=1831ab7 --count=10  --post-checkout='make buildshort' --sheets ./pkg/kv/kvserver/concurrency ./pkg/kv/kvserver/gc ./pkg/kv/kvserver/raftentry ./pkg/kv/kvserver/rangefeed ./pkg/kv/kvserver/spanlatch ./pkg/kv/kvserver/tscache

spread sheets

Bulk I/O

benchdiff --new=99972ef --old=1831ab7 --count=10  --post-checkout='make buildshort' --sheets ./pkg/blobs

spread sheets

NOTE: results are empty because the benches don't suppress stdout causing results to be on subsequent lines

Storage

pkg/ccl/storageccl/...

benchdiff --new=99972ef --old=1831ab7 --count=10  --post-checkout='make buildshort' --sheets ./pkg/ccl/storageccl/...

spread sheets

Misc

./pkg/base ./pkg/ccl/utilccl/... ./pkg/ccl/workloadccl/... ./pkg/cli/... ./pkg/clusterversion ./pkg/cmd/... ./pkg/internal/... ./pkg/security ./pkg/settings ./pkg/testutils/... ./pkg/ts/... ./pkg/util/... ./pkg/workload/...

benchdiff --new=99972ef --old=1831ab7 --count=10  --post-checkout='make buildshort' --sheets ./pkg/base ./pkg/ccl/utilccl/... ./pkg/ccl/workloadccl/... ./pkg/cli/... ./pkg/clusterversion ./pkg/cmd/... ./pkg/internal/... ./pkg/security ./pkg/settings ./pkg/testutils/... ./pkg/ts/... ./pkg/workload/...

spread sheets

NOTE: pkg/util had to be skipped because it failed to build in the old version,

building benchmark binaries for '1831ab7' 160/257 |
fatal: building test binary: exit status 2: # github.com/cockroachdb/cockroach/pkg/util/goschedstats.test
go.info.runtime.sched: relocation target go.info.github.com/cockroachdb/cockroach/pkg/util/goschedstats.schedt not defined
go.info.runtime.allp: relocation target go.info.[]*github.com/cockroachdb/cockroach/pkg/util/goschedstats.p not defined

Issues:

Jira issue: CRDB-14191

@srosenberg srosenberg added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Mar 28, 2022
@srosenberg
Copy link
Member Author

srosenberg commented Mar 28, 2022

Missing Benchmark Results

Results of many (as in hundreds) benches both in 21.2 and 22.1 could not be correctly parsed by benchstat because they are not suppressing stdout.
E.g., running BenchmarkStreamingReadFile (pkg/blobs) yields the following output,

BenchmarkStreamingReadFile-24           I220327 00:52:35.394408 141 2@rpc/context.go:1083  [-] 5  dialing n2: 127.0.0.1:38103 (default)
I220327 00:52:35.573020 198 2@rpc/context.go:710  [-] 6  closing [{n2: 127.0.0.1:38103 (default)} {n0: 127.0.0.1:38103 (default)}]
I220327 00:52:35.582694 141 2@rpc/context.go:1083  [-] 7  dialing n2: 127.0.0.1:44817 (default)
I220327 00:52:36.692774 1520 2@rpc/context.go:710  [-] 8  closing [{n2: 127.0.0.1:44817 (default)}]
I220327 00:52:36.692897 1575 2@rpc/context.go:710  [-] 9  closing [{n2: 127.0.0.1:44817 (default)} {n0: 127.0.0.1:44817 (default)}]
     445           2512331 ns/op          52.58 MB/s     1898941 B/op       3800 allocs/op
I220327 00:52:36.703170 9512 2@rpc/context.go:1083  [-] 10  dialing n2: 127.0.0.1:42187 (default)
I220327 00:52:36.706679 9814 2@rpc/context.go:710  [-] 11  closing [{n2: 127.0.0.1:42187 (default)}]
I220327 00:52:36.706676 9821 2@rpc/context.go:710  [-] 12  closing [{n2: 127.0.0.1:42187 (default)} {n0: 127.0.0.1:42187 (default)}]
find benchdiff/1831ab7 -name "out*"|xargs grep -E "Benchmark.*-24" |grep -v allocs|awk -F":" '{print $2}'|awk '{print $1}'|sort -u >~/missing_benches_1831ab7.txt
find benchdiff/99972ef -name "out*"|xargs grep -E "Benchmark.*-24" |grep -v allocs|awk -F":" '{print $2}'|awk '{print $1}'|sort -u >~/missing_benches_99972ef.txt

missing_benches_1831ab7.txt
missing_benches_99972ef.txt

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Mar 28, 2022

Thanks for running these benchmarks!

The logging can be suppressed by adding this to the start of the benchmarks:

func BenchmarkStreamingReadFile(b *testing.B) {
	defer log.Scope(b).Close(b)

This still outputs a couple of lines, but they should be easy enough to grep out:

    /home/erik/go/src/github.com/cockroachdb/cockroach/pkg/blobs/test_log_scope.go:79: test logs captured to: /tmp/logBenchmarkStreamingReadFile1452856895
    /home/erik/go/src/github.com/cockroachdb/cockroach/pkg/blobs/test_log_scope.go:80: use -show-logs to present logs inline
    /home/erik/go/src/github.com/cockroachdb/cockroach/pkg/blobs/bench_test.go:85: -- test log scope end --

@erikgrinaker erikgrinaker added C-performance Perf of queries or internals. Solution not expected to change functional behavior. and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Mar 28, 2022
@michae2
Copy link
Collaborator

michae2 commented Mar 28, 2022

The regressions in Append/bytes and Copy/bytes were known consequences of #76202 which we traded for faster unordered distinct and top-k operations over bytes. (In absolute numbers the speeds of append and copy are still in GB/sec.)

@mgartner
Copy link
Collaborator

The optimizer time/op regressions Phases/many-columns-and-indexes... are mostly due to #72798, which was an intentional tradeoff to reduce allocations (see #72733).

@mgartner
Copy link
Collaborator

mgartner commented Mar 28, 2022

The optimizer alloc/op regressions like Phases/many-columns-and-indexes-a/Prepared/Explore are mainly from two changes:

  • sql, opt: add AvgSize stat to scan row cost and session setting to enable/disable the old cost methodology #74551
    • RequestColStatTable accounts for ~90% of the increase in allocations. It makes allocations via ColStatsMap.Add and ColStatsMap.addToIndex. I think the main difference is that column stats are now requested for every column when rowScanCost is called for the primary index. Before the change, column stats were never requested for columns that were not referenced by a filter. In the many-columns-and-indexes-a test, the table has ~25 columns but only filters by a single one, so we see a big increase in allocations from before, allocating colstats for a single column, to now, allocating colstats for all 25 columns. I don't think this will likely regress performance for production workloads. If it did, it would only affect queries on tables with many non-filtered columns. cc @rharding6373
  • opt: make index names in test catalog consistent with real catalog #71818
    • This is a test-only change to the test catalog. It will not impact the performance of any production workloads. That being said, I think there's an easy way to reduce allocations here, so I'll do it anyway to keep future profiles trimmed down.
    • EDIT: These allocations are not actually included in the benchmark measurements. But they do clutter the memory profile, so I will still address them.

@srosenberg
Copy link
Member Author

srosenberg commented Mar 29, 2022

The logging can be suppressed by adding this to the start of the benchmarks:

func BenchmarkStreamingReadFile(b *testing.B) {
	defer log.Scope(b).Close(b)

Thanks @erikgrinaker, this works! The extra stdout is actually appended after the benchmark results (b.output in processBench),

go test -c -o blobs_bench ./pkg/blobs
./blobs_bench -test.run - -test.bench . -test.benchmem

goos: linux
goarch: amd64
pkg: github.com/cockroachdb/cockroach/pkg/blobs
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
BenchmarkStreamingReadFile-24                454           2390463 ns/op          55.26 MB/s     1856522 B/op       2972 allocs/op
--- BENCH: BenchmarkStreamingReadFile-24
    test_log_scope.go:79: test logs captured to: /tmp/logBenchmarkStreamingReadFile2716458423
    test_log_scope.go:80: use -show-logs to present logs inline
    bench_test.go:85: -- test log scope end --
    test_log_scope.go:79: test logs captured to: /tmp/logBenchmarkStreamingReadFile1057954318
    test_log_scope.go:80: use -show-logs to present logs inline
    bench_test.go:85: -- test log scope end --
    test_log_scope.go:79: test logs captured to: /tmp/logBenchmarkStreamingReadFile2640029030
    test_log_scope.go:80: use -show-logs to present logs inline
    bench_test.go:85: -- test log scope end --
BenchmarkStreamingWriteFile-24               540           2123551 ns/op          62.21 MB/s     1846070 B/op       2494 allocs/op
--- BENCH: BenchmarkStreamingWriteFile-24
    test_log_scope.go:79: test logs captured to: /tmp/logBenchmarkStreamingWriteFile2486312465
    test_log_scope.go:80: use -show-logs to present logs inline
    bench_test.go:145: -- test log scope end --
    test_log_scope.go:79: test logs captured to: /tmp/logBenchmarkStreamingWriteFile1039338908
    test_log_scope.go:80: use -show-logs to present logs inline
    bench_test.go:145: -- test log scope end --
    test_log_scope.go:79: test logs captured to: /tmp/logBenchmarkStreamingWriteFile3731698165
    test_log_scope.go:80: use -show-logs to present logs inline
    bench_test.go:145: -- test log scope end --
PASS

Note, TestLogScope redirects to temp. files; the additional IO might end up skewing benchmark results. I didn't find a direct way to emulate /dev/null. Forcing nil in the first arg to getTestConfig [1] ends up redirecting everything to stderr and setting Stderr.Filter = severity.NONE [2] which effectively disables IO due to logging.

My other dilemma is what to do with all the missing benchmarks because of the unsuppressed logging? Should we instrument them with defer log.Scope(b).Close(b)?

[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/util/log/test_log_scope.go#L161
[2] https://github.com/cockroachdb/cockroach/blob/master/pkg/util/log/test_log_scope.go#L282

@erikgrinaker
Copy link
Contributor

Note, TestLogScope redirects to temp. files; the additional IO might end up skewing benchmark results. I didn't find a direct way to emulate /dev/null. Forcing nil in the first arg to getTestConfig [1] ends up redirecting everything to stderr and setting Stderr.Filter = severity.NONE [2] which effectively disables IO due to logging.

I see. Seems reasonable to change it to pass nil when running benchmarks, but with an option to override it for debugging -- or maybe just removing the log.Scope().Close() call might be sufficient as an escape hatch.

My other dilemma is what to do with all the missing benchmarks because of the unsuppressed logging? Should we instrument them with defer log.Scope(b).Close(b)?

Yeah, I think in general all integration tests and benchmarks should use defer log.Scope(b).Close(b). I guess we don't lint this because it's unnecessary for smaller unit-tests. Do you mind submitting a PR for the worst offenders?

@petermattis
Copy link
Collaborator

It used to be the case that you could run benchmarks with -logtostderr=NONE to silence logging. Does that no longer work? If it doesn't, can we fix -logtostderr=NONE to work again?

@yuzefovich
Copy link
Member

I fixed up the crash on pkg/col/colserde benchmarks on 21.2 branch and updated the spreadsheets. As Michael mentioned above, all regressions are expected and deemed acceptable in pkg/col/....

@mgartner
Copy link
Collaborator

It used to be the case that you could run benchmarks with -logtostderr=NONE to silence logging.

@petermattis That no longer works. See the discussion in #57979.

craig bot pushed a commit that referenced this issue Mar 29, 2022
78704: sql: propagate limit for top K sort correctly in tests r=yuzefovich a=yuzefovich

In 22.1 time frame we started propagating the value of K for top K sort
in the spec of the processor, and not in the post-processing spec, but
we forgot to update some of the tests accordingly.

Informs: #78592.

Release note: None

78949: kvserver: gossip l0sublevels instead of read amp r=kvoli a=kvoli

Previously read amplification was gossipped among stores to enable
future allocation decisions that would avoid candidates with high read
amplification. L0 Sublevels represents the number of levels with L0 and
is a portion of read amplification. This patch change read amplification
to l0 sublevels, as it is a better indicator of store health.

Release justification: low risk, replace deprecated gossip signal.

Release note: None

78984: sql: deflake unique logic test r=mgartner a=mgartner

PR #78685 changes the query plan of one query in a logic test in a way
that makes the test flakey. This commit guarantees that the test cannot
be flakey, regardless of the query plan.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
craig bot pushed a commit that referenced this issue Mar 30, 2022
78703: colexecop: reset the internal batch in a test operator r=yuzefovich a=yuzefovich

This commit makes a test operator that "owns" a batch to properly reset
the batch on each `Next` call. This is the expectation that we forgot to
follow.

This was discovered when running `BenchmarkLikeOps` which uses a Bytes
vector. Due to the way `Bytes.Copy` is implemented, the Bytes vector's
buffer could grow arbitrarily large because the test operator never
reset the vector.

Informs: #78592.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@petermattis
Copy link
Collaborator

It used to be the case that you could run benchmarks with -logtostderr=NONE to silence logging.

@petermattis That no longer works. See the discussion in #57979.

That stinks. I only skimmed #57979, but couldn't catch the reason that a flag to globally disable logging is infeasible. Such a flag seems preferable to sprinkling def log.Scope(b).Close(b) statements everywhere as it is one-time effort to build, rather than an ongoing burden.

@knz
Copy link
Contributor

knz commented Mar 30, 2022

Have you tried -test-log-config='file-defaults: dir: ""'?

this should disable file creation

@knz
Copy link
Contributor

knz commented Mar 30, 2022

to disable entirely, maybe something like -test-log-config='file-defaults: {filter: NONE, dir: ""}'

@knz
Copy link
Contributor

knz commented Mar 30, 2022

to remove the stderr logging, -test-log-config='sinks: {stderr: {filter: NONE}}' this is equivalent to -logtostderr=NONE

@petermattis
Copy link
Collaborator

to remove the stderr logging, -test-log-config='sinks: {stderr: {filter: NONE}} this is equivalent to -logtostderr=NONE

Can there be a short-hand added for that -test-log-config setting? It isn't particularly memorable or easy to type.

@knz
Copy link
Contributor

knz commented Mar 30, 2022

Can there be a short-hand

i suppose there can but what is the desired behavior?

@petermattis
Copy link
Collaborator

i suppose there can but what is the desired behavior?

Disabling all logging to stderr. When running microbenchmarks such logging isn't needed. Or perhaps disable all logging below FATAL.

@knz
Copy link
Contributor

knz commented Mar 30, 2022

okay but the convo above mentioned/started with file/directory creation. should this be disabledd too?

@petermattis
Copy link
Collaborator

okay but the convo above mentioned/started with file/directory creation. should this be disabledd too?

Yes. Good to avoid any disk IO or other activity which can skew the benchmarks.

@yuzefovich
Copy link
Member

I rerun ./pkg/bench benchmarks by skipping BenchmarkSQL//InsertFK which occasionally gets stuck. The issue has been updated above accordingly.

@srosenberg
Copy link
Member Author

srosenberg commented Mar 31, 2022

Have you tried -test-log-config='file-defaults: dir: ""'?

this should disable file creation

It doesn't work because ApplyConfig is done in init [1] (flags.go) but flags.Parse() isn't called until after main invokes m.Run(). Furthermore, skip.UnderBench [2] doesn't hold until after flags.Parse.
However, with a slight modification we're back in business,

func TestMain(m *testing.M) {
	flag.Parse()
	log.Init()
	os.Exit(m.Run())
}

Init just reruns [1] (expected to be idempotent) after flag.Parse() to affect the value of test-log-config from the CLI.

Thus, by including the above TestMain in pkg/blobs/main_test.go we get the desired default behavior, namely no output to stderr (including no file redirection),

./blobs_bench -test.run - -test.bench . -test.benchmem

goos: linux
goarch: amd64
pkg: github.com/cockroachdb/cockroach/pkg/blobs
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
BenchmarkStreamingReadFile-24     	     446	   2489233 ns/op	  53.07 MB/s	 1926685 B/op	    2897 allocs/op
BenchmarkStreamingWriteFile-24    	     523	   2265356 ns/op	  58.31 MB/s	 1898296 B/op	    2508 allocs/op

If we really want to enable logging (original case),

./blobs_bench -test.run - -test.bench . -test.benchmem -test-log-config='sinks: {stderr: {filter: INFO}}'

I220331 02:12:08.275730 66 2@rpc/context.go:1217  [rnode=2,raddr=127.0.0.1:33099,class=default] 1  dialing
I220331 02:12:08.282876 7 2@rpc/context.go:813  [-] 2  closing [{n2: 127.0.0.1:33099 (default)}]
I220331 02:12:08.282999 7 2@rpc/context.go:813  [-] 3  closing [{n0: 127.0.0.1:33099 (default)}]
I220331 02:12:08.283054 14 2@rpc/context.go:813  [-] 4  closing [{n2: 127.0.0.1:33099 (default)} {n0: 127.0.0.1:33099 (default)}]
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/cockroach/pkg/blobs
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
BenchmarkStreamingReadFile-24     	I220331 02:12:08.292282 130 2@rpc/context.go:1217  [rnode=2,raddr=127.0.0.1:37983,class=default] 5  dialing
W220331 02:12:08.310693 153 2@rpc/clock_offset.go:216  [rnode=2,raddr=127.0.0.1:37983,class=default,heartbeat] 6  latency jump (prev avg 0.19ms, current 1.02ms)
I220331 02:12:08.528765 146 2@rpc/context.go:813  [-] 7  closing [{n2: 127.0.0.1:37983 (default)}]
I220331 02:12:08.528843 153 2@rpc/context.go:813  [-] 8  closing [{n2: 127.0.0.1:37983 (default)} {n0: 127.0.0.1:37983 (default)}]
I220331 02:12:08.537100 130 2@rpc/context.go:1217  [rnode=2,raddr=127.0.0.1:40335,class=default] 9  dialing
W220331 02:12:08.550263 1616 2@rpc/clock_offset.go:216  [rnode=2,raddr=127.0.0.1:40335,class=default,heartbeat] 10  latency jump (prev avg 0.20ms, current 1.04ms)
I220331 02:12:09.691375 1609 2@rpc/context.go:813  [-] 11  closing [{n2: 127.0.0.1:40335 (default)}]
I220331 02:12:09.691463 1616 2@rpc/context.go:813  [-] 12  closing [{n2: 127.0.0.1:40335 (default)} {n0: 127.0.0.1:40335 (default)}]
     456	   2548062 ns/op	  51.84 MB/s	 1981002 B/op	    2978 allocs/op
I220331 02:12:09.701654 8775 2@rpc/context.go:1217  [rnode=2,raddr=127.0.0.1:45095,class=default] 13  dialing
I220331 02:12:09.705618 8836 2@rpc/context.go:813  [-] 14  closing [{n2: 127.0.0.1:45095 (default)}]
I220331 02:12:09.705690 8843 2@rpc/context.go:813  [-] 15  closing [{n2: 127.0.0.1:45095 (default)} {n0: 127.0.0.1:45095 (default)}]
BenchmarkStreamingWriteFile-24    	I220331 02:12:09.714607 8733 2@rpc/context.go:1217  [rnode=2,raddr=127.0.0.1:35053,class=default] 16  dialing
I220331 02:12:09.942947 8846 2@rpc/context.go:813  [-] 17  closing [{n2: 127.0.0.1:35053 (default)}]
I220331 02:12:09.943045 8846 2@rpc/context.go:813  [-] 18  closing [{n0: 127.0.0.1:35053 (default)}]
I220331 02:12:09.943069 8901 2@rpc/context.go:813  [-] 19  closing [{n2: 127.0.0.1:35053 (default)} {n0: 127.0.0.1:35053 (default)}]
I220331 02:12:09.953203 8733 2@rpc/context.go:1217  [rnode=2,raddr=127.0.0.1:38803,class=default] 20  dialing
W220331 02:12:10.031133 10246 2@rpc/clock_offset.go:216  [rnode=2,raddr=127.0.0.1:38803,class=default,heartbeat] 21  latency jump (prev avg 0.21ms, current 1.00ms)
I220331 02:12:11.025610 10175 2@rpc/context.go:813  [-] 22  closing [{n2: 127.0.0.1:38803 (default)}]
I220331 02:12:11.025704 10246 2@rpc/context.go:813  [-] 23  closing [{n2: 127.0.0.1:38803 (default)} {n0: 127.0.0.1:38803 (default)}]
     507	   2129050 ns/op	  62.04 MB/s	 1888216 B/op	    2408 allocs/op

If the above sounds reasonable, I'll send a PR to add the missing TestMain and expose the proper (log) init. sequence.

[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/util/log/flags.go#L56-L64
[2] https://github.com/cockroachdb/cockroach/blob/master/pkg/util/log/test_log_scope.go#L276

@yuzefovich
Copy link
Member

I reran pkg/sql/... benchmark with the latest benchmark fixes on 22.1 and updated the spreadsheet, now we have much fewer regressions to look at.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

No branches or pull requests

8 participants