Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: reduce writer allocations during index flushing #1379

Merged
merged 2 commits into from
Nov 23, 2021

Conversation

dt
Copy link
Member

@dt dt commented Nov 20, 2021

While working on a separate change that included passing a large number
of pre-generated blocks all to addIndexEntry in a tight loop, some of
the allocations it was doing per block flushed became much more apparent in
profiles. By batch allocating keys used as separators, and by adjusting
the buffering of filled sub-index blocks to store their finished byte
representation (also batch allocated) with relevant details, rather than
the whole block writer, that blockWriter can instead be reused, which
importantly allows reuse of its allocated buffers eliminating a significant source of
allocations.

There overall performance impact to Writer is negligible but was more pronounced
in another change that flushes blocks in a tight loop.

name                                                          old time/op    new time/op    delta
Writer/block=4.0_K/filter=false/compression=NoCompression-10    50.7ms ± 0%    51.5ms ± 4%      ~     (p=0.792 n=5+6)
Writer/block=4.0_K/filter=false/compression=Snappy-10           62.8ms ± 2%    62.5ms ± 3%      ~     (p=0.310 n=6+6)
Writer/block=32_K/filter=false/compression=NoCompression-10     48.9ms ± 1%    48.8ms ± 1%      ~     (p=1.000 n=6+6)
Writer/block=32_K/filter=false/compression=Snappy-10            60.3ms ± 4%    59.0ms ± 5%      ~     (p=0.310 n=6+6)

name                                                          old speed      new speed      delta
Writer/block=4.0_K/filter=false/compression=NoCompression-10   756MB/s ± 0%   744MB/s ± 4%      ~     (p=0.792 n=5+6)
Writer/block=4.0_K/filter=false/compression=Snappy-10          156MB/s ± 2%   156MB/s ± 3%      ~     (p=0.310 n=6+6)
Writer/block=32_K/filter=false/compression=NoCompression-10    773MB/s ± 1%   774MB/s ± 1%      ~     (p=1.000 n=6+6)
Writer/block=32_K/filter=false/compression=Snappy-10           113MB/s ± 4%   115MB/s ± 4%      ~     (p=0.310 n=6+6)

name                                                          old alloc/op   new alloc/op   delta
Writer/block=4.0_K/filter=false/compression=NoCompression-10    1.18MB ± 0%    0.80MB ± 0%   -32.14%  (p=0.002 n=6+6)
Writer/block=4.0_K/filter=false/compression=Snappy-10           1.20MB ± 0%    0.83MB ± 0%   -31.08%  (p=0.002 n=6+6)
Writer/block=32_K/filter=false/compression=NoCompression-10      252kB ± 0%     952kB ± 0%  +278.61%  (p=0.002 n=6+6)
Writer/block=32_K/filter=false/compression=Snappy-10             456kB ± 0%    1157kB ± 0%  +153.55%  (p=0.002 n=6+6)

name                                                          old allocs/op  new allocs/op  delta
Writer/block=4.0_K/filter=false/compression=NoCompression-10     10.7k ± 0%      0.1k ± 0%   -99.03%  (p=0.002 n=6+6)
Writer/block=4.0_K/filter=false/compression=Snappy-10            10.7k ± 0%      0.1k ± 0%   -98.97%  (p=0.002 n=6+6)
Writer/block=32_K/filter=false/compression=NoCompression-10      1.27k ± 0%     0.10k ± 0%   -92.20%  (p=0.002 n=6+6)
Writer/block=32_K/filter=false/compression=Snappy-10             1.27k ± 0%     0.10k ± 0%   -91.84%  (p=0.002 n=6+6)

This extends BenchmarkWriter to test block sizes as well as compressions, and
generates names based on the parameters. Additionally it Close()'s the Writer
since some writes are deferred until Close but should be included in the overall
benchmark numbers. Finally, the discardWriter is extended to include a written
byte counter to provide MB/s metrics to the benchmark.
@dt dt requested a review from jbowens November 20, 2021 05:15
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dt dt force-pushed the writer-index-reallocs branch from 9a3bb51 to fc14850 Compare November 20, 2021 06:09
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @dt)


sstable/writer.go, line 536 at r2 (raw file):

			w.indexBlockAlloc = make([]byte, len(bk)*16)
		} else {
			w.indexBlockAlloc = make([]byte, len(bk)*32)

how deliberate or empirical is the choice of these constants and the splitting of these two cases? wonder if it's worth the special casing, or if it is, we should do something adaptive (eg, 2 × the size of our previous indexBlockAlloc allocation)

Copy link
Member Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens)


sstable/writer.go, line 536 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

how deliberate or empirical is the choice of these constants and the splitting of these two cases? wonder if it's worth the special casing, or if it is, we should do something adaptive (eg, 2 × the size of our previous indexBlockAlloc allocation)

Oh, I'm not actually sure I meant to leave the two cases here. I might just make it a constant 16 for now?

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @dt)


sstable/writer.go, line 536 at r2 (raw file):

Previously, dt (David Taylor) wrote…

Oh, I'm not actually sure I meant to leave the two cases here. I might just make it a constant 16 for now?

sgtm

While working on a separate change that included passing a large number
of pre-generated blocks all to addIndexEntry in a tight loop, some of
the allocations it was doing per block flushed became more apparent in
profiles. By batch allocating keys used as separators, and by adjusting
the buffering of filled sub-index blocks to store there finished byte
representation (also batch allocated) and relevant details, rather than
the whole block writer used for that sub-index block, that blockWriter
can instead be reused, which importantly allows reuse of its allocated
buffers for things like restarts, eliminating a significant source of
allocations for when flushing many sub-index blocks.

```
name                                                          old time/op    new time/op    delta
Writer/block=4.0_K/filter=false/compression=NoCompression-10    50.4ms ± 0%    50.4ms ± 1%      ~     (p=0.730 n=4+5)
Writer/block=4.0_K/filter=false/compression=Snappy-10           62.6ms ± 2%    62.1ms ± 1%      ~     (p=0.222 n=5+5)
Writer/block=32_K/filter=false/compression=NoCompression-10     48.1ms ± 0%    48.5ms ± 0%    +0.73%  (p=0.008 n=5+5)
Writer/block=32_K/filter=false/compression=Snappy-10            59.0ms ± 3%    58.1ms ± 2%      ~     (p=0.690 n=5+5)

name                                                          old speed      new speed      delta
Writer/block=4.0_K/filter=false/compression=NoCompression-10   759MB/s ± 0%   759MB/s ± 1%      ~     (p=0.730 n=4+5)
Writer/block=4.0_K/filter=false/compression=Snappy-10          156MB/s ± 2%   157MB/s ± 1%      ~     (p=0.222 n=5+5)
Writer/block=32_K/filter=false/compression=NoCompression-10    785MB/s ± 0%   779MB/s ± 0%    -0.73%  (p=0.008 n=5+5)
Writer/block=32_K/filter=false/compression=Snappy-10           115MB/s ± 3%   117MB/s ± 2%      ~     (p=0.690 n=5+5)

name                                                          old alloc/op   new alloc/op   delta
Writer/block=4.0_K/filter=false/compression=NoCompression-10    1.18MB ± 0%    0.80MB ± 0%   -32.14%  (p=0.008 n=5+5)
Writer/block=4.0_K/filter=false/compression=Snappy-10           1.20MB ± 0%    0.83MB ± 0%   -31.08%  (p=0.008 n=5+5)
Writer/block=32_K/filter=false/compression=NoCompression-10      252kB ± 0%     952kB ± 0%  +278.63%  (p=0.008 n=5+5)
Writer/block=32_K/filter=false/compression=Snappy-10             456kB ± 0%    1157kB ± 0%  +153.58%  (p=0.008 n=5+5)

name                                                          old allocs/op  new allocs/op  delta
Writer/block=4.0_K/filter=false/compression=NoCompression-10     10.7k ± 0%      0.1k ± 0%   -99.03%  (p=0.008 n=5+5)
Writer/block=4.0_K/filter=false/compression=Snappy-10            10.7k ± 0%      0.1k ± 0%   -98.97%  (p=0.008 n=5+5)
Writer/block=32_K/filter=false/compression=NoCompression-10      1.27k ± 0%     0.10k ± 0%      ~     (p=0.079 n=4+5)
Writer/block=32_K/filter=false/compression=Snappy-10             1.27k ± 0%     0.10k ± 0%   -91.84%  (p=0.008 n=5+5)
```
@dt dt force-pushed the writer-index-reallocs branch from fc14850 to bd9b59c Compare November 23, 2021 20:01
@dt
Copy link
Member Author

dt commented Nov 23, 2021

TFTR!

@dt dt merged commit db1a741 into cockroachdb:master Nov 23, 2021
@dt dt deleted the writer-index-reallocs branch November 23, 2021 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants