db: copy user-provided bounds and optimize unchanged bounds #1675
Conversation
We seem to have gone down a rabbit hole where we are having to make performance tradeoffs, or worry about what use case we may miss when measuring for performance regression. Can we avoid all these concerns?
We currently have a setup where:
Case 1: bounds have changed. There is one set of key copies in CockroachDB.
Case 2: bounds have not changed. There are zero key copies.
I can't actually remember where case 2 occurs. There is definitely MVCCGet, which uses a prefix iter and repeatedly calls setBounds(nil, nil) for each GetRequest contained in a BatchRequest -- this is not interesting. ScanRequests should fall into case 1. @erikgrinaker do you remember?
So let's focus on case 1. CockroachDB has to copy, in order to convert from roachpb.Key to a Pebble user key. Since CockroachDB is making a promise of key stability until after the next SetBounds call, can we do something like the following in Pebble?
- SetBounds has changed bounds: don't copy since caller has promised key stability. Remember that not copied.
- SetBounds does not change bounds: if not-copied bit is not set, noop. If not-copied bit is set, make a copy, and pass it down as SetBounds(lower, upper, force-change=true), to disable any lower-level optimizations that will seek using next.
I realize this seems complicated. I am not suggesting we actually do it. But I am also searching for something with limited fallout.
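For concreteness, here is a hypothetical Go sketch of that scheme. The type and field names are invented for illustration; this is not a proposal for Pebble's actual internals:

import "bytes"

type boundsState struct {
    lower, upper []byte
    notCopied    bool // true if we retained the caller's slices directly
}

// setBounds returns (changed, forceChange). forceChange tells lower levels
// to treat the bounds as changed even though they are byte-wise equal,
// disabling seek-using-next style optimizations for this one transition.
func (b *boundsState) setBounds(lower, upper []byte) (changed, forceChange bool) {
    if !bytes.Equal(b.lower, lower) || !bytes.Equal(b.upper, upper) {
        // Bounds changed: the caller has promised key stability until the
        // next SetBounds call, so retain the slices without copying.
        b.lower, b.upper = lower, upper
        b.notCopied = true
        return true, false
    }
    if !b.notCopied {
        return false, false // unchanged bounds we already own: noop
    }
    // Bounds are unchanged, but the retained slices belong to the caller
    // and may be reused after this call. Copy them now and force the
    // change downward.
    b.lower = append([]byte(nil), lower...)
    b.upper = append([]byte(nil), upper...)
    b.notCopied = false
    return true, true
}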
I am not suggesting we actually do it. But I am also searching for something with limited fallout.
I think #1667 is still viable to preserve the same performance characteristics for both cases. (This is with the caveat that tracking down metamorphic test failures may yet reveal an overlooked reason that option is fundamentally flawed.) This bound-copying alternative is seeking to address a concern about complexity and fragility rather than shifting a performance tradeoff.
I think we should try to get some sense of the actual impact of copying these bounds before we try to avoid it.
I can't actually remember where case 2 occurs. There is definitely MVCCGet which is using a prefix iter and repeatedly sets setBounds(nil, nil) for each GetRequest contained in a BatchRequest -- this is not interesting. ScanRequests should fall into case 1. @erikgrinaker do you remember?
I was going to say the intent interleaving iterator, but it clearly uses different bounds for the engine iterator and cloned MVCC iterator.
Another example would be MVCCIncrementalIterator, where we'll be setting up a TBI and a normal iter with the same bounds, cloning via the batch/readonly. But I don't think it's used in any code paths that are hot enough to matter.
In general, I think we should initially focus on scans and gets (puts can be considered a variant of gets), since those are the most performance critical.
I think we should try to get some sense of the actual impact of copying these bounds before we try to avoid it.
Yep, I'll run some quick tests.
I ran some benchmarks for scans and gets, using a single batch that clones an iterator and calls
I would expect the copy overhead to show up in the scan benchmarks, but it appears to be negligible, so I don't think the copy is a problem in itself. For gets, I looked at profile diffs, and it seemed like it was mostly down to SeekGE overhead.
For gets, I looked at profile diffs, and it seemed like it was mostly down to SeekGE overhead. That's likely because this affects the lower-level seek optimization, and should be orthogonal to the copies themselves.
I agree it looks like we're losing the seek optimizations, but I don't understand why. Do you mind pushing a branch that I can experiment with?
Sure, here you go:
E.g.:
$ dev bench --ignore-cache pkg/storage -f BenchmarkMVCCGet_Pebble -v --count 10
$ dev bench --ignore-cache pkg/storage -f '^BenchmarkMVCCScan_Pebble$' -v --count 10
$ dev bench --ignore-cache pkg/storage -f BenchmarkMVCCGet_Pebble --test-args '-test.cpuprofile=get.pprof'
Oh, is the original not reading through a batch? It makes sense that we're losing the seek-using-next optimization if we're reading through a batch now, but we should be able to claw that back with #1640. When SetOptions is called and the iterator's batch doesn't have any new mutations, we should still be able to apply the seek-using-next optimization.
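Something like the following sketch, where Batch and mutationCount are invented stand-ins rather than pebble's actual API:

// Batch is a stand-in for pebble.Batch; mutationCount is a hypothetical
// monotonic counter of writes applied to the batch.
type Batch struct{ count uint64 }

func (b *Batch) mutationCount() uint64 { return b.count }

type batchIter struct {
    batch         *Batch
    seenMutations uint64 // batch mutation count at the last refresh
}

// canSeekUsingNext reports whether the seek-using-next optimization is
// still legal after SetOptions: it is, unless new writes landed in the
// batch since the iterator last observed it.
func (i *batchIter) canSeekUsingNext() bool {
    if i.batch == nil {
        return true // not reading through a batch
    }
    if n := i.batch.mutationCount(); n != i.seenMutations {
        i.seenMutations = n
        return false // new mutations: iterator state must be refreshed
    }
    return true
}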
Oh, is the original not reading through a batch? It makes sense that we're losing the seek-using-next optimization if we're reading through a batch now, but we should be able to claw that back with #1640. When SetOptions is called and the iterator's batch doesn't have any new mutations, we should still be able to apply the seek-using-next optimization.
They're both reading through a batch -- the old benchmark was run after the commit that changed the benchmark to use a batch. However, the old benchmark omitted the SetOptions call in CRDB when possible, while the new benchmark doesn't (not sure if it actually triggered here though, since these were cloned iterators). If we additionally need #1675 to retain the SeekGE optimization in these cases, then that makes sense.
In any case, for the copying I think the single-row scan is the benchmark we care about, since that uses random bounds for every scan. That looks viable based on these results at least.
Would it make sense to combine this and #1640 for a final benchmark before we merge this, just to verify that we're not regressing somehow?
Yeah, good call. This branch has the combination of the batch semantics and the bounds copy, preserving the seek-using-next optimizations with batches when possible: https://github.com/cockroachdb/pebble/compare/master...jbowens:bounds-copy-and-batch-semantics?expand=1
I'll try to compare it with the branch you linked above this afternoon.
Thanks! Looks like we still have the get regression though.
I think the regression was actually additional overhead in the SetOptions noop case. Increasing the benchmark count when taking the cpu profile produced a more representative profile. In that MVCCGet benchmark, we're testing the noop case, since each iterator is configured with the same options.
The SetOptions function has to perform slightly more work to diff two IterOptions because it exposes a larger surface area on pebble.IterOptions. I was able to get it down to a ~3% regression by restructuring some conditionals (see the sketch after the benchmark numbers below). I might be able to get it down more.
name old time/op new time/op delta
MVCCGet_Pebble/versions=1/valueSize=8-10 2.26µs ± 6% 2.20µs ± 5% -2.89% (p=0.000 n=49+50)
name old speed new speed delta
MVCCGet_Pebble/versions=1/valueSize=8-10 3.54MB/s ± 6% 3.64MB/s ± 5% +2.97% (p=0.000 n=49+50)
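As referenced above, a sketch of that kind of restructuring, using a pared-down stand-in for a subset of pebble.IterOptions (invented helper, not the actual diff):

import "bytes"

// Pared-down stand-in for a subset of pebble.IterOptions.
type IterOptions struct {
    LowerBound, UpperBound []byte
    KeyTypes               uint8
    PointKeyFilters        []interface{} // placeholder for the filter type
}

// optsUnchanged orders the comparisons so the cheap scalar checks run
// first and the byte-wise bounds comparison runs last, keeping the common
// noop path as short as possible.
func optsUnchanged(cur, next *IterOptions) bool {
    if cur.KeyTypes != next.KeyTypes {
        return false
    }
    // Filters force reconstructing the iterator stack whenever either
    // side sets them.
    if len(cur.PointKeyFilters) > 0 || len(next.PointKeyFilters) > 0 {
        return false
    }
    return bytes.Equal(cur.LowerBound, next.LowerBound) &&
        bytes.Equal(cur.UpperBound, next.UpperBound)
}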
Errr, what am I saying, that's faster? Mind double-checking that benchmark?
Combined with the changes from #1640, there is a regression because we also now need to detect whether the batch was mutated. Let me see if I can't get that down.
name old time/op new time/op delta
MVCCGet_Pebble/versions=1/valueSize=8-10 2.26µs ± 6% 2.37µs ± 7% +4.76% (p=0.000 n=49+49)
name old speed new speed delta
MVCCGet_Pebble/versions=1/valueSize=8-10 3.54MB/s ± 6% 3.38MB/s ± 6% -4.55% (p=0.000 n=49+49)
With 2cd336795dae99c4fd422ca9db40cdf3c0ec81de on master...jbowens:bounds-copy-and-batch-semantics I got the delta back down to a minor performance win:
jackson@jacksonmbp cockroach % benchstat old.txt e5dabf12.txt
name old time/op new time/op delta
MVCCGet_Pebble/versions=1/valueSize=8-10 2.26µs ± 6% 2.23µs ± 2% -1.36% (p=0.014 n=49+49)
name old speed new speed delta
MVCCGet_Pebble/versions=1/valueSize=8-10 3.54MB/s ± 6% 3.59MB/s ± 1% +1.35% (p=0.010 n=49+48)
Fantastic news, thanks for digging into this @jbowens! ❤️
Do the scan benchmarks look fine as well? If so, I think we're essentially good to go here.
Oof, I'm learning that these benchmarks are unstable and highly sensitive to the count. There's enough state (e.g., in iterator reuse) that running with --count 50 is dramatically different than running with --count 10.
Comparing old and new with --count 20, there's still an MVCCGet regression:
name old time/op new time/op delta
MVCCScan_Pebble/rows=1/versions=1/valueSize=8-10 3.28µs ± 3% 2.45µs ± 1% -25.53% (p=0.000 n=20+20)
MVCCScan_Pebble/rows=10/versions=1/valueSize=8-10 13.9µs ± 3% 4.8µs ± 2% -65.64% (p=0.000 n=20+20)
MVCCScan_Pebble/rows=100/versions=1/valueSize=8-10 113µs ± 2% 23µs ± 1% -79.32% (p=0.000 n=20+20)
MVCCScan_Pebble/rows=1000/versions=1/valueSize=8-10 1.09ms ± 1% 0.20ms ± 0% -81.57% (p=0.000 n=19+20)
MVCCGet_Pebble/versions=1/valueSize=8-10 1.99µs ± 2% 2.28µs ± 5% +14.45% (p=0.000 n=19+20)
name old speed new speed delta
MVCCScan_Pebble/rows=1/versions=1/valueSize=8-10 2.43MB/s ± 3% 3.27MB/s ± 1% +34.35% (p=0.000 n=20+20)
MVCCScan_Pebble/rows=10/versions=1/valueSize=8-10 5.76MB/s ± 3% 16.76MB/s ± 2% +190.97% (p=0.000 n=20+20)
MVCCScan_Pebble/rows=100/versions=1/valueSize=8-10 7.07MB/s ± 2% 34.17MB/s ± 1% +383.51% (p=0.000 n=20+20)
MVCCScan_Pebble/rows=1000/versions=1/valueSize=8-10 7.31MB/s ± 1% 39.69MB/s ± 0% +442.70% (p=0.000 n=19+20)
MVCCGet_Pebble/versions=1/valueSize=8-10 4.01MB/s ± 4% 3.52MB/s ± 5% -12.40% (p=0.000 n=20+20)
Diffing cpu profiles points to keyspan.SeekGE, the function used for seeking among range deletion tombstones and range keys.
Showing top 20 nodes out of 908
flat flat% sum% cum cum%
0.77s 2.26% 2.26% 1.14s 3.36% github.com/cockroachdb/pebble/internal/keyspan.SeekGE
0.76s 2.24% 4.50% 0.76s 2.24% syscall.syscall6
0.64s 1.88% 6.38% 0.63s 1.85% syscall.syscall
-0.35s 1.04% 5.33% -1.32s 3.89% github.com/cockroachdb/pebble/sstable.(*blockIter).SeekGE
-0.35s 1.04% 4.29% -0.45s 1.34% github.com/cockroachdb/cockroach/pkg/storage.mvccGet
-0.32s 0.94% 3.35% -0.30s 0.87% github.com/cockroachdb/pebble/sstable.(*blockIter).readEntry
-0.32s 0.94% 2.41% -0.56s 1.64% github.com/cockroachdb/cockroach/pkg/storage.EngineKeyCompare
-0.28s 0.82% 1.59% -0.45s 1.31% github.com/cockroachdb/pebble/sstable.(*blockIter).Next
I think this is a consequence of #1640. Previously, a batch without any range deletions did not require a batch range deletion iterator to be added to the iterator. Now we must add it to the iterator regardless, because we need a container to populate with the range deletions in case one is written to the batch later.
I think this is necessary and possibly something we can optimize down in subsequent PRs.
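Roughly, the shape of the change is something like this sketch (names invented; not the actual #1640 diff):

// rangeDelContainer is a hypothetical reusable holder for a batch's
// fragmented range deletions; it may start out (and stay) empty.
type rangeDelContainer struct {
    spans []span // fragmented range deletion spans
}

type span struct{ start, end []byte }

// attachBatchRangeDels unconditionally adds a container for the batch's
// range deletions. Previously this level was added only if the batch
// already contained range deletions; now it must always be present so a
// later DeleteRange on the batch has somewhere to land without rebuilding
// the iterator stack, at the cost of an extra (often empty) level to step
// over on every seek.
func attachBatchRangeDels(levels []*rangeDelContainer) []*rangeDelContainer {
    return append(levels, &rangeDelContainer{})
}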
Oof, I'm learning that these benchmarks are unstable and highly sensitive to the count. There's enough state (eg, in iterator reuse) that running with --count 50 is dramatically different than running with --count 10.
Huh, I thought each benchmark would be an independent run of the benchmark binary or something.
Anyway, this still isn't terrible -- in fact, the scan numbers are pretty fantastic (are we sure they're real?). If the gets are better than baseline for high counts that still seems like it might be a win.
I think this is necessary and possibly something we can optimize down in subsequent PRs.
Ok. Let me do a couple of quick tests here myself (might have to wait until tomorrow). As long as we can land this in CRDB master (which doesn't have any of the range key changes yet) without any severe regressions on end-to-end benchmarks, I'm fine with micro-optimizing this later.
(are we sure they're real?)
It'd be good to verify, but I think the wins are coming from the fact that with #1640 we can now use the try-seek-using-next optimization when we're reading through batches too, so long as SetOptions wasn't just called and needed to invalidate the iterator because there are new mutations to the batch. So they're "real" for the narrow access pattern of that benchmark, but I'm not sure whether they're representative of any win for real production workloads.
If I run the benchmarks once per process, the variance is high enough that I'm not getting a bump recognized as statistically significant:
name old time/op new time/op delta
MVCCScan_Pebble/rows=1/versions=1/valueSize=8-10 2.70µs ±23% 2.53µs ± 7% ~ (p=0.780 n=40+40)
MVCCScan_Pebble/rows=10/versions=1/valueSize=8-10 8.20µs ±68% 5.99µs ±36% ~ (p=0.459 n=40+40)
MVCCScan_Pebble/rows=100/versions=1/valueSize=8-10 58.0µs ±94% 36.4µs ±60% ~ (p=0.262 n=40+40)
MVCCScan_Pebble/rows=1000/versions=1/valueSize=8-10 547µs ±98% 328µs ±60% ~ (p=0.289 n=40+40)
MVCCGet_Pebble/versions=1/valueSize=8-10 2.04µs ± 4% 2.20µs ± 2% +7.58% (p=0.008 n=5+5)
name old speed new speed delta
MVCCScan_Pebble/rows=1/versions=1/valueSize=8-10 3.03MB/s ±20% 3.17MB/s ± 7% ~ (p=0.776 n=40+40)
MVCCScan_Pebble/rows=10/versions=1/valueSize=8-10 12.7MB/s ±54% 14.3MB/s ±31% ~ (p=0.459 n=40+40)
MVCCScan_Pebble/rows=100/versions=1/valueSize=8-10 23.4MB/s ±70% 26.3MB/s ±48% ~ (p=0.264 n=40+40)
MVCCScan_Pebble/rows=1000/versions=1/valueSize=8-10 26.6MB/s ±72% 30.0MB/s ±49% ~ (p=0.288 n=40+40)
MVCCGet_Pebble/versions=1/valueSize=8-10 3.92MB/s ± 4% 3.64MB/s ± 2% -7.09% (p=0.008 n=5+5)
Ok. Let me do a couple of quick tests here myself (might have to wait until tomorrow).
Thanks, appreciate it. It'd be good to verify I'm not doing something silly collecting the scan benchmarks too.
Ran some more benchmarks. On CRDB master there is no diff (not surprising, just wanted to check). With SetOptions(), my results are essentially the same as yours:
name old time/op new time/op delta
MVCCScan_Pebble/rows=1/versions=1/valueSize=8-24 4.36µs ± 1% 3.15µs ± 1% -27.75% (p=0.000 n=20+20)
MVCCScan_Pebble/rows=10/versions=1/valueSize=8-24 18.1µs ± 0% 6.1µs ± 2% -66.36% (p=0.000 n=19+20)
MVCCScan_Pebble/rows=100/versions=1/valueSize=8-24 147µs ± 0% 29µs ± 2% -80.20% (p=0.000 n=18+20)
MVCCScan_Pebble/rows=1000/versions=1/valueSize=8-24 1.44ms ± 1% 0.25ms ± 1% -82.65% (p=0.000 n=20+18)
MVCCGet_Pebble/versions=1/valueSize=8-24 2.63µs ± 0% 2.85µs ± 1% +8.62% (p=0.000 n=15+19)
I'm fine with merging this as-is for now, and revisiting optimizations.
Sounds good. @sumeerbhola mind giving this a review?
iterator.go
line 1794 at r10 (raw file):
return err
}
I assume we have not changed the stability requirements for base.InternalIterator. InternalIterator.SetBounds doesn't say anything about the bounds stability -- could you add an explicit code comment there?
iterator.go
line 1925 at r10 (raw file):
if (i.pointIter != nil || !i.opts.pointKeys()) && (i.rangeKey != nil || !i.opts.rangeKeys()) && boundsEqual && o.KeyTypes == i.opts.KeyTypes &&
is o.KeyTypes == i.opts.KeyTypes subsumed by (i.pointIter != nil || !i.opts.pointKeys()) && (i.rangeKey != nil || !i.opts.rangeKeys())?
Even if the KeyTypes have changed, if we have closed certain iterators and ended up with exactly the iterators we need, that seems ok, yes?
Hmm, I see the problem. What if the condition was
(i.pointIter != nil == o.pointKeys()) && (i.rangeKey != nil == o.rangeKeys())
and in the preceding section we did something like:
if i.pointIter != nil && (closeBoth || len(o.PointKeyFilters) > 0 || len(i.opts.PointKeyFilters) > 0 || !o.pointKeys()) {
    i.err = firstError(i.err, i.pointIter.Close())
    i.pointIter = nil
}
And similarly for range keys.
That is, we have one place we close iterators.
Or maybe I am missing the fact that we are not closing the point and range iterators even if they are not being used. Can you remind me what use case this optimization helps in?
testdata/iterator_bounds_lifetimes
line 1 at r10 (raw file):
new-iter label=first lower=bar upper=foo
nice test
testdata/iterator_bounds_lifetimes
line 56 at r10 (raw file):
second: ("a", "z") # Test no-op set-options.
can we tell it is a noop? Should the test print out Iterator.boundsBufIdx?
testdata/iterator_bounds_lifetimes
line 64 at r10 (raw file):
# Test SetOptions with unchanged bounds but changes to other options. SetOptions
# should hold onto the existing bounds buffers.
same question
testdata/iterator_seek_opt
line 180 at r10 (raw file):
SeekPrefixGEs with trySeekUsingNext: 4
# Shifting bounds, with non-overlapping and monotonic bounds. A set-bounds sits
The non-overlapping and monotonic bounds optimization is important for sstables even though it isn't observable using the usual iterator stats, since we don't plumb it via trySeekUsingNext. The InternalIteratorStats may make this observable. Can you try printing those out too, and check? There is a comment in the test about them being non-deterministic, but this test is laying out a particular set of sstables, so I think it should be deterministic.
iterator.go
line 1794 at r10 (raw file):
Previously, sumeerbhola wrote…
I assume we have not changed the stability requirements for base.InternalIterator. InternalIterator.SetBounds doesn't say anything about the bounds stability -- could you add an explicit code comment there?
Yeah, no changes to the internal iterator contract. Added documentation to the interface type and SetBounds.
iterator.go
line 1925 at r10 (raw file):
Previously, sumeerbhola wrote…
is o.KeyTypes == i.opts.KeyTypes subsumed by (i.pointIter != nil || !i.opts.pointKeys()) && (i.rangeKey != nil || !i.opts.rangeKeys())?
Even if the KeyTypes have changed, if we have closed certain iterators and ended up with exactly the iterators we need, that seems ok, yes? Hmm, I see the problem. What if the condition was
(i.pointIter != nil == o.pointKeys()) && (i.rangeKey != nil == o.rangeKeys())
and in the preceding section we did something like:
if i.pointIter != nil && (closeBoth || len(o.PointKeyFilters) > 0 || len(i.opts.PointKeyFilters) > 0 || !o.pointKeys()) {
    i.err = firstError(i.err, i.pointIter.Close())
    i.pointIter = nil
}
And similarly for range keys.
That is, we have one place we close iterators. Or maybe I am missing the fact that we are not closing the point and range iterators even if they are not being used. Can you remind me what use case this optimization helps in?
Yeah, I was trying to preserve the optimization of switching away from and then back to a key type. It's not based on any known CockroachDB iterator usage.
In the case the key types change, we still need to switch i.iter to point to the correct iterator. If range keys are enabled, we also need to initialize the range key interleaving iterator so that it wraps the appropriate point iterator (or omits the point iterator entirely if only range keys are requested). Reinitializing the interleaving iterator requires invalidating the iterator and disabling seek-using-next optimizations.
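A hypothetical sketch of that switch (all names invented; the real interleaving iterator has a much richer interface):

type internalIterator interface{} // stand-in

type interleavingIter struct{ point, rangeKey internalIterator }

func (ii *interleavingIter) init(point, rangeKey internalIterator) {
    ii.point, ii.rangeKey = point, rangeKey
}

type Iterator struct {
    iter         internalIterator
    pointIter    internalIterator
    interleaving interleavingIter
}

// switchKeyTypes re-points i.iter after the key types change. Rebuilding
// the interleaving iterator (whenever range keys are enabled) invalidates
// the iterator position and disables seek-using-next optimizations, which
// is why the unchanged-key-types fast path matters.
func (i *Iterator) switchKeyTypes(pointKeys, rangeKeys bool, rangeKeyIter internalIterator) {
    switch {
    case rangeKeys && pointKeys:
        i.interleaving.init(i.pointIter, rangeKeyIter)
        i.iter = &i.interleaving
    case rangeKeys:
        i.interleaving.init(nil, rangeKeyIter) // range keys only: no point iter
        i.iter = &i.interleaving
    default:
        i.iter = i.pointIter
    }
}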
testdata/iterator_bounds_lifetimes
line 56 at r10 (raw file):
Previously, sumeerbhola wrote…
can we tell it is a noop? Should the test print out Iterator.boundsBufIdx?
good call, done.
testdata/iterator_bounds_lifetimes
line 64 at r10 (raw file):
Previously, sumeerbhola wrote…
same question
Done.
testdata/iterator_seek_opt
line 180 at r10 (raw file):
The InternalIteratorStats may make this observable.
I think the current granularity of internal stats doesn't provide any visibility here, because block reads aren't incremented if the reader attempts to load a block it is already positioned over. Whether or not we hit the optimization, the number of block reads and keys surfaced to the merging iterator is constant.
iterator_test.go
line 1622 at r13 (raw file):
seed := uint64(time.Now().UnixNano())
t.Logf("seed = %d", seed)
rng := rand.New(rand.NewSource(seed))
glad to see this randomized test.
iterator_test.go
line 1628 at r13 (raw file):
o = &IterOptions{KeyTypes: IterKeyType(rng.Intn(3))}
if rng.Intn(2) == 1 {
    o.LowerBound = testkeys.KeyAt(ks, rng.Intn(ks.Count()), rng.Intn(ks.Count()))
With 26*26 keys it seems extremely rare to have bounds unchanged.
Should generateNewOptions take the previous options and with some probability keep bounds etc. the same?
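A sketch of what that could look like, with stand-in types and a randKey helper standing in for the testkeys.KeyAt call in the quoted snippet:

import "math/rand"

// Stand-ins for the pebble types used in the quoted test.
type IterKeyType uint8

type IterOptions struct {
    KeyTypes               IterKeyType
    LowerBound, UpperBound []byte
}

// randKey stands in for the testkeys.KeyAt call in the quoted snippet.
func randKey(rng *rand.Rand) []byte {
    return []byte{byte('a' + rng.Intn(26)), byte('a' + rng.Intn(26))}
}

// generateNewOptions biases toward reusing the previous bounds so the
// unchanged-bounds code paths are actually exercised.
func generateNewOptions(rng *rand.Rand, prev *IterOptions) *IterOptions {
    o := &IterOptions{KeyTypes: IterKeyType(rng.Intn(3))}
    if prev != nil && rng.Intn(2) == 1 {
        // Half the time, carry the previous bounds over verbatim.
        o.LowerBound, o.UpperBound = prev.LowerBound, prev.UpperBound
        return o
    }
    if rng.Intn(2) == 1 {
        o.LowerBound = randKey(rng)
    }
    if rng.Intn(2) == 1 {
        o.UpperBound = randKey(rng)
    }
    return o
}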
TFTR!
iterator_test.go
line 1628 at r13 (raw file):
Previously, sumeerbhola wrote…
With 26*26 keys it seems extremely rare to have bounds unchanged.
Should generateNewOptions take the previous options and with some probability keep bounds etc. the same?
Done.
After passing bounds to Pebble, overwrite the provided buffers with garbage. This helps ensure we never accidentally retain user-provided bounds buffers after a NewIter/SetOptions/SetBounds call.
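A minimal sketch of that safeguard, assuming an invariants-style build-tag constant like the one Pebble uses elsewhere (names invented):

// invariantsEnabled stands in for a build-tag-controlled constant that is
// true only in test/invariants builds.
const invariantsEnabled = true

// trashBounds scribbles over the caller's bound buffers after Pebble has
// copied them. If any internal code accidentally retained a reference to
// the user-provided slices, it will now observe garbage and fail loudly
// in tests instead of silently reading stale bounds.
func trashBounds(lower, upper []byte) {
    if !invariantsEnabled {
        return
    }
    for i := range lower {
        lower[i] = 0xCA // arbitrary garbage byte
    }
    for i := range upper {
        upper[i] = 0xCA
    }
}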
When a new iterator is constructed or when an existing iterator's bounds are
modified, copy the bounds into a Pebble-allocated slice. This gives Pebble
control over the lifetime of the bound buffers and allows Pebble to restore the
optimization for unchanged bounds that was removed in #1073.
Two buffers are used, one for the current bounds and one for old bounds. This
scheme allows low-level internal iterators in Pebble to compare new and old
bounds so that context-dependent optimizations may be applied.
These bounds buffers are pooled on the iterAlloc to reduce allocations.
These new semantics are simpler and allow Pebble's SetOptions to perform more
optimizations. They come at the cost of the additional key copies, regardless
of whether the Iterator's bounds will change.
Alternative to #1667.
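As a rough sketch of the double-buffering described above, loosely following the boundsBufIdx field mentioned in the review (details simplified, not authoritative):

type iterBounds struct {
    boundsBuf    [2][]byte // reusable backing storage, pooled on the iterAlloc
    boundsBufIdx int       // which buffer holds the current bounds
    lower, upper []byte    // current effective bounds, aliasing boundsBuf
}

// saveBounds copies the caller's bounds into the buffer *not* holding the
// current bounds, so the previous bounds survive intact for lower-level
// iterators to compare against.
func (i *iterBounds) saveBounds(lower, upper []byte) {
    i.boundsBufIdx = (i.boundsBufIdx + 1) % 2
    buf := i.boundsBuf[i.boundsBufIdx][:0]
    if lower != nil {
        buf = append(buf, lower...)
        i.lower = buf
    } else {
        i.lower = nil
    }
    if upper != nil {
        n := len(buf)
        buf = append(buf, upper...)
        i.upper = buf[n:]
    } else {
        i.upper = nil
    }
    i.boundsBuf[i.boundsBufIdx] = buf // retain any growth for reuse
}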