-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: add external sstable merging iterator #1529
Conversation
cbf3696
to
2359792
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to flesh out test coverage.
Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on @erikgrinaker, @nicktrav, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! How should overlapping range keys across SSTs be handled? This is relevant for backups, as we discussed, where two SSTs may have range keys [a-d)@3
and [c-e)@3
because we're resuming the second export from e.g. c@5
. In these cases, the suffixes and values for these range keys will be the same.
I suppose if we seek to c@6
then we'd hit the first range key, exposed as [a-d)
, and when we get to c@5
we'd expose [c-e)
? Or will we somehow merge them or truncate the bounds?
Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens, @nicktrav, and @sumeerbhola)
external_iterator.go, line 26 at r1 (raw file):
o *Options, iterOpts *IterOptions, maxTableFormat sstable.TableFormat,
Do we need this? Shouldn't Pebble just error on table formats greater than the ones it knows about, and accept the rest?
2359792
to
54cde90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose if we seek to c@6 then we'd hit the first range key, exposed as [a-d), and when we get to c@5 we'd expose [c-e)? Or will we somehow merge them or truncate the bounds?
My understanding is that it will be the merged bound since we are using the defragmenting iter in this code.
Reviewed 6 of 10 files at r1.
Reviewable status: 7 of 10 files reviewed, 2 unresolved discussions (waiting on @erikgrinaker, @jbowens, @nicktrav, and @sumeerbhola)
external_iterator.go, line 20 at r1 (raw file):
// input files slice must be sorted in reverse chronological ordering. A key in // an a file at a lower index will shadow a key with an identical user key // contained within a file at a higher index.
btw, in the CockroachDB use case do we expect
- anything other than trivial shadowing (e.g. same point key => value).
- range key unsets or deletes
sstable/block.go, line 993 at r1 (raw file):
} type rangeKeyIter struct {
This code repetition is unfortunate.
Can we share the same code if we specify the constraint on what key kinds are permitted when constructing it?
54cde90
to
df5ff62
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the defragmenting iterator will merge the bounds. I added a test case demonstrating the merging.
Reviewable status: 7 of 10 files reviewed, 2 unresolved discussions (waiting on @erikgrinaker, @nicktrav, and @sumeerbhola)
external_iterator.go, line 20 at r1 (raw file):
Previously, sumeerbhola wrote…
btw, in the CockroachDB use case do we expect
- anything other than trivial shadowing (e.g. same point key => value).
- range key unsets or deletes
My understanding is that those are the only cases of shadowing we expect in CockroachDB ^
external_iterator.go, line 26 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Do we need this? Shouldn't Pebble just error on table formats greater than the ones it knows about, and accept the rest?
Good call, removed!
sstable/block.go, line 993 at r1 (raw file):
Previously, sumeerbhola wrote…
This code repetition is unfortunate.
Can we share the same code if we specify the constraint on what key kinds are permitted when constructing it?
Yeah, the writing was already on the wall for this reuse of blockIter
to directly implement keyspan.FragmentIterator
, since we intend to disentangle keyspan.FragmentIterator
and base.InternalIterator
. I updated this type to be used for both range deletions and range keys, and to explicitly implement keyspan.FragmentIterator
rather than relying on the blockIter
embedded type to transparently satisfy it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r2, 4 of 4 files at r3.
Reviewable status: all files reviewed (commit messages unreviewed), 3 unresolved discussions (waiting on @erikgrinaker, @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
sstable/block.go, line 995 at r3 (raw file):
// TODO(jackson): Remove remaining dependencies on the internal // value being propagated via positioning methods and refactor the // FragmentIterator to be indepenedent of base.InternalIterator.
independent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r1, 3 of 3 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @erikgrinaker, @itsbilal, @jbowens, and @sumeerbhola)
external_iterator.go, line 19 at r3 (raw file):
// Input sstables may contain point keys, range keys, range deletions, etc. The // input files slice must be sorted in reverse chronological ordering. A key in // an a file at a lower index will shadow a key with an identical user key
nit: key in a file
sstable/block.go, line 968 at r3 (raw file):
// decode the end key from a fragment's raw internal value. Range // deletions store the end key directly as the value, whereas range
nit: whereas other range key kinds ...
testdata/external_iterator, line 1 at r3 (raw file):
build 1
Great test!
df5ff62
to
2da13c2
Compare
Add a pebble.NewExternalIter function that may be used to construct a *pebble.Iterator that reads from a provided slice of sstables rather than committed database state. Input sstables are required to contain all zero-sequence number keys. Shadowing of keys is resolved by treating the files as ordered in reverse chronological order. This iterator is intended to replace the storage package's multiIterator.
2da13c2
to
dea9518
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: 7 of 11 files reviewed, 5 unresolved discussions (waiting on @erikgrinaker, @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the defragmenting iterator will merge the bounds. I added a test case demonstrating the merging.
Great stuff, thanks for getting this in so fast!
Reviewed all commit messages.
Reviewable status: 7 of 11 files reviewed, 4 unresolved discussions
external_iterator.go, line 20 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
My understanding is that those are the only cases of shadowing we expect in CockroachDB ^
I believe that's accurate, yeah. But I'm not intimately familiar with the bulk IO code.
Add a pebble.NewExternalIter function that may be used to construct a
*pebble.Iterator that reads from a provided slice of sstables rather than
committed database state. Input sstables are required to contain all
zero-sequence number keys. Shadowing of keys is resolved by treating the files
as ordered in reverse chronological order.
This iterator is intended to replace the storage package's multiIterator.
Close #1526.