db: defer loading L0 range key blocks during iterator construction #3004
Conversation
Add a predicate that evaluates to true only for (vfs.File).ReadAt operations at the provided offset. This allows datadriven tests to inject errors into specific block loads if they know the layout of the sstable.
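A minimal sketch of the idea behind this commit: a predicate that fires only for ReadAt operations at one specific offset, so a test can fail exactly one block load. The Op/Predicate names and the inject helper here are illustrative stand-ins, not Pebble's actual errorfs API.

```go
package main

import (
	"errors"
	"fmt"
)

// Op describes a single filesystem operation observed by an error-injecting
// VFS. Names are illustrative, not Pebble's actual errorfs types.
type Op struct {
	Kind   string // e.g. "read-at", "write", "open"
	Offset int64  // file offset, meaningful for read-at operations
}

// Predicate reports whether an error should be injected for op.
type Predicate func(op Op) bool

// ErrAtOffset returns a predicate that matches only ReadAt operations at
// exactly the given offset, letting a test target one specific block load
// when it knows the sstable's layout.
func ErrAtOffset(offset int64) Predicate {
	return func(op Op) bool {
		return op.Kind == "read-at" && op.Offset == offset
	}
}

// inject simulates the injector consulting the predicate before an operation.
func inject(p Predicate, op Op) error {
	if p(op) {
		return errors.New("injected error")
	}
	return nil
}

func main() {
	p := ErrAtOffset(512)
	fmt.Println(inject(p, Op{Kind: "read-at", Offset: 512})) // injected error
	fmt.Println(inject(p, Op{Kind: "read-at", Offset: 0}))   // <nil>
}
```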
Previously, construction of an iterator over range keys that found sstables containing range keys within L0 performed I/O to load the range key blocks during iterator construction. This was less efficient: if the iterator ultimately didn't need to read the keyspace overlapping the sstables containing range keys, the block loads were unnecessary.

More significantly, if the I/O failed during iterator construction, the resulting iterator was unusable. It would always error with the original error returned by the failed block load. This is a deviation from iterator error handling across the rest of the iterator stack, which allows an Iterator to be re-seeked to clear the current iterator error.

Resolves cockroachdb#2994.
Reviewed 3 of 3 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens and @RaduBerinde)
range_keys.go
line 70 at r2 (raw file):
// through the fileMetadata to determine that. Since L0's file count should
// not significantly exceed ~1000 files (see L0CompactionFileThreshold),
// this should be okay.
I wonder if we can quickly verify the cost of this by running kv0 and taking a CPU profile.
We need kv0 with block size that both overloads the LSM and doesn't run very cold wrt CPU. Maybe block size of 100 with very high client concurrency? May need some experimentation. Then we may have ~400 files in L0 with ~10 sub-levels. And the MVCCPuts are creating an iterator that needs to read range keys.
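The quoted code comment describes a linear pass over L0's file metadata; the cost under discussion is roughly this (the fileMetadata type below is a pared-down stand-in, not Pebble's actual manifest.FileMetadata):

```go
package main

import "fmt"

// fileMetadata is a minimal stand-in for Pebble's manifest.FileMetadata;
// only the field relevant to this scan is modeled.
type fileMetadata struct {
	FileNum      int
	HasRangeKeys bool
}

// l0FilesWithRangeKeys performs the O(len(files)) scan the code comment
// describes: walk every L0 file's metadata and keep those containing range
// keys. With L0 capped around ~1000 files, the linear pass should be cheap,
// but a CPU profile under load is the way to confirm that.
func l0FilesWithRangeKeys(files []fileMetadata) []int {
	var nums []int
	for _, f := range files {
		if f.HasRangeKeys {
			nums = append(nums, f.FileNum)
		}
	}
	return nums
}

func main() {
	files := []fileMetadata{{1, false}, {2, true}, {3, false}, {4, true}}
	fmt.Println(l0FilesWithRangeKeys(files)) // [2 4]
}
```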
testdata/iter_histories/errors
line 94 at r2 (raw file):
---- combined-iter
Loving this simple-to-read but sophisticated error injection test.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)
range_keys.go
line 70 at r2 (raw file):
Previously, sumeerbhola wrote…
I wonder if we can quickly verify the cost of this by running kv0 and taking a CPU profile.
We need kv0 with block size that both overloads the LSM and doesn't run very cold wrt CPU. Maybe block size of 100 with very high client concurrency? May need some experimentation. Then we may have ~400 files in L0 with ~10 sub-levels. And the MVCCPuts are creating an iterator that needs to read range keys.
It seems tricky because we also need range keys to exist within L0. And there need to exist range keys within sstables that overlap the put key (in order to defeat the lazy-combined iterator optimization). Another option is to add one keyspan.LevelIter per L0 file containing a range key (ignoring the sublevels structure entirely) by iterating through current.RangeKeyLevels. This becomes O(files containing range keys), which should be significantly less, at the cost of more levels in the iterator stack.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens and @RaduBerinde)
range_keys.go
line 70 at r2 (raw file):
This becomes O(files containing range keys) which should be significantly less, at the cost of more levels in the iterator stack.
It's no more than the levels we currently have on master, except that we'd now be wrapping each of those in a level iter, yes?
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)
range_keys.go
line 70 at r2 (raw file):
Previously, sumeerbhola wrote…
This becomes O(files containing range keys) which should be significantly less, at the cost of more levels in the iterator stack.
It's no more than the levels we currently have on master, except that we'd be now wrapping each of those in a level iter, yes?
Yeah, that's right. I gave this a try, but unfortunately it's not viable today because the keyspan.LevelIter requires that the underlying manifest.LevelIterator be sorted by user keys, even if it's bounded to contain a single file. The manifest.LevelIterator's bounding is implemented by first seeking within the b-tree and then adjusting to the bounds. If the underlying b-tree is sorted by sequence numbers, it can return incorrect results.
Some additional alternatives:
- Implement a special keyspan.FragmentIterator that takes a single fileMetadata, deferring the I/O until a relevant seek call is made.
- Adjust the L0Sublevels code to construct a parallel b-tree containing only files with range keys for each sublevel. We don't need to actually recompute the L0 sublevels as if these range-key files were the only files in L0 to get the optimal read-amp.
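The first alternative can be sketched as a single-file iterator that performs no I/O at construction and opens its file only on the first seek. The span type and openFile hook are illustrative stand-ins, not Pebble's actual keyspan.FragmentIterator interface.

```go
package main

import "fmt"

// span is a minimal stand-in for a keyspan.Span.
type span struct{ Start, End string }

// lazyFileIter wraps a single file's range key contents, deferring the open
// (the openFile hook stands in for reading the sstable's range key block)
// until a seek actually needs it — so iterator construction does no I/O.
type lazyFileIter struct {
	openFile func() ([]span, error)
	spans    []span
	opened   bool
}

// SeekGE opens the file on first use and returns the first span whose end
// key is after the sought key, or nil if none.
func (it *lazyFileIter) SeekGE(key string) (*span, error) {
	if !it.opened {
		spans, err := it.openFile()
		if err != nil {
			return nil, err // construction already succeeded; only seeks can fail
		}
		it.spans, it.opened = spans, true
	}
	for i := range it.spans {
		if it.spans[i].End > key {
			return &it.spans[i], nil
		}
	}
	return nil, nil
}

func main() {
	opens := 0
	it := &lazyFileIter{openFile: func() ([]span, error) {
		opens++
		return []span{{"a", "c"}, {"m", "p"}}, nil
	}}
	fmt.Println(opens) // 0: constructing the iterator performed no I/O
	s, _ := it.SeekGE("b")
	fmt.Println(s.Start, s.End, opens) // a c 1
}
```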
TFTR! Merging this, and filed #3007 for following up on potentially avoiding the O(# files in L0) iteration.