-
Notifications
You must be signed in to change notification settings - Fork 478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: block interval annotations for fine-grained time-bound iteration #1231
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is lacking comprehensive tests and benchmarks. Sending it out for initial opinions.
Reviewable status: 0 of 14 files reviewed, all discussions resolved (waiting on @itsbilal, @jbowens, and @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General idea looks great to me! Only left some minor comments. Are you thinking of stuffing the annotator name in the sstable properties block?
Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @jbowens, @petermattis, and @sumeerbhola)
internal/base/block_interval_annotator.go, line 18 at r1 (raw file):
// Interval annotations are of the form [lower,upper) where both lower and // upper are uint64, and represent a set such that the block can contain keys // that belong to this set, but no keys outside this set. That is, the set is
It took me a while to understand this. Maybe a phrase like If block annotations are happening, for a given annotator function / format, all keys in a block will be present in the set
would help.
internal/base/block_interval_annotator.go, line 26 at r1 (raw file):
// - These annotations are written as part of the BlockHandle in index blocks // (first level and second level), where they refer to either data blocks // (when written in a first level index block) or first level index blocks
Nit: Could also use lower/only level
and upper level
(or top
level as in the twoLevelIterator code). The first/second thing is a little unclear. Feel free to dismiss.
internal/base/block_interval_annotator.go, line 60 at r1 (raw file):
// block. Represents [Lower, Upper). type BlockInterval struct { Lower uint64
I imagine we're using uint64
s instead of the more generic []byte
s here so that packing it into the block handle is easier / reliable, correct?
The other extreme could be moving logic about merging / etc outside of Pebble completely, and only handling one []byte
here . I'm thinking about a type BlockInterval interface
that implements the below methods plus Encode
, a struct type that specifies decode / creator / annotator functions (similar to Comparer
), and a max size for how long an Interval (or maybe more generally a BlockFilter
could be). That might be cleaner but it might actually be less performant due to a greater number of UVarint encodes/decodes so feel free to dismiss this.
The reason why I thought of that was because the two uint64
s seem to pigeonhole us into handling timestamps well and little else, but maybe that kind of specialization is worthwhile here.
internal/base/block_interval_annotator.go, line 93 at r1 (raw file):
// block in an sstable. This is not precisely set intersection (see the code // below). func BlockIntervalsIntersect(iter, block BlockInterval) bool {
Why not make this func (bi *BlockInterval) Intersect(other BlockInterval) bool
to match the Union function?
internal/base/block_interval_annotator.go, line 95 at r1 (raw file):
func BlockIntervalsIntersect(iter, block BlockInterval) bool { if isEmptySet(iter) { // Bizarre iterator that is interested in nothing.
😂
internal/base/block_interval_annotator.go, line 112 at r1 (raw file):
} // INVARIANT: Neither set is empty or universal set. return !(iter.Upper <= block.Lower || iter.Lower >= block.Upper)
Nit: This seems more readable when written as return iter.Upper > block.Lower && iter.Lower < block.Upper
, but either works.
internal/base/block_interval_annotator.go, line 115 at r1 (raw file):
} func isEmptySet(x BlockInterval) bool {
Nit: can take x as receiver.
sstable/writer.go, line 104 at r1 (raw file):
compression Compression separator Separator successor Successor
Nit: indents should match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the term block handle annotations
be better than block interval annotations
? The former connotes to me that the annotation is on the block handle, while the latter connotes the type of annotation, but not where it applies.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
options.go, line 85 at r1 (raw file):
// function can be used by multiple iterators, if the iterator is cloned. TableFilter func(userProps map[string]string) bool // BlockInterval represents the set the caller is interested in, and
I think this comment could use a little bit more detail. the set
is vague. Perhaps we can point to the comment in block_interal_annotator.go
.
If we generalize this functionality more, this could become a BlockFilter func()
which takes a BlockHandle
and returns a bool. That is, we move the BlockIntervalIntersects
call into this callback.
options.go, line 462 at r1 (raw file):
// TODO: Also plumb this into the ingested sstable construction code in // CockroachDB. BlockIntervalAnnotatorFunc BlockIntervalAnnotatorFunc
Nit: The Func
suffix naming is a bit inconsistent with some of the other options. This seems to have been introduced with KeyValidationFunc
, but contrast that to DebugCheck
and the function pointers in Comparer
.
internal/base/block_interval_annotator.go, line 17 at r1 (raw file):
// // Interval annotations are of the form [lower,upper) where both lower and // upper are uint64, and represent a set such that the block can contain keys
represent a set of 0-based block indexes
?
internal/base/block_interval_annotator.go, line 60 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I imagine we're using
uint64
s instead of the more generic[]byte
s here so that packing it into the block handle is easier / reliable, correct?The other extreme could be moving logic about merging / etc outside of Pebble completely, and only handling one
[]byte
here . I'm thinking about atype BlockInterval interface
that implements the below methods plusEncode
, a struct type that specifies decode / creator / annotator functions (similar toComparer
), and a max size for how long an Interval (or maybe more generally aBlockFilter
could be). That might be cleaner but it might actually be less performant due to a greater number of UVarint encodes/decodes so feel free to dismiss this.The reason why I thought of that was because the two
uint64
s seem to pigeonhole us into handling timestamps well and little else, but maybe that kind of specialization is worthwhile here.
Ditto on this comment. I made a similar comment elsewhere about further abstracting from a concrete BlockInterval
.
Related, are we at all worried that Lower
and Upper
contain only the wall time and not a full HLC timestamp?
sstable/filter.go, line 39 at r1 (raw file):
type BlockHandle struct { Offset, Length uint64 BlockInterval base.BlockInterval
I wonder if we should make this a more general facility for BlockHandle
annotations. Some of the infrastructure being added here is already generalized, yet then tied concretely to BlockInterval
. I'm imagining something where the BlockHandle
annotation is encoded in []byte
and there is an options callback to decode and process these annotations. I haven't thought through this in detail, but it seems feasible to do this.
Block interval annotations are an optional user-facing feature that can be used to filter data blocks from an Iterator before they are loaded. Interval annotations are of the form [lower,upper) where both lower and upper are uint64, and represent a set such that the block can contain keys that belong to this set, but no keys outside this set. That is, the set is not necessarily tight. The interval [0,x) for any x is reserved and represents the universal set. A [0,x) annotation is not written out to the sstable, and is useful in two cases: - These annotations are written as part of the BlockHandle in index blocks (first level and second level), where they refer to either data blocks (when written in a first level index block) or first level index blocks (when written in a second level index block). BlockHandles for other kinds of blocks default to the [0,0) annotation and avoid writing anything for the annotation. - Tables written prior to this feature default to blocks having a [0,0) annotation, which is correctly interpreted as the universal set since we do not have any useful filtering information. This is needed in order to turn on this feature without rewriting any data. The implementation requires [lower,upper) to satisfy upper >= lower. And when lower > 0, these are encoded as either: - lower, upper-lower when upper-lower > 0 - lower The implementation does not require any particular lower value for representation of the empty set [lower,lower), but from an encoding perspective [1,1) is the most efficient. Bi-directional compatibility is not a goal, and files written with block interval annotations cannot be read by old code that does not know about the existence of this feature. The main use case for block interval annotations is as a fine-grained version of CockroachDB's time-bound iterator, that is used to ignore blocks that do not contain MVCC timestamps that are relevant. See Fixes cockroachdb#1190
8115164
to
1c69944
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs! I've responded to the comments here. I'm going to open a new PR that addresses all these comments.
Are you thinking of stuffing the annotator name in the sstable properties block?
Good point. I will be renaming this to BlockPropertyCollector
and requiring that the block property name be unique across all user properties, and then put the aggregated value across all blocks into the user properties. Then BlockPropertyCollector
effectively subsumes table properties. And we can get rid of all the special casing regarding what is interpreted as an empty or universal set, since the presence of the block property name in the sstable properties happens if and only if the sstable was written with that property collector. If it isn't written with that property collector, a BlockPropertyFilter
specified for an iterator is a noop.
Would the term block handle annotations be better than block interval annotations? The former connotes to me that the annotation is on the block handle, while the latter connotes the type of annotation, but not where it applies.
I have a mild preference that the name specify what it is describing, which is a block interval, and not where it is stored, since that can happen at various levels, both index levels, and the sstable (after aggregation). So it is really describing the finest granularity at which this interval applies.
Based on other comments, these will now be called BlockPropertyCollector
and BlockPropertyFilter
. The interval case is a special case, where the set is described using a [uint64, uint64) interval, and we will provide a helper class in Pebble that does most of the heavy lifting for that case.
Reviewable status: 9 of 14 files reviewed, 7 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
options.go, line 85 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I think this comment could use a little bit more detail.
the set
is vague. Perhaps we can point to the comment inblock_interal_annotator.go
.If we generalize this functionality more, this could become a
BlockFilter func()
which takes aBlockHandle
and returns a bool. That is, we move theBlockIntervalIntersects
call into this callback.
Good point about generalizing. This will be generalized to BlockPropertyFilters []BlockPropertyFilter
, where all filters need to match. There can be multiple properties on a block, if there were multiple BlockPropertyCollectors.
options.go, line 462 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: The
Func
suffix naming is a bit inconsistent with some of the other options. This seems to have been introduced withKeyValidationFunc
, but contrast that toDebugCheck
and the function pointers inComparer
.
This func will go away and be replaced by BlockPropertyCollectors []func() BlockPropertyCollector
.
internal/base/block_interval_annotator.go, line 17 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
represent a set of 0-based block indexes
?
I didn't quite understand the questions. Maybe it is to do with the fact that the comment below talks about [0,x)
being a special case, which was all rather complicated and unnecessary. That complexity goes away in the new code.
internal/base/block_interval_annotator.go, line 18 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
It took me a while to understand this. Maybe a phrase like
If block annotations are happening, for a given annotator function / format, all keys in a block will be present in the set
would help.
Done (in the new code).
internal/base/block_interval_annotator.go, line 26 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Nit: Could also use
lower/only level
andupper level
(ortop
level as in the twoLevelIterator code). The first/second thing is a little unclear. Feel free to dismiss.
Done
internal/base/block_interval_annotator.go, line 60 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Ditto on this comment. I made a similar comment elsewhere about further abstracting from a concrete
BlockInterval
.Related, are we at all worried that
Lower
andUpper
contain only the wall time and not a full HLC timestamp?
Generalization is a good point. It goes together with not limiting this to a single block property. I've gone ahead and done that, since I believe the cost can be contained -- we just need to encode and decode an extra varint for the short-id of the block property. I've added significant helper implementations that only outsource the core logic for property extraction, so that we can test this thoroughly, and keep the encoding/decoding logic for the integer set case inside these Pebble provided helpers.
internal/base/block_interval_annotator.go, line 93 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Why not make this
func (bi *BlockInterval) Intersect(other BlockInterval) bool
to match the Union function?
This was a peculiar implementation because of the special cases, which go away in the new code, and this becomes a method on the interval
struct.
internal/base/block_interval_annotator.go, line 112 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Nit: This seems more readable when written as
return iter.Upper > block.Lower && iter.Lower < block.Upper
, but either works.
Done
internal/base/block_interval_annotator.go, line 115 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Nit: can take x as receiver.
Will be removed
sstable/filter.go, line 39 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I wonder if we should make this a more general facility for
BlockHandle
annotations. Some of the infrastructure being added here is already generalized, yet then tied concretely toBlockInterval
. I'm imagining something where theBlockHandle
annotation is encoded in[]byte
and there is an options callback to decode and process these annotations. I haven't thought through this in detail, but it seems feasible to do this.
Done
sstable/writer.go, line 104 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Nit: indents should match
Done. I was relying on goland, which didn't seem to mind.
#1315 is the replacement |
Block interval annotations are an optional user-facing feature that can be
used to filter data blocks from an Iterator before they are loaded.
Interval annotations are of the form [lower,upper) where both lower and
upper are uint64, and represent a set such that the block can contain keys
that belong to this set, but no keys outside this set. That is, the set is
not necessarily tight.
The interval [0,x) for any x is reserved and represents the universal set.
A [0,x) annotation is not written out to the sstable, and is useful in
two cases:
(first level and second level), where they refer to either data blocks
(when written in a first level index block) or first level index blocks
(when written in a second level index block). BlockHandles for other
kinds of blocks default to the [0,0) annotation and avoid writing
anything for the annotation.
annotation, which is correctly interpreted as the universal set
since we do not have any useful filtering information. This is needed
in order to turn on this feature without rewriting any data.
The implementation requires [lower,upper) to satisfy upper >= lower. And
when lower > 0, these are encoded as either:
The implementation does not require any particular lower value for
representation of the empty set [lower,lower), but from an encoding
perspective [1,1) is the most efficient.
Bi-directional compatibility is not a goal, and files written with block
interval annotations cannot be read by old code that does not know
about the existence of this feature.
The main use case for block interval annotations is as a fine-grained
version of CockroachDB's time-bound iterator, that is used to ignore
blocks that do not contain MVCC timestamps that are relevant. See
Fixes #1190