db: refactor compaction splitting to reduce key comparisons #2259
Conversation
Force-pushed from 7c15acc to 6ac10ed
Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion
compaction_iter.go
line 781 at r1 (raw file):
```go
i.key.Trailer = i.iterKey.Trailer
i.keyTrailer = i.iterKey.Trailer
i.frontiers.advance(i.key.UserKey)
```
I think ideally these frontiers would actually be incorporated into the merging iterator heap, to further reduce key comparisons and to detect that the frontier was reached before the compaction iterator logic executes. I'm not sure how to integrate it cleanly though, so I opted for this initial baby step which is good enough for avoiding introducing extra per-key comparisons in implementing the boundary alignment heuristic (#2156).
Force-pushed from 269d555 to b881c50
(drive-by)
Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @jbowens and @RaduBerinde)
compaction.go
line 191 at r2 (raw file):
```go
// split point.
if startKey := lf.c.rangeDelFrag.Start(); startKey != nil {
	lf.limit = lf.limitFunc(startKey)
```
happy to see this code move into `DB.compact1`.
compaction_iter.go
line 914 at r2 (raw file):
```go
// surpassed, the frontier's reached method is invoked with the key that reached
// the frontier. During the execution of reached, a frontier implementation may
// update the value of its `key`. If the `key` method returns nil, the frontier
```
clarification on "may update": it's not going to update the parameter passed in `reached(key []byte)`, and this update just means that future calls to `key()` may return a different key than before, yes?

Also, is there a constraint that the reached-key passed in `reached` is >= the last returned frontier-key?
compaction_iter.go
line 931 at r2 (raw file):
```go
// compaction is about to surpass a key may add a frontier, with a `reached`
// function that will be invoked when the key is about to be reached or
// surpassed.
```
I guess the "about to be reached" clarifies my previous question. But I think it is worth being very explicit in the specification comment.
Force-pushed from d5c15fc to 2d28811
Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)
compaction.go
line 191 at r2 (raw file):
Previously, sumeerbhola wrote…
happy to see this code move into `DB.compact1`.
🙌
compaction_iter.go
line 914 at r2 (raw file):
Previously, sumeerbhola wrote…
clarification on "may update": it's not going to update the parameter passed in `reached(key []byte)`, and this update just means that future calls to `key()` may return a different key than before, yes? Also, is there a constraint that the reached-key passed in `reached` is >= the last returned frontier-key?
I made a few comment updates to make this all a little more explicit.
compaction_iter.go
line 931 at r2 (raw file):
Previously, sumeerbhola wrote…
I guess the "about to be reached" clarifies my previous question. But I think it is worth being very explicit in the specification comment.
I think this is done now, but lemme know if it's still unclear.
Force-pushed from c9a4ccf to c9cf2eb
Reviewable status: 0 of 5 files reviewed, 9 unresolved discussions (waiting on @bananabrick, @jbowens, and @sumeerbhola)
compaction_iter.go
line 926 at r4 (raw file):
```go
// value.
//
// A frontier's `key` must be stable between calls to `reached`. If a frontier
```
Requiring this of the implementor is fragile. And we must warn against using `advance()` (or really any other part of the API) between the `key()` changing and `update()` being called, because the heap is internally broken at that time. In general, I'd avoid having a data structure where we expect the internal invariants to be broken at times.

A more robust way would be to pass the first key of interest in `push()` and `update()` and have `reached()` return the next key of interest. Or change the API so we always remove the frontier when the key is reached and the caller has to `push()` it again. I doubt that a Fix instead of a Pop and a Push is that much faster. The API can become as simple as `push(key []byte, reached func())`.
compaction_iter.go
line 1001 at r4 (raw file):
```go
// adds the frontier if not already contained with the heap, and fixes up its
// position if it already is.
func (f *frontiers) update(ff frontier) {
```
[nit] It would be nice to differentiate between functions that are part of the "interface" and internal-only functions. The former can be uppercased even if the type is not exported. We can also consider moving this to a separate package.
compaction_iter.go
line 1003 at r4 (raw file):
```go
func (f *frontiers) update(ff frontier) {
	hasKey := ff.key() != nil
	for i := 0; i < len(f.items); i++ {
```
If necessary, we can make this logarithmic by maintaining a `map[frontier]int` (map from frontier to index, updated in Swap). (Maybe leave it as a TODO.)

If linear time is acceptable for `update` and `push`, we might as well maintain a sorted array instead of a heap. Which could actually reduce the number of key comparisons (we don't need any to "pop", we only binary search when we insert).
compaction_iter.go
line 1022 at r4 (raw file):
```go
// push adds the provided frontier to the set of frontiers. If the provided
// frontier is already in the heap, it will be added again and will receive
```
If we allow a frontier to be added multiple times, we have to explain what happens when we call `update` after that.
compaction_iter.go
line 1043 at r4 (raw file):
```go
}

// fix, up and down are copied from the go stdlib.
```
Why aren't we using Go's stdlib? If the performance difference of calling `Less` and `Swap` through an interface is important, we could get that back by storing the key and avoiding the two interface calls to `.key()` on every comparison.
Reviewable status: 0 of 5 files reviewed, 8 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)
compaction_iter.go
line 926 at r4 (raw file):
Previously, RaduBerinde wrote…
Requiring this of the implementor is fragile. And we must warn against using `advance()` (or really any other part of the API) between the `key()` changing and `update()` being called, because the heap is internally broken at that time. In general, I'd avoid having a data structure where we expect the internal invariants to be broken at times. A more robust way would be to pass the first key of interest in `push()` and `update()` and have `reached()` return the next key of interest. Or change the API so we always remove the frontier when the key is reached and the caller has to `push()` it again. I doubt that a Fix instead of a Pop and a Push is that much faster. The API can become as simple as `push(key []byte, reached func())`.
Heh, I actually started with exactly the `push(key []byte, reached func())` interface, but I needed the ability to update an existing frontier's key (eg, the `update` operation). With just `push(key []byte, reached func())` there's no way to identify which element you want to update during `update` (there's no guarantee that `key()`s are unique). I've updated the implementation to use the pointer of the `frontier` struct for this purpose.
compaction_iter.go
line 1001 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] It would be nice to differentiate between functions that are part of the "interface" and internal-only functions. The former can be uppercased even if the type is not exported. We can also consider moving this to a separate package
I don't think there's enough here to warrant a separate package, but I've uppercased the entry points to make that clear.
compaction_iter.go
line 1003 at r4 (raw file):
Previously, RaduBerinde wrote…
If necessary, we can make this logarithmic by maintaining a `map[frontier]int` (map from frontier to index, updated in Swap). (Maybe leave it as a TODO.) If linear time is acceptable for `update` and `push`, we might as well maintain a sorted array instead of a heap. Which could actually reduce the number of key comparisons (we don't need any to "pop", we only binary search when we insert).
I suspect using a map would be a regression, given the small n. The linear part of this runtime only performs pointer equality checks, which are much cheaper than key comparisons or memory swaps.
compaction_iter.go
line 1043 at r4 (raw file):
Previously, RaduBerinde wrote…
Why aren't we using Go's stdlib? If the performance difference of calling `Less` and `Swap` through an interface is important, we could get that back by storing the key and avoiding the two interface calls to `.key()` on every comparison.
Agreed, we should remove the interface indirection of the frontier. Made that change.
We've seen the interface indirection have a significant enough impact on perf, that we maintain copied concrete heap types for the merging iterator, level checker and keyspan merging iterator.
Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @bananabrick, @jbowens, and @sumeerbhola)
compaction_iter.go
line 926 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Heh, I actually started with exactly the `push(key []byte, reached func())` interface, but I needed the ability to update an existing frontier's key (eg, the `update` operation). With just `push(key []byte, reached func())` there's no way to identify which element you want to update during `update` (there's no guarantee that `key()`s are unique). I've updated the implementation to use the pointer of the `frontier` struct for this purpose.
This is much cleaner, thanks!
compaction_iter.go
line 927 at r5 (raw file):
```go
// invocation, through its Update method.
type frontier struct {
	parent *frontiers
```
[nit] `parent` is a bit confusing (my first thought was parent inside the heap), maybe `container` or just `frontiers`
compaction_iter.go
line 940 at r5 (raw file):
```go
// After reached is invoked, the frontier's key is updated to the return
// value of `reached`. Note bene, the frontier is permitted to update its
// key to a user key ≤ the argument `key`:
```
[nit] remove `:` or reformat the next paragraph as a subblock
compaction_iter.go
line 942 at r5 (raw file):
```go
// key to a user key ≤ the argument `key`:
//
// If a frontier is set to key k1, and reached(k2) is invoked (k2≥k1),
```
[nit] space around >= for consistency
Force-pushed from 7e66c39 to a008564
Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)
compaction_iter.go
line 927 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] `parent` is a bit confusing (my first thought was parent inside the heap), maybe `container` or just `frontiers`
went with `container`
compaction_iter.go
line 940 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] remove `:` or reformat the next paragraph as a subblock
Done.
compaction_iter.go
line 942 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] space around >= for consistency
Done.
Reviewed 1 of 5 files at r1, 2 of 4 files at r5, 1 of 1 files at r6, 1 of 1 files at r7.
Reviewable status: all files reviewed (commit messages unreviewed), 11 unresolved discussions (waiting on @bananabrick, @jbowens, and @RaduBerinde)
compaction.go
line 176 at r7 (raw file):
```go
// point must be ahead of the previous flush split point.
limit := lf.limitFunc(key)
lf.frontier.Update(limit)
```
why is this calling `Update` -- doesn't that duplicate the work of fixing up the heap that `Advance` will do?
compaction.go
line 179 at r7 (raw file):
```go
	return limit
}
lf.frontier.Update(nil)
```
same question
compaction.go
line 223 at r7 (raw file):
```go
// limit key.) If a wrapped splitter advises a split, it must continue
// to advise a split until a new output.
type userKeyChangeSplitter struct {
```
why are `userKeyChangeSplitter` and `splitterGroup` not using `frontiers`? Is that for the next PR?
compaction_iter.go
line 1034 at r7 (raw file):
```go
// This frontier has been reached. Invoke the closure and update with
// the next frontier.
f.items[0].key = f.items[0].reached(k)
```
I am confused about what prevents this from looping forever if the situation mentioned in the earlier comment happens:
// If a frontier is set to key k1, and reached(k2) is invoked (k2 ≥ k1),
// reached is permitted to return `k1`. However, it will receive reached(k2)
// calls until it returns nil or a key `k3` such that k2 < k3. This property
// is useful for frontiers that use `reached` invocations to drive iteration
// through InternalKeys which may contain the same user key multiple times.
I thought that the goal was for Advance
to return and be called again when the compaction steps to the next InternalKey
.
testdata/frontiers
line 4 at r7 (raw file):
b e j a p n z
why this empty line?
compaction_iter_test.go
line 282 at r7 (raw file):
```go
var keySets [][][]byte
datadriven.RunTest(t, "testdata/frontiers", func(t *testing.T, td *datadriven.TestData) string {
	switch td.Cmd {
```
Can you add a comment about what "init" is specifying, so that a casual reader does not need to read `initTestFrontier`.
Reviewable status: 3 of 5 files reviewed, 11 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)
compaction.go
line 176 at r7 (raw file):
Previously, sumeerbhola wrote…
why is this calling `Update` -- doesn't that duplicate the work of fixing up the heap that `Advance` will do?
The max-grandparent-overlap splitter needs to update its frontier key when a new output is reached, even if it didn't request splitting to a new output. This is outside the context of a `reached` call. `Advance` will only need to fix up this frontier's position if this splitter individually did request the split, in which case we'll need to pop the heap root during `reached` and then re-insert using `Update` during `onNewOutput`.
I suppose we could reduce the amount of work when this splitter requests a split by optimistically calculating the next overlap key in `reached`, assuming our split request will be immediately respected. The `userKeyChangeSplitter` complicates all this because a `splitNow` request may be ignored if not at a new user key. I think we should take a pass through cleaning up the `compactionOutputSplitter` interface once we're able to remove support for split user keys altogether.
compaction.go
line 223 at r7 (raw file):
Previously, sumeerbhola wrote…
why are `userKeyChangeSplitter` and `splitterGroup` not using `frontiers`? Is that for the next PR?
The user key change splitter would need to materialize an immediate successor key to use as a frontier. This would need to be done on every key. Currently, we only perform a key comparison when the wrapped splitter requests a split. Also, separately, the user key change splitter is obsolete, but we can't remove it until we drop support for the format major versions that do allow split user keys. I'd like to do that and remove it, but we also need documentation so that non-Cockroach users understand what they need to do to migrate to master.
The splitterGroup doesn't perform any key comparisons per-compaction-key itself. The only time it needs to do any is in onNewOutput. (An aside: I think this interface can be simplified, especially once we remove userKeyChangeSplitter. It's wonky that `onNewOutput` returns the `splitterSuggestion` for the new file.)
compaction_iter.go
line 1034 at r7 (raw file):
Previously, sumeerbhola wrote…
I am confused about what prevents this from looping forever if the situation mentioned in the earlier comment happens:

```go
// If a frontier is set to key k1, and reached(k2) is invoked (k2 ≥ k1),
// reached is permitted to return `k1`. However, it will receive reached(k2)
// calls until it returns nil or a key `k3` such that k2 < k3. This property
// is useful for frontiers that use `reached` invocations to drive iteration
// through InternalKeys which may contain the same user key multiple times.
```

I thought that the goal was for `Advance` to return and be called again when the compaction steps to the next `InternalKey`.
The code setting the frontier may not be a pure function of the input key. In particular, this is intended for the grandparent boundary heuristic. As long as we still need to support split user keys, there may be multiple grandparents with smallest keys with the same user key. The `reached` invocations will drive iteration through the grandparents. While they may return the same user key multiple times, it's limited to the number of times the user key is a smallest key of a grandparent file.
testdata/frontiers
line 4 at r7 (raw file):
Previously, sumeerbhola wrote…
why this empty line?
It configures an empty frontier (ensuring an `Init(_, nil, _)` call does not add the frontier to the heap). Added a comment up above.
compaction_iter_test.go
line 282 at r7 (raw file):
Previously, sumeerbhola wrote…
Can you add a comment about what "init" is specifying, so that a casual reader does not need to read
initTestFrontier
.
Done.
Reviewed 2 of 2 files at r8, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @bananabrick, @jbowens, and @RaduBerinde)
compaction.go
line 176 at r7 (raw file):
I think we should take a pass through cleaning up the compactionOutputSplitter interface once we're able to remove support for split user keys altogether.
Why would we remove `userKeyChangeSplitter`? Won't we always need it to convert file size splitting into splitting at user key boundaries?

I missed some subtleties in what is happening here, in that `frontiers.Advance` happens inside the `compactionIter` when it steps, and before we ask the `compactionOutputSplitter` about splitting. Of course it can't be after, for correctness reasons.
- Will there ever be a `frontier.reached` that does not return nil? If not, why complicate the interface?
splitter requests a split by optimistically calculating the next overlap key in reached assuming our split request will be immediately respected
I now see the difficulty here.
This is a rough thought:
- We have 2 kinds of splitters: based on (a) key frontier, (b) some other characteristic
- (a) only wants to be queried about `shouldSplitBefore` when the key frontier is reached. (b) wants to be queried for each key.
- both (a) and (b) want to be told about a split via `onNewOutput`.
Instead of rolling all of these needs into `compactionOutputSplitter`, you have split the frontier part into a separate interface where the decision for (a) in `shouldSplitBefore` is moved into `reached`, and the decision is just returned in `shouldSplitBefore`. This is quite clean -- I can't think of any obvious improvement.

A code comment along those lines would help a reader understand the why of this dance between the `compactionOutputSplitter` and `frontier`.
compaction.go
line 223 at r7 (raw file):
The user key change splitter would need to materialize an immediate successor key to use as a frontier. This would need to be done on every key. Currently, we only perform a key comparison when the wrapped splitter requests a split.
This could use a code comment too. If we did this, I suppose it would function something like:
- nil frontier for user key change splitter for most of the keys
- occasional shouldSplitBefore's will call frontier.Update with the immediate successor of the current user key, which will likely be reached very soon
- The number of comparisons we will save is small since this key will be reached soon (and likely none since it will be the top of the heap).
compaction_iter.go
line 1034 at r7 (raw file):
As long as we still need to support split user keys, there may be multiple grandparents with smallest keys with the same user key. The reached invocations will drive iteration through the grandparents. While the may return the same user key multiple times, it’s limited to the number of times the user key is a smallest key of a grandparent file.
Thanks for the explanation. Can you add a code comment stating why this feature is being supported, so we know when we could in theory remove it.
I suppose the new grandparent output splitter will only set a frontier key via Update once a certain output file size is reached, and then needs to keep changing it every time we move past the current grandparent file. I see now why it is desirable to return a key in `reached`, though it still seems like a shorthand to call `Update` within the `reached` implementation, except for the part that we don't need to iterate over the elements of the heap to find the relevant entry.

Since we are writing our own heap logic, we could add a `frontier.heapIndex` and eliminate this search. I don't have a strong opinion, so your choice.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)
compaction.go
line 176 at r7 (raw file):
Why would we remove userKeyChangeSplitter? Won't we always need it to convert file size splitting into splitting at user key boundaries?
Sorry, my brain broke. You're right, we can't remove `userKeyChangeSplitter`.
Will there ever be a `frontier.reached` that does not return nil? If not, why complicate the interface?
Yeah, the file size splitter for implementing #2156 will know about grandparents, and iteration through the grandparents will be driven by the `reached` calls. Every `reached` call will return the next grandparent's smallest key. But the same file size splitter may need to split at a key before the next `reached` call if the current output is getting too big and no grandparent boundaries have been encountered.
I think we should revisit the `compactionOutputSplitter` in that PR, when we can see all the new usages laid out in front of us and see what we can tease out.
compaction.go
line 223 at r7 (raw file):
Previously, sumeerbhola wrote…
The user key change splitter would need to materialize an immediate successor key to use as a frontier. This would need to be done on every key. Currently, we only perform a key comparison when the wrapped splitter requests a split.
This could use a code comment too. If we did this, I suppose it would function something like:
- nil frontier for user key change splitter for most of the keys
- occasional shouldSplitBefore's will call frontier.Update with the immediate successor of the current user key, which will likely be reached very soon
- The number of comparisons we will save is small since this key will be reached soon (and likely none since it will be the top of the heap).
added a comment
compaction_iter.go
line 1034 at r7 (raw file):
Previously, sumeerbhola wrote…
As long as we still need to support split user keys, there may be multiple grandparents with smallest keys with the same user key. The reached invocations will drive iteration through the grandparents. While they may return the same user key multiple times, it's limited to the number of times the user key is a smallest key of a grandparent file.
Thanks for the explanation. Can you add a code comment stating why this feature is being supported, so we know when we could in theory remove it.
I suppose the new grandparent output splitter will only set a frontier key via Update once a certain output file size is reached, and then needs to keep changing it every time we move past the current grandparent file. I see now why it is desirable to return a key in `reached`, though it still seems like a shorthand to call `Update` within the `reached` implementation, except for the part that we don't need to iterate over the elements of the heap to find the relevant entry. Since we are writing our own heap logic, we could add a `frontier.heapIndex` and eliminate this search. I don't have a strong opinion, so your choice.
I adjusted the code comment on the `reached` field to try to make this a little clearer. I removed the language about "reached is permitted to return `k1`" since it seemed to confuse more than anything. I think the interface of receiving reached(k2) calls until reached returns a key k3 such that k2 < k3 is clean, and will still be necessary even when we have a guarantee that there are no split user keys (because there may be multiple grandparents' smallest keys < k3).
Reviewed 3 of 3 files at r9, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @bananabrick, @jbowens, and @RaduBerinde)
compaction.go
line 246 at r9 (raw file):
```go
//
// We could implement this splitter using frontiers: When the inner splitter
// requests a split beofre key `k`, we'd update a frontier to be
```
nit: before
TFTRs!
Reviewable status: 4 of 5 files reviewed, 12 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)
Introduce a new type `frontiers`, designed to monitor several different user key frontiers during a compaction. When a user key is encountered that equals or exceeds the configured frontier, the code that specified the frontier is notified and given an opportunity to set a new frontier. Internally, `frontiers` uses a heap (code largely copied from the merging iterator's heap) to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`. The `limitFuncSplitter` type is used to split flushes to L0 flush split keys, and to split both flushes and compactions to avoid excessive overlap with grandparent files.

This change is motivated by #2156, which will introduce an additional compaction-output splitter that must perform key comparisons against the next key to decide when to split a compaction. Additionally, the `frontiers` type may also be useful for other uses, such as applying key-space-dependent logic during a compaction (eg, compaction-time GC, disaggregated storage locality policies, or keyspan-restricted snapshots #1810).