-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: Use L0SubLevels to pick base/intra-L0 compactions #670
*: Use L0SubLevels to pick base/intra-L0 compactions #670
Conversation
@sumeerbhola This is the simpler integration you mentioned in #642. This plus flush splits plus updated preIngestDelay in Cockroach is enough to achieve the same sub-90 minute TPCC import that you observed. Save for more experimentation, I'm convinced that this is the solution we should go for. I've also left the old compaction picking code for now, but it's dead code. I recall @petermattis talking about preferring a knob to just be able to switch back to the old L0 compaction picking logic, so for now I've left it in. Basically, any code rendered dead by a Finally, testing this seems a bit wonky - I see there's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This plus flush splits plus updated preIngestDelay in Cockroach is enough to achieve the same sub-90 minute TPCC import that you observed. Save for more experimentation, I'm convinced that this is the solution we should go for.
I'm quite surprised that compactions involving lower levels did not cause L0 and Lbase to build up. One of the reasons for those heuristics was to use more compaction bandwidth on L0.
Do you have a MANIFEST from one of your runs? And do you know what was the peak-sublevel count? The import experiment has a very high setting for back-pressuring so a difference in (say) a sub-level count of 20 and 50 may not amount to anything for import speed but in the real world would be important.
Reviewable status: 0 of 5 files reviewed, all discussions resolved (waiting on @jbowens and @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is highlighting the tight integration of the existing compaction code. The sprinkling of new conditionals in the code has me worried about the long term maintainability. I'd like to use this as a forcing function for introducing new abstractions around compactions. This can be done in a follow-on PR (though we should file an issue about it).
Finally, testing this seems a bit wonky - I see there's TestPickCompaction but it doesn't test pickAuto at all, it seems. I'd like to update that to use the main compaction picker as opposed to the mock one it uses right now. That's a TODO for me.
TestPickCompaction
is a holdover from the go-leveldb days. I have no attachment to it and could see it being replaced with something better.
Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 268 at r1 (raw file):
// whether the compaction was automatically scheduled or user initiated. func (c *compaction) setupInputs() { // Expand the initial inputs to a clean cut. Does nothing for L0.
It isn't obvious to me why it does nothing for L0. Are fields of compaction
not populated for L0 compactions at this point?
compaction.go, line 280 at r1 (raw file):
// Grow the sstables in c.startLevel as long as it doesn't affect the number // of sstables included from c.outputLevel. if c.lcf != nil && c.startLevel == 0 && c.outputLevel != 0 {
There is a messiness to sprinkling these conditionals in the code. While this seems manageable right now, I'm worried that it is going to lead us to a bad place. I suspect we're missing some abstraction here. Perhaps compaction
should become an interface, and then intra-L0, L0->Lbase and Ln could be different implementations of that interface which utilize a common set of helper routines.
compaction.go, line 1030 at r1 (raw file):
d.mu.versions.obsoleteTables = append(d.mu.versions.obsoleteTables, pendingOutputs...) } }
We're no longer removing the in-progress compaction (flush) on error. I imagine this would cause metamorphic test failures when error injection is present.
compaction.go, line 1204 at r1 (raw file):
info.Output.Tables = append(info.Output.Tables, e.Meta.TableInfo()) } }
We're no longer removing the in-progress compaction on error. I imagine this would cause metamorphic test failures when error injection is present.
compaction_picker.go, line 453 at r1 (raw file):
} if info.level == 0 && p.vers.L0SubLevels != nil {
This is a hefty amount of new code to add to this function. It would be cleaner to put it in a separate method:
if info.level == 0 {
if c = p.pickL0(); c != nil {
return c
}
}
compaction_picker.go, line 461 at r1 (raw file):
p.opts.L0CompactionThreshold, p.vers.Files[p.baseLevel]) if err != nil { p.opts.Logger.Infof("error when picking base compaction: %s", err.Error())
Nit: s/err.Error()/err/g
compaction_picker.go, line 463 at r1 (raw file):
p.opts.Logger.Infof("error when picking base compaction: %s", err.Error()) continue } else if lcf != nil {
Nit: no need for the else
because the if-branch always does a continue
.
compaction_picker.go, line 499 at r1 (raw file):
lcf, err = p.vers.L0SubLevels.PickIntraL0Compaction(env.earliestUnflushedSeqNum, p.opts.L0CompactionThreshold) if err != nil { p.opts.Logger.Infof("error when picking base compaction: %s", err.Error())
Nit: s/err.Error()/err/g
compaction_picker.go, line 501 at r1 (raw file):
p.opts.Logger.Infof("error when picking base compaction: %s", err.Error()) continue } else if lcf != nil {
Nit: no need for the else
because the if-branch always does a continue
.
options.go, line 284 at r1 (raw file):
// EventListener provides hooks to listening to significant DB events such as // flushes, compactions, and table deletion. EventListener EventListener
Per in-person discussion, let's add:
// Experimental contains experimental options which are off by default. These options are
// temporary and will eventually either be deleted, moved out of the experimental group,
// or made the non-adjustable default. These options may change at any time, so do not rely
// on them.
Experimental struct {
L0SublevelsCompactions bool
}
da45202
to
af82ff6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left TestPickCompaction
in place since it seems to test pickAutoHelper
pretty well. Instead I added a TestPickL0Compaction
which uses the actual compactionPickerByScore
and stresses some of these cases (the others are stressed in L0SubLevels' datadriven cases).
Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 268 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
It isn't obvious to me why it does nothing for L0. Are fields of
compaction
not populated for L0 compactions at this point?
I'll delete the comment since it seems like it's confusing in its current state. It's referring to expandInputs, not all of setupInputs.
compaction.go, line 280 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
There is a messiness to sprinkling these conditionals in the code. While this seems manageable right now, I'm worried that it is going to lead us to a bad place. I suspect we're missing some abstraction here. Perhaps
compaction
should become an interface, and then intra-L0, L0->Lbase and Ln could be different implementations of that interface which utilize a common set of helper routines.
I like the interface idea. There might be some difficulties in untangling all common and specialized code, but it should be doable.
compaction.go, line 1030 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
We're no longer removing the in-progress compaction (flush) on error. I imagine this would cause metamorphic test failures when error injection is present.
Good catch, meant to address this but missed it. Done. Added it as an "else"
compaction.go, line 1204 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
We're no longer removing the in-progress compaction on error. I imagine this would cause metamorphic test failures when error injection is present.
Done.
options.go, line 284 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Per in-person discussion, let's add:
// Experimental contains experimental options which are off by default. These options are // temporary and will eventually either be deleted, moved out of the experimental group, // or made the non-adjustable default. These options may change at any time, so do not rely // on them. Experimental struct { L0SublevelsCompactions bool }
Done.
af82ff6
to
c3c8959
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that we capitalize sublevels
as SubLevels
in some places, but I really think it should be Sublevels
as you've done for the new option. Not for this PR, but it would be good to make that consistent in a follow-up.
This looks good to go modulo a few nits and my complaint about the new test not being datadriven.
Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go, line 280 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I like the interface idea. There might be some difficulties in untangling all common and specialized code, but it should be doable.
I'm not sure if compaction
should become an interface, or if we should keep compaction
as a struct but removing all methods and fields that are related to compaction picking and create new structs that are specialized for compaction picking.
compaction.go, line 1004 at r2 (raw file):
Err: err, } d.removeInProgressCompaction(c)
Perhaps this should be moved inside runCompaction
as part of one of the defer func()
paths.
compaction_test.go, line 502 at r2 (raw file):
} func TestPickL0Compaction(t *testing.T) {
This test really wants to be datadriven. The test cases are super hard to read. I'm guessing you copied the structure of TestPickCompaction
, but note that test was inherited from go-leveldb
. Both tests should really be datadriven.
options.go, line 310 at r2 (raw file):
// The number of L0 files necessary to trigger an L0 compaction. Or, if // L0SublevelCompactions = true, the number of overlapping L0 files // necessary to trigger an L0 compaction.
Is "number of overlapping L0 files" equivalent to "number of L0 sublevels"? Or is it equivalent to "L0 read amplification"?
Rather than changing these comments now, I'd move the comment additions to next to the L0SublevelCompactions
field. Something like:
// L0SublevelCompactions enables the use of L0 sublevel-based compaction
// picking logic. Defaults to false for now. This logic will become
// a non-configurable default once it's better tuned.
//
// This option changes the interpretation of L0CompactionThreshold and
// L0StopWritesThreshold to be based on the number of L0 sublevels
// rather than the total number of files in L0.
L0SublevelCompactions bool
c3c8959
to
cc1d1e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: 0 of 7 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 1004 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Perhaps this should be moved inside
runCompaction
as part of one of thedefer func()
paths.
It needs to hold a lock on the manifest, which makes it more applicable here?
compaction_picker.go, line 453 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This is a hefty amount of new code to add to this function. It would be cleaner to put it in a separate method:
if info.level == 0 { if c = p.pickL0(); c != nil { return c } }
Done.
compaction_test.go, line 502 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This test really wants to be datadriven. The test cases are super hard to read. I'm guessing you copied the structure of
TestPickCompaction
, but note that test was inherited fromgo-leveldb
. Both tests should really be datadriven.
Done. I'll leave the other one in place as it's just testing the helper which seems a bit pointless. But I've datadriven-up this one.
options.go, line 310 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Is "number of overlapping L0 files" equivalent to "number of L0 sublevels"? Or is it equivalent to "L0 read amplification"?
Rather than changing these comments now, I'd move the comment additions to next to the
L0SublevelCompactions
field. Something like:// L0SublevelCompactions enables the use of L0 sublevel-based compaction // picking logic. Defaults to false for now. This logic will become // a non-configurable default once it's better tuned. // // This option changes the interpretation of L0CompactionThreshold and // L0StopWritesThreshold to be based on the number of L0 sublevels // rather than the total number of files in L0. L0SublevelCompactions bool
For compactions it's closer to MaxDepthAfterOngoingCompactions
and for stopping writes it's L0 read amplification. I've updated the comment accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go, line 1004 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
It needs to hold a lock on the manifest, which makes it more applicable here?
Ah, that changed from an earlier version. Can you elaborate on why we need to hold the manifest lock while doing the removal? We weren't doing that before this PR.
compaction_picker_test.go, line 467 at r3 (raw file):
opts := (*Options)(nil).EnsureDefaults() opts.Experimental.L0SublevelCompactions = true opts.L0CompactionThreshold = 2
Rather than hardcoding this value, I think it should be configurable for tests. Some of the test cases initially surprised me because I expected the default value of 4.
options.go, line 310 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
For compactions it's closer to
MaxDepthAfterOngoingCompactions
and for stopping writes it's L0 read amplification. I've updated the comment accordingly.
Got it. Looks good.
testdata/compaction_picker_L0, line 26 at r3 (raw file):
000200:[f#51,SET-l#52,SET] pick-auto
Would be nice to see a test case of L0CompactionThreshold=1
. This is true for some of the other scenarios where pick-auto
is not selecting a compaction.
testdata/compaction_picker_L0, line 67 at r3 (raw file):
pick-auto ---- 100,110 200
Nits about this format:
- Format the filenums using
%s
so we get the standard%06d
formatting. - Include the levels. Something like:
L0 000100 000110
L6 000200
Should also indicate the output level for the compaction, to make the L0->L0 compactions obvious.
cc1d1e6
to
d183b4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 1004 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Ah, that changed from an earlier version. Can you elaborate on why we need to hold the manifest lock while doing the removal? We weren't doing that before this PR.
Yep, it's because we read the f.Compacting
and related flags in sublevel creation (which happens in logAndApply
). The add/removeInProgressCompaction methods modify those flags. The race detector complains about a race between the two if two compactions terminate at the same time.
compaction_picker_test.go, line 467 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Rather than hardcoding this value, I think it should be configurable for tests. Some of the test cases initially surprised me because I expected the default value of 4.
Done.
testdata/compaction_picker_L0, line 26 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Would be nice to see a test case of
L0CompactionThreshold=1
. This is true for some of the other scenarios wherepick-auto
is not selecting a compaction.
Done.
testdata/compaction_picker_L0, line 67 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nits about this format:
- Format the filenums using
%s
so we get the standard%06d
formatting.- Include the levels. Something like:
L0 000100 000110 L6 000200
Should also indicate the output level for the compaction, to make the L0->L0 compactions obvious.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there. Just need to get to the bottom of the removeInProgressCompaction
locking.
Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go, line 1004 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Yep, it's because we read the
f.Compacting
and related flags in sublevel creation (which happens inlogAndApply
). The add/removeInProgressCompaction methods modify those flags. The race detector complains about a race between the two if two compactions terminate at the same time.
I'm confused about this. FileMetadata.Compacting
is supposed to be protected by DB.mu
. Even if two compactions finish at the same time, they will install their results sequentially (i.e. the code running here, up until logAndApply
is protected). The reason for the MANIFEST lock is that logAndApply
internally releases DB.mu
temporarily while doing the MANIFEST write.
testdata/compaction_picker_L0, line 84 at r4 (raw file):
L6: 000200 pick-auto l0_compaction_threshold=2
Is it true that if a compaction is selected at l0_compaction_threshold=n
, then it will also be selected at l0_compaction_threshold=n-1
? If yes, then I think you can remove the scenario above where l0_compaction_threshold=1
.
d183b4c
to
1059793
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 1004 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I'm confused about this.
FileMetadata.Compacting
is supposed to be protected byDB.mu
. Even if two compactions finish at the same time, they will install their results sequentially (i.e. the code running here, up untillogAndApply
is protected). The reason for the MANIFEST lock is thatlogAndApply
internally releasesDB.mu
temporarily while doing the MANIFEST write.
Ah. We're basically reading FileMetadata.Compacting
in NewL0Sublevels
while logAndApply
has held the manifest lock but released DB.mu
. I'm wondering if we can move the bve.Apply()
call in logAndApply
to be before we release DB.mu
. That would still achieve the main purpose there (of releasing DB.mu
while we do the file write), and restore the invariant of always reading Compacting
with that mutex held.
testdata/compaction_picker_L0, line 84 at r4 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Is it true that if a compaction is selected at
l0_compaction_threshold=n
, then it will also be selected atl0_compaction_threshold=n-1
? If yes, then I think you can remove the scenario above wherel0_compaction_threshold=1
.
Yes. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go, line 1004 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Ah. We're basically reading
FileMetadata.Compacting
inNewL0Sublevels
whilelogAndApply
has held the manifest lock but releasedDB.mu
. I'm wondering if we can move thebve.Apply()
call inlogAndApply
to be before we releaseDB.mu
. That would still achieve the main purpose there (of releasingDB.mu
while we do the file write), and restore the invariant of always readingCompacting
with that mutex held.
The bve.Apply()
call may not be cheap, which makes it nice to perform without holding DB.mu
. Perhaps the sublevel initialization can be split up so that part of it is performed initially to form the sublevels, then a second portion is performed to initialize the compaction-specific state.
1059793
to
527c782
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 1004 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
The
bve.Apply()
call may not be cheap, which makes it nice to perform without holdingDB.mu
. Perhaps the sublevel initialization can be split up so that part of it is performed initially to form the sublevels, then a second portion is performed to initialize the compaction-specific state.
Done. Added an InitCompactingFileInfo
. PTAL!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go, line 1004 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Done. Added an
InitCompactingFileInfo
. PTAL!
Now that we don't need to hold versions.logLock
when calling removeInProgressCompaction
, can we revisit my idea to move this call into a defer inside of runCompaction
? Or does there remain a reason why you want to perform it on the success path with versions.logLock
held? If yes, it would be useful to explain why in a comment because it isn't obvious.
internal/manifest/l0_sublevels.go, line 334 at r5 (raw file):
// InitCompactingFileInfo initializes internal flags relating to compacting // files. Must be called after sublevel initialization.
Let's add a comment to NewL0SubLevels
that it is called without DB.mu
held and thus can't access fields of FileMetadata
that are protected by DB.mu
such as Compacting
and IsIntraL0Compacting
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 1004 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Now that we don't need to hold
versions.logLock
when callingremoveInProgressCompaction
, can we revisit my idea to move this call into a defer inside ofrunCompaction
? Or does there remain a reason why you want to perform it on the success path withversions.logLock
held? If yes, it would be useful to explain why in a comment because it isn't obvious.
In theory that should work. I've been digging into why moving this removeInProgressCompaction
line to right before logLock()
causes sstable refcount not 0
errors in TestDBConcurrentCompactClose
. Don't have a full explanation yet, but on it.
1942f22
to
e3651dd
Compare
This change updates the compaction picker to call into L0SubLevels for picking base and intra-L0 compactions, instead of the previous method of picking compactions. The definition of L0StopWritesThreshold and L0CompactionThreshold has now been updated to refer to the number of overlapping L0 files as opposed to the previous definition (number of L0 files in all).
e3651dd
to
4a47e2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 1004 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
In theory that should work. I've been digging into why moving this
removeInProgressCompaction
line to right beforelogLock()
causessstable refcount not 0
errors inTestDBConcurrentCompactClose
. Don't have a full explanation yet, but on it.
I think I've figured it out now; the view of in progress compactions inside logAndApply
was inaccurate, since we were calling removeInProgressCompaction
too early. Now that we have InitCompactionFileInfo
we no longer need to move the removeInProgressCompaction
call early to begin with, as long as we call it again after changing the Compacting
flags.
internal/manifest/l0_sublevels.go, line 334 at r5 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Let's add a comment to
NewL0SubLevels
that it is called withoutDB.mu
held and thus can't access fields ofFileMetadata
that are protected byDB.mu
such asCompacting
andIsIntraL0Compacting
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
compaction.go, line 1004 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I think I've figured it out now; the view of in progress compactions inside
logAndApply
was inaccurate, since we were callingremoveInProgressCompaction
too early. Now that we haveInitCompactionFileInfo
we no longer need to move theremoveInProgressCompaction
call early to begin with, as long as we call it again after changing theCompacting
flags.
Got it. Glad you tracked this down.
TFTR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)
This change updates the compaction picker to call into L0SubLevels
for picking base and intra-L0 compactions, instead of the previous
method of picking compactions.
The definition of L0StopWritesThreshold and L0CompactionThreshold
has now been updated to refer to the number of overlapping L0 files
as opposed to the previous definition (number of L0 files in all).