db: refactor obsolete file deletion code #2641
Conversation
Looking good!
Reviewed 11 of 17 files at r1, all commit messages.
Reviewable status: 11 of 17 files reviewed, 8 unresolved discussions (waiting on @RaduBerinde)
cleaner.go
line 50 at r1 (raw file):
// In practice, we should rarely have more than a couple of jobs (in most cases
// we Wait() after queueing a job).
const jobsChLen = 10000
is this maybe too big? With 8-byte words, 80KB?
cleaner.go
line 103 at r1 (raw file):
// Note that logs in obsoleteFiles might be recycled instead of deleted.
//
// Returns a channel that is closed when the job completes.
Looks like this comment is out-of-date.
cleaner.go
line 120 at r1 (raw file):
}
// Something is terribly wrong... Just drop the job.
cm.opts.Logger.Infof("cleanup jobs queue full")
this makes me nervous even though it shouldn't happen. since enqueues are already mutex-protected, should we use a preallocated slice instead of a channel? if the slice becomes full, we can always suffer the allocation to grow it.
compaction.go
line 2636 at r1 (raw file):
d.updateTableStatsLocked(ve.NewFiles)
}
d.deleteObsoleteFiles(jobID, true /* waitForCompletion */)
I think this is a behavior change. Previously, with deletion pacing enabled, `waitForOngoing` would determine whether or not we should wait for our turn to call `doDeleteObsoleteFiles`, but that function wouldn't wait for the files to physically be deleted.
compaction.go
line 3488 at r1 (raw file):
d.mu.Unlock()
defer d.mu.Lock()
d.cleanupManager.Wait()
I didn't follow this. why do we wait for the cleaner but only if `d.mu.disableFileDeletions <= 1`? couldn't another goroutine have just incremented `disableFileDeletions` and dropped the mu to wait for the cleaner. Then when the next goroutine increments, it avoids waiting. Shouldn't they receive the same treatment?
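For illustration, here is a minimal self-contained sketch of the concern in this thread. The names (`DB`, `cleanupManager`, `disableFileDeletions`) follow the discussion but are assumptions, not the actual Pebble code, and this is only one plausible shape of a fix: every caller that increments the counter drops the mutex and waits for in-flight cleanup jobs, so no caller can return while the cleaner is still busy.

```go
package main

import "sync"

type cleanupManager struct{ wg sync.WaitGroup }

// Wait blocks until all queued cleanup jobs have finished.
func (cm *cleanupManager) Wait() { cm.wg.Wait() }

type DB struct {
	mu struct {
		sync.Mutex
		disableFileDeletions int
	}
	cleanupManager cleanupManager
}

func (d *DB) disableFileDeletions() {
	d.mu.Lock()
	d.mu.disableFileDeletions++
	d.mu.Unlock()
	// Wait unconditionally (not only when the counter is <= 1): a concurrent
	// caller may already have incremented the counter while its own wait for
	// the cleaner is still in progress.
	d.cleanupManager.Wait()
	d.mu.Lock()
	defer d.mu.Unlock()
}

func main() {
	var d DB
	d.disableFileDeletions()
}
```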
compaction.go
line 3569 at r1 (raw file):
// Release d.mu while doing I/O
// Note the unusual order: Unlock and then Lock.
this function no longer does I/O directly, right? Should we update this comment?
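As an aside, a tiny self-contained example of the "Unlock and then Lock" pattern the quoted comment refers to (the names here are illustrative, not Pebble's): the caller holds the mutex on entry and expects it held again on return, so the function releases it only around the slow part.

```go
package main

import "sync"

type db struct {
	mu sync.Mutex
}

// doSlowWorkLocked expects d.mu to be held by the caller. It releases the
// mutex around the slow work and re-acquires it before returning, hence the
// unusual Unlock-then-deferred-Lock order.
func (d *db) doSlowWorkLocked(slow func()) {
	d.mu.Unlock()
	defer d.mu.Lock()
	slow()
}

func main() {
	d := &db{}
	d.mu.Lock()
	d.doSlowWorkLocked(func() { /* e.g. queue a cleanup job */ })
	d.mu.Unlock()
}
```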
compaction.go
line 3583 at r1 (raw file):
}
_, noRecycle := d.opts.Cleaner.(base.NeedsFileContents)
filesToDelete := make([]obsoleteFile, 0, len(files))
looking at this existing code, is there any reason for this `files` array? couldn't we populate a single `filesToDelete` slice up above, and then down here perform all the same logic we do within the below for loop?
relatedly, I think this capacity for the `filesToDelete` slice wasn't the intended capacity. `len(files)` is always 4.
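To make the capacity point concrete, here is a hedged, self-contained sketch of the shape being discussed (type and field names are assumptions for illustration): `files` groups obsolete files by file type, so `len(files)` is the number of file types (4), while the intended capacity for `filesToDelete` is the total number of files.

```go
package main

import "fmt"

type fileInfo struct{ fileNum uint64 }

type obsoleteFile struct {
	fileType string
	fileNum  uint64
}

func main() {
	obsoleteLogs := []fileInfo{{1}, {2}}
	obsoleteTables := []fileInfo{{3}, {4}, {5}}
	obsoleteManifests := []fileInfo{{6}}
	obsoleteOptions := []fileInfo{}

	// files has one entry per file type, so len(files) is always 4.
	files := [4]struct {
		fileType string
		obsolete []fileInfo
	}{
		{"log", obsoleteLogs},
		{"table", obsoleteTables},
		{"manifest", obsoleteManifests},
		{"options", obsoleteOptions},
	}

	// The intended capacity is the total number of obsolete files.
	n := 0
	for i := range files {
		n += len(files[i].obsolete)
	}
	filesToDelete := make([]obsoleteFile, 0, n)
	for i := range files {
		for _, f := range files[i].obsolete {
			filesToDelete = append(filesToDelete, obsoleteFile{files[i].fileType, f.fileNum})
		}
	}
	fmt.Println(len(files), len(filesToDelete), cap(filesToDelete)) // 4 6 6
}
```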
event_listener_test.go
line 115 at r1 (raw file):
defer d.mu.Unlock()
d.enableFileDeletions()
d.TestOnlyWaitForCleaning()
It looks like `d.mu` is held here, which is prohibited when calling `cleanupManager.Wait()`, right?
Force-pushed from 07877ee to a01ee60
Reviewable status: 10 of 17 files reviewed, 9 unresolved discussions (waiting on @jbowens)
cleaner.go
line 50 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
is this maybe too big? With 8-byte words, 80KB?
Reduced to 1000
cleaner.go
line 103 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Looks like this comment is out-of-date.
Done.
cleaner.go
line 120 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
this makes me nervous even though it shouldn't happen. since enqueues are already mutex-protected, should we use a preallocated slice instead of a channel? if the slice becomes full, we can always suffer the allocation to grow it.
I'm hesitant to stop using the channel because the synchronization is much easier.. We could also do a blocking send, so that we wait for some jobs to finish? We already call EnqueueJob without holding the mutex.
compaction.go
line 2636 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I think this is a behavior change. Previously, with deletion pacing enabled, `waitForOngoing` would determine whether or not we should wait for our turn to call `doDeleteObsoleteFiles`, but that function wouldn't wait for the files to physically be deleted.
But it would wait for any deletion that was just queued (e.g. from `maybeScheduleObsoleteTableDeletion`) to physically delete files.. If we accept that `maybeScheduleObsoleteTableDeletion` can run at any time, then we should treat both cases the same..
One could argue that the waiting was just an artifact of the poor design (no queue) and we really don't want to wait anywhere.. But I think we want some kind of backpressure - with a low pacing limit, the deletions could become the bottleneck and we don't want them to fall behind indefinitely..
compaction.go
line 3488 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I didn't follow this. why do we wait for the cleaner but only if `d.mu.disableFileDeletions <= 1`? couldn't another goroutine have just incremented `disableFileDeletions` and dropped the mu to wait for the cleaner. Then when the next goroutine increments, it avoids waiting. Shouldn't they receive the same treatment?
Nice catch, fixed.
compaction.go
line 3569 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
this function no longer does I/O directly, right? Should we update this comment?
Done.
compaction.go
line 3583 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
looking at this existing code, is there any reason for this `files` array? couldn't we populate a single `filesToDelete` slice up above, and then down here perform all the same logic we do within the below for loop?
relatedly, I think this capacity for the `filesToDelete` slice wasn't the intended capacity. `len(files)` is always 4.
We'd have to do the "conversion" from `fileInfo` to `obsoleteFile` in each place (or at least somehow associate the `fileType` with each entry), and in the loop we'd have to delete recycled logs from the slice. Not sure it ends up better so I'll leave it as is. I fixed the capacity.
compaction.go
line 2118 at r2 (raw file):
}
d.deleteObsoleteFiles(jobID, true /* waitForCompletion */)
Note that I switched this call to wait so that the comment below makes sense.. Not that it made sense with the previous code though (with pacing enabled)..
event_listener_test.go
line 115 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
It looks like `d.mu` is held here, which is prohibited when calling `cleanupManager.Wait()`, right?
Done.
Force-pushed from a01ee60 to 7ef9fbe
Looks good to me, just have an open question of what to do to avoid backpressuring compactions.
Reviewed 3 of 5 files at r3, all commit messages.
Reviewable status: 12 of 17 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)
cleaner.go
line 120 at r1 (raw file):
Previously, RaduBerinde wrote…
I'm hesitant to stop using the channel because the synchronization is much easier.. We could also do a blocking send, so that we wait for some jobs to finish? We already call EnqueueJob without holding the mutex.
Yeah, I think a blocking send is a good option.
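For reference, a minimal sketch of the blocking-send approach being agreed on here. The names (`cleanupManager`, `cleanupJob`, `jobsCh`) follow the discussion but are assumptions, not the actual Pebble implementation: a full buffered channel makes `EnqueueJob` block until the background goroutine drains some jobs, which gives backpressure instead of dropping work.

```go
package main

import (
	"fmt"
	"sync"
)

type cleanupJob struct{ jobID int }

type cleanupManager struct {
	// Buffered channel of queued jobs; when it fills up, EnqueueJob blocks.
	jobsCh chan *cleanupJob
	wg     sync.WaitGroup
}

func newCleanupManager(queueLen int) *cleanupManager {
	cm := &cleanupManager{jobsCh: make(chan *cleanupJob, queueLen)}
	cm.wg.Add(1)
	go cm.mainLoop()
	return cm
}

// EnqueueJob is called without holding the DB mutex. If the queue is full,
// the send blocks until the background goroutine makes room, acting as
// backpressure rather than dropping the job.
func (cm *cleanupManager) EnqueueJob(j *cleanupJob) { cm.jobsCh <- j }

// Close stops accepting jobs and waits for the queued ones to finish.
func (cm *cleanupManager) Close() {
	close(cm.jobsCh)
	cm.wg.Wait()
}

func (cm *cleanupManager) mainLoop() {
	defer cm.wg.Done()
	for j := range cm.jobsCh {
		// Delete (or recycle) the obsolete files described by the job.
		fmt.Println("processing cleanup job", j.jobID)
	}
}

func main() {
	cm := newCleanupManager(1000)
	for i := 0; i < 3; i++ {
		cm.EnqueueJob(&cleanupJob{jobID: i})
	}
	cm.Close()
}
```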
compaction.go
line 2636 at r1 (raw file):
Previously, RaduBerinde wrote…
But it would wait for any deletion that was just queued (e.g. from `maybeScheduleObsoleteTableDeletion`) to physically delete files.. If we accept that `maybeScheduleObsoleteTableDeletion` can run at any time, then we should treat both cases the same..
One could argue that the waiting was just an artifact of the poor design (no queue) and we really don't want to wait anywhere.. But I think we want some kind of backpressure - with a low pacing limit, the deletions could become the bottleneck and we don't want them to fall behind indefinitely..
If the pacing is limiting the rate of deletions, I think the disk-space utilization heuristic will at least prevent us from exhausting the storage. The deletion pacing exists to protect foreground write latency by avoiding sudden bursts of deletions. I don't think we actually want to backpressure compactions which can degrade foreground read performance.
IMO our current deletion pacing is a little lacking, because it doesn't adapt to disk throughput. We want to smooth the deletions over time but not outright limit them. We could consider adapting the rate according to the deletions requested over a recent long window, so we allow the rate to increase if we're deleting more files, but we still smooth it. But I think that's for future work.
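A rough sketch of the windowed smoothing idea described above, purely illustrative (the type and method names are assumptions, not an existing API): the target deletion rate is derived from the bytes requested for deletion over a trailing window, so a sustained increase raises the rate while short bursts are still smoothed out, with a configurable floor.

```go
package main

import (
	"fmt"
	"time"
)

type deletionRequest struct {
	at    time.Time
	bytes int64
}

type adaptivePacer struct {
	window  time.Duration // trailing window used for smoothing
	minRate float64       // bytes/sec floor so deletions never stall entirely
	recent  []deletionRequest
}

func (p *adaptivePacer) recordRequest(now time.Time, bytes int64) {
	p.recent = append(p.recent, deletionRequest{at: now, bytes: bytes})
}

// targetRate spreads the bytes requested within the trailing window evenly
// across that window, so the rate adapts to sustained load while individual
// bursts are smoothed rather than deleted all at once.
func (p *adaptivePacer) targetRate(now time.Time) float64 {
	cutoff := now.Add(-p.window)
	kept := p.recent[:0]
	var total int64
	for _, r := range p.recent {
		if r.at.After(cutoff) {
			kept = append(kept, r)
			total += r.bytes
		}
	}
	p.recent = kept
	rate := float64(total) / p.window.Seconds()
	if rate < p.minRate {
		rate = p.minRate
	}
	return rate
}

func main() {
	p := &adaptivePacer{window: 5 * time.Minute, minRate: 16 << 20} // 16 MiB/s floor
	now := time.Now()
	p.recordRequest(now, 20<<30) // 20 GiB of obsolete files requested recently
	fmt.Printf("target rate: %.1f MiB/s\n", p.targetRate(now)/(1<<20))
}
```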
compaction.go
line 3583 at r1 (raw file):
Previously, RaduBerinde wrote…
We'd have to do the "conversion" from `fileInfo` to `obsoleteFile` in each place (or at least somehow associate the `fileType` with each entry), and in the loop we'd have to delete recycled logs from the slice. Not sure it ends up better so I'll leave it as is. I fixed the capacity.
Got it
Reviewable status: 12 of 17 files reviewed, 3 unresolved discussions (waiting on @jbowens)
cleaner.go
line 120 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Yeah, I think a blocking send is a good option.
Done.
compaction.go
line 2636 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
If the pacing is limiting the rate of deletions, I think the disk-space utilization heuristic will at least prevent us from exhausting the storage. The deletion pacing exists to protect foreground write latency by avoiding sudden bursts of deletions. I don't think we actually want to backpressure compactions which can degrade foreground read performance.
IMO our current deletion pacing is a little lacking, because it doesn't adapt to disk throughput. We want to smooth the deletions over time but not outright limit them. We could consider adapting the rate according to the deletions requested over a recent long window, so we allow the rate to increase if we're deleting more files, but we still smooth it. But I think that's for future work.
Ok, I can try to switch to not wait in the compaction and ingest code paths. I'll see how I can make the tests deterministic..
Should an explicit `db.Compact()` call wait for cleanup?
Previously, RaduBerinde wrote…
Hm, I guess maybe not, because we may be compacting a tiny range but there might be background cleanup across the entire key space
Force-pushed from 7ef9fbe to c0b26f2
Reviewable status: 6 of 19 files reviewed, 3 unresolved discussions (waiting on @jbowens)
compaction.go
line 2636 at r1 (raw file):
Previously, RaduBerinde wrote…
Hm, I guess maybe not, because we may be compacting a tiny range but there might be background cleanup across the entire key space
Ok, I think I got it working. For one test I had to use an "always wait" hack but for others waiting for cleanup after Compact calls and such was sufficient.
Reviewable status: 6 of 19 files reviewed, 2 unresolved discussions (waiting on @RaduBerinde)
cleaner.go
line 120 at r1 (raw file):
Previously, RaduBerinde wrote…
Done.
Thanks! #2662 validates that this is the right move, because our fixed deletion pacing rate can cause us to fall behind unboundedly on file deletions.
Reviewable status: 6 of 19 files reviewed, 2 unresolved discussions (waiting on @jbowens)
compaction.go
line 2118 at r2 (raw file):
Previously, RaduBerinde wrote…
Note that I switched this call to wait so that the comment below makes sense.. Not that it made sense with the previous code though (with pacing enabled)..
I'm still not very sure about this wait here. Maybe it doesn't matter much anyway? AFAICT we only use `Flush()` for testing in CRDB
Reviewable status: 6 of 19 files reviewed, 2 unresolved discussions (waiting on @RaduBerinde)
compaction.go
line 2118 at r2 (raw file):
Previously, RaduBerinde wrote…
I'm still not very sure about this wait here. Maybe it doesn't matter much anyway? AFAICT we only use `Flush()` for testing in CRDB
Thanks for calling this out again. This function will get called for non-explicit flushes when the memtable becomes full too. Waiting will prevent new flushes from scheduling. I don't think we do want to wait here.
Reviewable status: 6 of 19 files reviewed, 2 unresolved discussions (waiting on @jbowens)
cleaner.go
line 120 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Thanks! #2662 validates that this is the right move, because our fixed deletion pacing rate can cause us to fall behind unboundedly on file deletions.
Makes sense.. The only remaining place where we wait is during `Open()`.. I don't think we want to wait there either, do we?
The code around obsolete file deletion is very convoluted and there is one known race (described in cockroachdb#2630). This change reworks the deletion code as follows:
- we add a cleanup manager which has a background goroutine.
- all file and object deletions happen in the manager's background goroutine.
- callers can enqueue cleanup jobs with the manager
- tests can wait for outstanding cleanup jobs to finish

We also add a missing call to `deleteObsoleteObjects` from the ingest path (which now can cause files to be removed through excising).

Finally, we fix an incorrect `Remove` call in the error path of `Checkpoint()` (which was missing a `PathJoin`).

Fixes cockroachdb#2630.
Force-pushed from c0b26f2 to be00bcc
I removed the wait flag altogether; we now never wait for cleanup except for some tests.
Reviewed 6 of 11 files at r4, 7 of 7 files at r5, all commit messages.
Reviewable status: 19 of 22 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde)
TFTR!
Before #2641, we had a conditional to explicitly schedule deletion of obsolete files in db.Close(). We took it out as part of the deletion refactor as the comment on it made it seem no longer necessary. However, the code for readState.unrefLocked(), which we call earlier in Close(), explicitly states that it does not schedule obsolete file deletion and that it expects the caller to schedule that. Since we're calling readState.unrefLocked() here, we should be doing said deletion. Fixes #2708.