-
Notifications
You must be signed in to change notification settings - Fork 478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement WAL recycling #95
Conversation
I need to add some dedicated tests of this functionality, but it works in manual testing. |
6bd28fe
to
9de1664
Compare
Added some tests of this functionality, and also stumbled over a bug where This is ready for a look. |
WAL recycling is an important performance optimization as it is faster to sync a file that has already been written, than one which is being written for the first time. This is due to the need to sync file metadata when a file is being written for the first time. Note this is true even if file preallocation is performed (e.g. fallocate). Fixes #41
34dc41a
to
d50375f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 7 files at r1, 4 of 4 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @petermattis)
compaction.go, line 993 at r2 (raw file):
} // We sort to make the order of deletions deterministic, which is nice for // tests.
Does the logic for recycling only files with the highest number seen so far also depend on this sort now?
log_recycler.go, line 31 at r2 (raw file):
// The log file number was already considered for recycling. Don't consider // it again. This avoids a race between adding the same log file for // recycling multiple times, and removing the log file for actual
Does it still get added multiple times now that the obsolete file deletion is serialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, just a couple questions.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ajkr)
compaction.go, line 993 at r2 (raw file):
Previously, ajkr (Andrew Kryczka) wrote…
Does the logic for recycling only files with the highest number seen so far also depend on this sort now?
Kind of. If we removed the sort we could see multiple WALs out of file num order and decide to only recycle the first. That doesn't seem like a huge problem. Additionally, normal steady state is that there is only one WAL available to be deleted at a time.
Note that the code here is changing in https://github.com/petermattis/pebble/pull/99.
log_recycler.go, line 31 at r2 (raw file):
Previously, ajkr (Andrew Kryczka) wrote…
Does it still get added multiple times now that the obsolete file deletion is serialized?
Yes. The scenario I saw was that a flush would finish and call deleteObsoleteFiles
. That would add the WAL to the recycler. The recycled WAL would then be selected for reuse. Before the WAL was renamed, a compaction would finish, call deleteObsoleteFiles
, and again add the WAL to the recycler. The WAL rename would complete, but now we're in a situation where the recycler contains a WAL file num which no longer exists on disk.
WAL recycling is an important performance optimization as it is faster
to sync a file that has already been written, than one which is being
written for the first time. This is due to the need to sync file
metadata when a file is being written for the first time. Note this is
true even if file preallocation is performed (e.g. fallocate).
Fixes #41