Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement WAL recycling #95

Merged
merged 2 commits into from
May 3, 2019
Merged

Implement WAL recycling #95

merged 2 commits into from
May 3, 2019

Conversation

petermattis
Copy link
Collaborator

WAL recycling is an important performance optimization as it is faster
to sync a file that has already been written, than one which is being
written for the first time. This is due to the need to sync file
metadata when a file is being written for the first time. Note this is
true even if file preallocation is performed (e.g. fallocate).

Fixes #41

@petermattis petermattis requested a review from ajkr April 28, 2019 00:09
@petermattis
Copy link
Collaborator Author

This change is Reviewable

@petermattis
Copy link
Collaborator Author

I need to add some dedicated tests of this functionality, but it works in manual testing.

@petermattis petermattis force-pushed the pmattis/log-recycle branch 2 times, most recently from 6bd28fe to 9de1664 Compare April 28, 2019 15:50
@petermattis petermattis changed the title [WIP] Implement WAL recycling Implement WAL recycling Apr 28, 2019
@petermattis
Copy link
Collaborator Author

Added some tests of this functionality, and also stumbled over a bug where DB.Flush when there was an empty memtable would hang forever. Fixed up the problematic code which was an optimization for handling empty memtables that was invalid.

This is ready for a look.

WAL recycling is an important performance optimization as it is faster
to sync a file that has already been written, than one which is being
written for the first time. This is due to the need to sync file
metadata when a file is being written for the first time. Note this is
true even if file preallocation is performed (e.g. fallocate).

Fixes #41
@petermattis petermattis force-pushed the pmattis/log-recycle branch from 34dc41a to d50375f Compare May 1, 2019 11:24
Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 7 files at r1, 4 of 4 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @petermattis)


compaction.go, line 993 at r2 (raw file):

	}
	// We sort to make the order of deletions deterministic, which is nice for
	// tests.

Does the logic for recycling only files with the highest number seen so far also depend on this sort now?


log_recycler.go, line 31 at r2 (raw file):

		// The log file number was already considered for recycling. Don't consider
		// it again. This avoids a race between adding the same log file for
		// recycling multiple times, and removing the log file for actual

Does it still get added multiple times now that the obsolete file deletion is serialized?

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Looks great, just a couple questions.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @petermattis)

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ajkr)


compaction.go, line 993 at r2 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

Does the logic for recycling only files with the highest number seen so far also depend on this sort now?

Kind of. If we removed the sort we could see multiple WALs out of file num order and decide to only recycle the first. That doesn't seem like a huge problem. Additionally, normal steady state is that there is only one WAL available to be deleted at a time.

Note that the code here is changing in https://github.com/petermattis/pebble/pull/99.


log_recycler.go, line 31 at r2 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

Does it still get added multiple times now that the obsolete file deletion is serialized?

Yes. The scenario I saw was that a flush would finish and call deleteObsoleteFiles. That would add the WAL to the recycler. The recycled WAL would then be selected for reuse. Before the WAL was renamed, a compaction would finish, call deleteObsoleteFiles, and again add the WAL to the recycler. The WAL rename would complete, but now we're in a situation where the recycler contains a WAL file num which no longer exists on disk.

@petermattis petermattis merged commit c3ffc29 into master May 3, 2019
@petermattis petermattis deleted the pmattis/log-recycle branch May 3, 2019 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

perf: investigate WAL file reuse
2 participants