core/rawdb: fsync the index file after each freezer write #28483
Merged
rjl493456442 force-pushed the freezer-poweroff-fix branch from 8b19fe9 to 7072af7 (November 8, 2023 05:49)
rjl493456442 changed the title from "core/rawdb: fsync the index and data file after each freezer write" to "core/rawdb: fsync the index file after each freezer write" (Nov 9, 2023)
triage discussion: let's benchmark fsyncing the head-file too, since otherwise we might end up with corrupted data (all zeroes) in the actual data content, even if the index is correct.
holiman approved these changes (Nov 10, 2023)
LGTM 💯
devopsbo3 pushed a commit to HorizenOfficial/go-ethereum that referenced this pull request (Nov 10, 2023): …8483) * core/rawdb: fsync the index and data file after each freezer write * core/rawdb: fsync the data file in freezer after write
devopsbo3 added a commit to HorizenOfficial/go-ethereum that referenced this pull request (Nov 10, 2023): …hereum#28483)" This reverts commit aee1721.
devopsbo3 added a commit to HorizenOfficial/go-ethereum that referenced this pull request (Nov 10, 2023): …hereum#28483)" This reverts commit aee1721.
Dergarcon pushed a commit to specialmechanisms/mev-geth-0x2mev that referenced this pull request (Jan 31, 2024): …8483) * core/rawdb: fsync the index and data file after each freezer write * core/rawdb: fsync the data file in freezer after write
maoueh pushed a commit to streamingfast/go-ethereum that referenced this pull request (Feb 23, 2024): …8483) * core/rawdb: fsync the index and data file after each freezer write * core/rawdb: fsync the data file in freezer after write
colinlyguo pushed a commit to scroll-tech/go-ethereum that referenced this pull request (Oct 31, 2024): …8483) * core/rawdb: fsync the index and data file after each freezer write * core/rawdb: fsync the data file in freezer after write
This pull request tries to fix a freezer issue raised in #28105.
To briefly recap the issue: if a power failure occurs while Geth is running in path model, Geth may be unable to recover on the subsequent restart.
After investigating further, I discovered that the underlying state freezer is corrupted. Specifically, its index files are corrupted: they end with index entries whose values are {file_num = 0; offset = 0}.
When the freezer table is opened, it runs a repair process that truncates the excess bytes in the data file. However, because of the "zero" index entry at the end, the freezer table truncates the entire data file while retaining the complete index file.
As a result, the freezer table ends up in an inconsistent state in which the index entries for items are present but the corresponding data is missing.
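The failure mode above can be illustrated with a minimal sketch. It assumes a simplified layout that is not taken from the actual freezer code: a single data file, 6-byte big-endian index entries (2-byte file number, 4-byte end offset), and a repair step that trusts the last index entry. The names `indexEntry`, `lastEntry` and `repairedDataSize` are hypothetical.

```go
package main

import "encoding/binary"

// entrySize is the assumed on-disk size of one index entry:
// a 2-byte file number followed by a 4-byte offset.
const entrySize = 6

// indexEntry mirrors the {file_num, offset} pair from the description.
type indexEntry struct {
	FileNum uint16
	Offset  uint32
}

// encode serializes the entry in big-endian order (an assumption of this sketch).
func (e indexEntry) encode() []byte {
	b := make([]byte, entrySize)
	binary.BigEndian.PutUint16(b[:2], e.FileNum)
	binary.BigEndian.PutUint32(b[2:], e.Offset)
	return b
}

// lastEntry decodes the final entry of an index blob.
// It assumes len(index) is a non-zero multiple of entrySize.
func lastEntry(index []byte) indexEntry {
	b := index[len(index)-entrySize:]
	return indexEntry{
		FileNum: binary.BigEndian.Uint16(b[:2]),
		Offset:  binary.BigEndian.Uint32(b[2:]),
	}
}

// repairedDataSize returns the size the data file is cut down to when the
// repair treats the last index entry as the end of valid data.
func repairedDataSize(index []byte, dataSize uint32) uint32 {
	if end := lastEntry(index).Offset; end < dataSize {
		return end
	}
	return dataSize
}
```

With two healthy entries ending at offsets 100 and 250, the data file keeps its 250 bytes. Appending six zero bytes to the same index makes the last entry decode as {0, 0}, so the sketch truncates the data file to 0 bytes while the index still holds three entries.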
The question, then, is why the index file contains zero-valued entries at the end after a power failure. It turns out to be related to how the OS manages files.
According to the Linux [manual](https://man7.org/linux/man-pages/man2/write.2.html), the file size is extended when we write items at the end of the file. In other words, the file offset is advanced even if the associated data has not yet been flushed from the OS cache to the underlying disk.
This behavior ensures that concurrent file writers won't overwrite each other, since the file offset is increased atomically.
The SQLite documentation also points out this issue, in section 6.2.
We can conclude that the issue is related to asynchronous file writes.
How can we fix the issue so that the freezer survives a power failure?
Detect the garbage index data and truncate them in the repair stage
A straightforward idea is to blindly truncate the index entries with {0, 0} values at the end of the file. However, because the freezer may store zero-size items, {0, 0} can be a valid index entry if the item itself and all previous items are zero-size.
So the open question remains: how can we rule out this special case and correctly detect the garbage data?
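To make the ambiguity concrete, here is a minimal sketch of the "blind truncation" idea. It assumes 6-byte index entries and operates on an in-memory copy of the index; `trimZeroTail` is a hypothetical helper, not part of the freezer.

```go
package main

// trimEntrySize is the assumed size of one index entry in bytes.
const trimEntrySize = 6

// allZero reports whether every byte in b is zero.
func allZero(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

// trimZeroTail drops every trailing entry that consists only of zero bytes,
// i.e. every trailing {0, 0} entry.
func trimZeroTail(index []byte) []byte {
	for len(index) >= trimEntrySize && allZero(index[len(index)-trimEntrySize:]) {
		index = index[:len(index)-trimEntrySize]
	}
	return index
}
```

For an index holding one real entry followed by a zeroed entry, the sketch correctly drops the garbage tail. For an index describing three legitimate zero-size items (18 zero bytes under these assumptions), it drops all three entries, which is exactly the special case that makes blind truncation unsafe.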
Sync the file after each freezer write
We can minimize the possibility of ending up with "garbage" zero-valued index entries by syncing the index file after each write.
Once the index file has been synced, the stored index entry is guaranteed to be complete. But what if the power failure occurs between file.Write() and file.Sync()?
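As a rough illustration of the write-then-sync pattern, the sketch below uses only the standard `os.File` API. `appendAndSync` and `indexSizeAfterAppends` are hypothetical names, the 6-byte entry is an assumption, and this is not the code changed by the pull request.

```go
package main

import "os"

// appendAndSync writes one encoded index entry and then asks the OS to flush
// the file to stable storage before returning. A crash between Write and Sync
// can still leave the entry unflushed; the window is only made smaller.
func appendAndSync(f *os.File, entry []byte) error {
	if _, err := f.Write(entry); err != nil {
		return err
	}
	return f.Sync()
}

// indexSizeAfterAppends is a small driver for the sketch: it appends n
// placeholder entries to a temporary file and reports the resulting size.
func indexSizeAfterAppends(n int) (int64, error) {
	f, err := os.CreateTemp("", "sketch-*.ridx")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	entry := make([]byte, 6) // assumed 6-byte entry
	entry[5] = 1             // non-zero payload, purely illustrative
	for i := 0; i < n; i++ {
		if err := appendAndSync(f, entry); err != nil {
			return 0, err
		}
	}
	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}
```

Syncing after every write trades throughput for durability, so the cost depends heavily on the storage device and on how often the freezer is written.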
Apparently, SQLite uses a separate marker to track the number of items in the index file, and uses a two-step sync approach to address this. Essentially, it flushes the index file as the first step and then updates the marker as the second step.
In this approach, the marker indicates how many index entries we expect, and all entries beyond it are truncated.
This approach is somewhat over-complicated and might degrade performance.
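For comparison, the two-step idea could look roughly like the sketch below. Everything in it is hypothetical: the separate marker file, the 8-byte big-endian counter, the 6-byte entry size and the names `commitEntry` and `committedPrefix` are assumptions made for illustration, not SQLite's or Geth's actual implementation.

```go
package main

import (
	"encoding/binary"
	"os"
)

// markedEntrySize is the assumed size of one index entry in bytes.
const markedEntrySize = 6

// commitEntry appends an entry and records the new entry count in two steps:
// step 1 flushes the index file, step 2 updates and flushes the marker.
// The marker file is assumed not to be opened in append mode.
func commitEntry(index, marker *os.File, entry []byte, count uint64) error {
	if _, err := index.Write(entry); err != nil {
		return err
	}
	if err := index.Sync(); err != nil {
		return err
	}
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], count)
	if _, err := marker.WriteAt(buf[:], 0); err != nil {
		return err
	}
	return marker.Sync()
}

// committedPrefix returns the part of the index covered by the marker.
// Entries beyond the marker are treated as uncommitted and dropped.
func committedPrefix(index []byte, count uint64) []byte {
	want := count * markedEntrySize
	if want > uint64(len(index)) {
		// Should not happen with the ordering above; keep whole entries only.
		want = uint64(len(index)) - uint64(len(index))%markedEntrySize
	}
	return index[:want]
}
```

On recovery, an index with three entries and a marker value of 2 would be cut back to two entries, regardless of what the third entry contains. The extra marker write and second sync per item are the source of the complexity and performance concern mentioned above.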