-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix manifest corruption #1756
Fix manifest corruption #1756
Conversation
…ncy on os.file.Sync()
…y on os.file.Sync()
4f2f2cf
to
bf9578f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome find @iluminae! Thanks for fixing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @iluminae and @manishrjain)
a discussion (no related file):
Awesome @iluminae, this is a nice catch.
Thanks @jarifibrahim for the review. :)
fyi @billprovince @joshua-goldstein - +1 from me to get this in |
…rrent compactions (#1756) `manifestFile.rewrite` closes the file due to which `manifest.fp.Sync()` can fail. This leads to an updated manifest but non-updated tables set. Hence, the compaction process enters into an infinite loop failing with the `MANIFEST removes non-existing table` error. This PR fixes it. (cherry picked from commit 6ed45ae)
…rrent compactions (#1756) `manifestFile.rewrite` closes the file due to which `manifest.fp.Sync()` can fail. This leads to an updated manifest but non-updated tables set. Hence, the compaction process enters into an infinite loop failing with the `MANIFEST removes non-existing table` error. This PR fixes it. (cherry picked from commit 6ed45ae)
…rrent compactions (#1756) `manifestFile.rewrite` closes the file due to which `manifest.fp.Sync()` can fail. This leads to an updated manifest but non-updated tables set. Hence, the compaction process enters into an infinite loop failing with the `MANIFEST removes non-existing table` error. This PR fixes it. (cherry picked from commit 6ed45ae)
…rrent compactions (#1756) (#1820) This PR is a cherry pick of #1756. Addresses [this issue](https://discuss.dgraph.io/t/badgerdb-manifest-corruption-issue-solution/15915) on Discuss and[ this issue](#1819) on Badger. Co-authored-by: Kenan Kessler <[email protected]>
…rrent compactions (#1756) (#1820) This PR is a cherry pick of #1756. Addresses [this issue](https://discuss.dgraph.io/t/badgerdb-manifest-corruption-issue-solution/15915) on Discuss and[ this issue](#1819) on Badger. Co-authored-by: Kenan Kessler <[email protected]>
Update Badger from `7677fcb` to `6ed45ae` (on main-deprecated branch). This fixes dgraph-io/badger#1752 and dgraph-io/badger#1756. First CI failed [here](https://github.com/dgraph-io/dgraph/actions/runs/4208376321/jobs/7304322951). This issue was resolved by bumping up the Go version used in the container.
…rrent compactions (dgraph-io#1756) `manifestFile.rewrite` closes the file due to which `manifest.fp.Sync()` can fail. This leads to an updated manifest but non-updated tables set. Hence, the compaction process enters into an infinite loop failing with the `MANIFEST removes non-existing table` error. This PR fixes it.
This fixes the manifest corruption posted about here
I was able to reproduce the race condition every time by injecting a sleep into the sync call during tests only. I am not sure if you guys would rather have this or a test that is much longer and may or may not reproduce based on high concurrency (but does not inject the sleep) or no test at all.
Let me know - I made it so we can just revert the test commit and it should be good to go.
This change is