rwlock: flush and fsync #3076
It is important that we ensure that rwlocked contents are flushed to disk, because otherwise other rwlock readers might get outdated state. Kudos @shcheklein for catching this one.
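For readers unfamiliar with the pattern in the title, here is a minimal sketch of "flush and fsync" when writing a small state file. It is not the code from this PR: the function name `dump_and_sync` and the JSON payload are hypothetical, chosen only to illustrate the two steps.

```python
import json
import os
import tempfile


def dump_and_sync(path, data):
    """Write `data` as JSON to `path`, then push it through both buffer layers.

    Hypothetical helper for illustration; not DVC's actual implementation.
    """
    with open(path, "w", encoding="utf-8") as fobj:
        json.dump(data, fobj)
        # flush() hands Python's userspace buffer to the operating system.
        fobj.flush()
        # os.fsync() asks the OS to write its cached copy to the storage device.
        os.fsync(fobj.fileno())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "rwlock")  # assumed file name
        dump_and_sync(lock_path, {"example": "state"})
        with open(lock_path, encoding="utf-8") as fobj:
            print(json.load(fobj))
```

The order matters: `os.fsync()` only sees data that has already left Python's buffer, so `flush()` has to come first.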
Maybe adding a link to the issue, @efiop ?
This one is hard to understand at first, even with the comment 😅
@MrOutis There is no issue, it is just a common practice to ensure that data hits the disc. From fsync docs:
|
I don't think this is the issue. You can't get old data from fs when you don't do fsync. It is only needed if you care about disk going power off, not our case. Sure we need to be sure python is passing things to fs, but |
So I suggest dropping it.
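The distinction drawn in this comment can be sketched as follows, assuming an ordinary local filesystem (network filesystems, raised later in the thread, may behave differently). The helper name `read_after_flush` is hypothetical. It shows that once `flush()` has passed the data to the OS, a second file handle can read it even though `fsync` was never called.

```python
import os
import tempfile


def read_after_flush(path, text):
    """Return what a second handle reads after flush() but without fsync()."""
    with open(path, "w", encoding="utf-8") as writer:
        writer.write(text)
        # After flush() the data sits in the OS cache; it may not be on disk yet.
        writer.flush()
        with open(path, encoding="utf-8") as reader:
            return reader.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_after_flush(os.path.join(tmp, "state"), "locked"))
```

What `fsync` adds on top of this is durability across a crash or power loss, which is the part under debate here.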
@Suor Imagine some network-fs, where you have local cache and others won't get the changes until you fsync. And well, not breaking the file on power-outage is nice too 🙂 That being said, those concerns apply to regular stage-files as well, as we rely on them to build a DAG and check for conflicts. But we are not protecting them with fsync, so this PR solves potential issues, but only partially. So far we haven't seen any related issues being reported, but usually, those types of issues are really hard to debug and often only a skilled developer that knows what to look for would think about |
@efiop Power outages only matter for databases that serve users, not for our case. That is, a user gets an OK, thinks the change was written, and acts on it, but the change is lost later because the data had not actually been written to disk before the power went out. If there is no user, then there is no discrepancy and we shouldn't protect against it. So this is a non-issue for us. If that is not enough, then this fix won't fix it anyway, since we don't flush and fsync all the other things like stage files, and user code doesn't flush and fsync its outs.
@Suor As I've noted above, power outage is not our main concern, but rather wonky filesystems. And I would rather have this fsync than not. Just think of it as another level of protection.
This doesn't add any new level of protection while tricking people into thinking it does. A dangerous situation.
@Suor it only protects rwlock from wonky filesystems, not other things. Not dangerous by itself.
@efiop file close causes a commit on NFS, same as flush, so it doesn't add any new protection for |
@Suor If it was that way, no one would be doing fsync manually.
I don't think that is true. flock that we use for .dvc/lock doesn't suffer from those issues AFAIK, and rwlock is now covered, so the only place where we are not covered is stage file dumps. So I agree that this PR is not crucial, but it doesn't hurt either. Please feel free to create a PR to revert it if you disagree.