create: better handling "file changed" warnings, mitigation by retries and/or reflinks #6622
Comments
reflink: Guess we could only use that as an optional feature and fall back to normal open if it is not present. But I am not really sure whether it could be implemented like that / how much effort that would be. We use OS-level FDs, Python file objects and for some stuff also "by filename" access to files. If somebody wants to try, just make a PR.
Would you be more keen to accept a PR with retry on file changed? If so I could look into adding it.
Well, all methods have their pros and cons. Doing nothing is the easiest: people wanting stable files can still use fs or block level snapshotting (IF they have it). reflink has the platform and fs support issue; people could use it if we support it and IF they have it. Retrying might work, but we can't be sure (if there are many ongoing writes, it can well be that all the retries fail). If a PR would introduce a rather simple change that works for most cases, I guess it could be acceptable.
BTW, we must not create multiple borg metadata stream items for the same fs item when "retrying".
Well, the world is not black and white. A lot of software only differentiates between all-good and error. borg has all-good (0), definitely an error (2) and warning (1). The latter means that you have to look at the logs. It might be something harmless or something in a file you do not really care about, but it might also be your important sqlite db which has changed while we backed it up.
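For scripts that wrap borg, the three-way distinction above could be handled roughly like this. This is only a sketch: `classify_rc` and `run_backup` are made-up helper names, and treating every return code other than 0 and 1 as an error is an assumption on my side, not something borg documents here.

```python
import subprocess


def classify_rc(rc):
    """Map a borg return code to a coarse status.

    0 = all good, 1 = warning (look at the logs), 2 = definitely an error.
    Assumption: any other value is also treated as an error.
    """
    if rc == 0:
        return "ok"
    if rc == 1:
        return "warning"
    return "error"


def run_backup(cmd):
    """Run a command (e.g. a 'borg create ...' argument list) and classify its rc."""
    completed = subprocess.run(cmd)
    return classify_rc(completed.returncode)
```

A wrapper script could then abort only on `"error"` and merely log or notify on `"warning"`, instead of treating every non-zero exit code as a failure.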
I'd just like to pitch in on this feature request. I have some ~200TiB of data that is backed up every 5 minutes. 2 out of 3 attempts result in RC 1. I want to be sure that I got the whole file, so I set it up to retry. No biggie. But it is some extra load on the system. Having a retry-if-changed on the file level would be great to solve this. This is CephFS. I don't think there's a "reflink" concept.
https://gitlab.com/rubdos/pyreflink (separate package)
python/cpython#81338: reflink support ticket in Python's issue tracker.
BTW, we have this consistency/snapshot problem not only within the file contents, but also in the file metadata (like xattrs, ACLs, etc.).
Guess pyreflink is a bit problematic:
Can we close this in favour of #7346?
Until we have something else, can we have a simple flag "--no-file-changed-warning" that exits with code 0 instead of a warning? Because now scripts like
will not work anymore.
@gellnerm if you want to handle errors, you need to check for rc 2. rc 1 is warnings, rc 0 is "all ok".
Thanks, now I'm using
Superseded by #7346.
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
ISSUE
System information. For client/server mode post info for both machines.
Gentoo, ~amd64.
Your borg version (borg -V).
borg 1.2.0
Operating system (distribution) and version.
Hardware / network configuration, and filesystems used.
How much data is handled by borg?
~20 GB
Full borg commandline that lead to the problem (leave away excludes and passwords)
borg create --compression zstd,12 -v --progress --stats --one-file-system --exclude-if-present .borg-exclude-dir /mnt/dropzone/backups/borgbackup/home::{hostname}home_piotr{now:%Y-%m-%d_%H:%M:%S} /home/piotr
Describe the problem you're observing.
For quite a while I have been facing issues with 'borg create' exiting with a non-zero exit code when it fails to back up some .sqlite or .sqlite-wal files in ~/.mozilla/firefox because they change while being read. Currently the create command has no retry feature, which means I need to run another full tree backup until it finally manages to back up those files.
The quick solution would be to add a feature to the 'create' command to retry up to N times with an X backoff period between retries when a file cannot be backed up because it changed, e.g. '--file-changed-retry 5,1' to retry up to 5 times with 1s sleep in between.
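To illustrate the retry idea, here is a minimal Python sketch. It is not how borg reads files; `read_stable` and its parameters are hypothetical names, and using inode, size, mtime and ctime as the "did it change" check is an assumption for the example.

```python
import os
import time


def _fingerprint(st):
    # Fields assumed to be sufficient to detect a change during the read.
    return (st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def read_stable(path, retries=5, backoff=1.0, chunk_size=1 << 20):
    """Read a file fully; re-read it if it changed while being read.

    Makes 1 initial attempt plus up to `retries` retries, sleeping
    `backoff` seconds between attempts. Raises RuntimeError if the file
    kept changing on every attempt.
    """
    for attempt in range(retries + 1):
        with open(path, "rb") as f:
            before = os.fstat(f.fileno())
            chunks = []
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                chunks.append(chunk)
            after = os.fstat(f.fileno())
        if _fingerprint(before) == _fingerprint(after):
            # Only the data of the one stable attempt is returned.
            return b"".join(chunks)
        if attempt < retries:
            time.sleep(backoff)
    raise RuntimeError(f"{path}: file changed during every read attempt")
```

With `retries=5, backoff=1.0` this corresponds to the proposed '5,1' setting. On a file with many ongoing writes all attempts can still fail, so the caller would need to fall back to a warning in that case.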
Another idea is to use reflinks.
Reflinks are more and more popular; the latest versions of coreutils and Midnight Commander actually default to using them. In short, reflinks are file-level zero-copy snapshots. Currently XFS, BTRFS and ZFS support them.
A great feature would be if borg, in the case of 'file changed during read', tried to use a reflink for the duration of the backup and then just closed the FD.
Years ago Linux got the O_TMPFILE flag for the open(2) call, which allows you to get an unnamed temporary file in a given directory (which you can later linkat() once you have filled it with data). Borg could copy_file_range(2) the source file into that file descriptor and then read the fd as the backup file. Since kernel 4.5 copy_file_range() actually uses reflinks if possible, which means borg could back up a file that changes during the backup, because the fd that borg holds is a zero-copy clone.
It would be equal to doing something like this manually (pseudocode):
I use reflinks to make crash-consistent backups of my virtual machines, which are hundreds of gigabytes in size, by reflinking the images, backing up the reflinked copies and then removing them. Creating a reflink is instant, and it does not matter how long it takes to read the reflinked file, since it will never change; afterwards it is purged. It would be perfect to have reflink support in borg, but I understand it would require tier 3 hackery with CPython and would still be limited to supported file systems (xfs, btrfs, zfs), only on Linux, and only if the borg process can get write access to the directory that contains the file being backed up (open with O_TMPFILE requires it).
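The O_TMPFILE plus copy_file_range(2) approach described above could look roughly like this in Python. This is only a sketch under assumptions: `open_for_backup` is a made-up name, it needs Python 3.8+ on Linux for `os.copy_file_range`, and it simply falls back to a normal open when the clone attempt is not possible. Whether the kernel really shares extents (reflink) or silently does a regular copy depends on the file system, and this code does not check that.

```python
import os


def open_for_backup(path):
    """Return (fd, is_clone) for reading `path`.

    Tries to copy the file into an unnamed O_TMPFILE in the same
    directory via copy_file_range(); on any failure, or on platforms
    without these calls, returns a plain read-only fd instead.
    """
    src = os.open(path, os.O_RDONLY)
    o_tmpfile = getattr(os, "O_TMPFILE", None)
    if o_tmpfile is None or not hasattr(os, "copy_file_range"):
        return src, False  # not Linux or Python too old: normal open

    tmp = None
    try:
        directory = os.path.dirname(os.path.abspath(path))
        # Needs write access to the directory containing the file.
        tmp = os.open(directory, o_tmpfile | os.O_RDWR, 0o600)
        remaining = os.fstat(src).st_size
        while remaining > 0:
            copied = os.copy_file_range(src, tmp, remaining)
            if copied == 0:
                break  # source got shorter while copying
            remaining -= copied
        os.lseek(tmp, 0, os.SEEK_SET)
    except OSError:
        # Unsupported fs, read-only directory, etc.: fall back.
        if tmp is not None:
            os.close(tmp)
        os.lseek(src, 0, os.SEEK_SET)
        return src, False

    os.close(src)
    # The unnamed file disappears as soon as the returned fd is closed.
    return tmp, True
```

The caller reads from the returned fd and closes it afterwards; nothing has to be unlinked, because the temporary file never had a name.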
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Half of my backups fail.
Include any warning/errors/backtraces from the system logs