create: better handling "file changed" warnings, mitigation by retries and/or reflinks #6622

fff7d1bc · 2022-04-18T21:37:47Z

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

ISSUE

System information. For client/server mode post info for both machines.

Gentoo, ~amd64.

Your borg version (borg -V).

borg 1.2.0

Operating system (distribution) and version.

Hardware / network configuration, and filesystems used.

How much data is handled by borg?

~20 GB

Full borg commandline that lead to the problem (leave away excludes and passwords)

borg create --compression zstd,12 -v --progress --stats --one-file-system --exclude-if-present .borg-exclude-dir /mnt/dropzone/backups/borgbackup/home::{hostname}home_piotr{now:%Y-%m-%d_%H:%M:%S} /home/piotr

Describe the problem you're observing.

For quite a while I have been facing issues with 'borg create' exiting with a non-zero exit code when it fails to back up some .sqlite or .sqlite-wal files in ~/.mozilla/firefox because they change while being read. Currently there is no retry feature in the create command, which means I need to run another full-tree backup until it finally manages to back up those files.

The quick solution would be to add a feature to the 'create' command to retry up to N times with a backoff period of X between retries when a file cannot be backed up because it changed, e.g. '--file-changed-retry 5,1' to retry up to 5 times with a 1s sleep in between.

Another idea is to use reflinks.

Reflinks are becoming more and more popular; the latest versions of coreutils and Midnight Commander actually default to using them. In short, reflinks are file-level zero-copy snapshots. Currently XFS, BTRFS and ZFS support them.

A great feature would be if borg, in case of 'file changed during read', tried to use a reflink instead for the duration of the backup and then just closed the FD.

Years ago Linux got the O_TMPFILE flag for the open(2) call, which allows you to get an unnamed temporary file in a given directory (which you can later linkat() once you have filled it with data), and copy_file_range(2) to copy into that file descriptor; borg would then read the fd as the file to back up. Since kernel 4.5, copy_file_range() actually uses reflinks if possible, which means it would allow borg to back up a file that changes during the backup, because the fd that borg holds is a zero-copy clone.
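To make the retry idea concrete, here is a minimal sketch of what "retry up to N times with X seconds in between" could look like for a single file. It is not borg code: `read_unchanged` is a hypothetical helper, and comparing `st_mtime_ns` and `st_size` before and after the read is only an assumed stand-in for whatever change detection borg really does. It also reads the whole file into memory, which a real implementation would not do.

```python
import os
import time


def read_unchanged(path, retries=5, delay=1.0):
    """Read `path` and return its bytes, retrying while it changes during the read.

    Hypothetical helper: one initial attempt plus up to `retries` retries,
    sleeping `delay` seconds between attempts.
    """
    for attempt in range(retries + 1):
        before = os.stat(path)
        with open(path, "rb") as f:
            data = f.read()
        after = os.stat(path)
        # Assumed change check: modification time and size are the same
        # before and after the read.
        if (before.st_mtime_ns, before.st_size) == (after.st_mtime_ns, after.st_size):
            return data
        if attempt < retries:
            time.sleep(delay)
    raise RuntimeError(f"{path}: file kept changing during {retries + 1} read attempts")
```

With constant writes every attempt can fail, so the caller still has to handle the final exception, for example by falling back to today's warning.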

It would be equivalent to doing something like this manually (pseudocode):

if ! borg create ... /path/to/file.sqlite; then
    cp --reflink=always /path/to/file.sqlite /path/to/file.sqlite.snapshot
    borg create ... /path/to/file.sqlite.snapshot
    rm -f /path/to/file.sqlite.snapshot
fi

I use reflinks to make crash-consistent backups of my virtual machines, which are hundreds of gigabytes in size, by reflinking the images, backing up the copies and then removing them. Creating a reflink is instant, and it does not matter how long it takes to read the reflinked file, since it will never change; afterwards it is purged. It would be perfect to have reflink support in borg, but I understand it would require tier 3 hackery with CPython and would still be limited to supported file systems (xfs, btrfs, zfs), to Linux only, and to cases where the borg process has write access to the directory that contains the file being backed up (open with O_TMPFILE requires it).
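As a rough illustration of the unnamed-temp-file plus copy_file_range() idea, the sketch below snapshots one file into an anonymous temporary file in the same directory and returns it for reading. `snapshot_file` is a hypothetical name, not a borg function. It assumes Python 3.8+ for `os.copy_file_range` and relies on `tempfile.TemporaryFile`, which uses O_TMPFILE on Linux where available. Whether the kernel really shares extents depends on the filesystem; when it does not, this is an ordinary copy and the source can still change while it runs.

```python
import os
import shutil
import tempfile


def snapshot_file(path):
    """Return an open, unnamed temporary copy of `path`, positioned at offset 0.

    Hypothetical helper. Needs write access to the directory containing `path`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # TemporaryFile uses O_TMPFILE on Linux when the filesystem supports it
    # and otherwise creates a file and unlinks it right away.
    snap = tempfile.TemporaryFile(dir=directory)
    with open(path, "rb") as src:
        try:
            remaining = os.fstat(src.fileno()).st_size
            while remaining > 0:
                # The kernel may satisfy this with a reflink (shared extents).
                copied = os.copy_file_range(src.fileno(), snap.fileno(), remaining)
                if copied == 0:
                    break  # source shrank while copying
                remaining -= copied
        except (AttributeError, OSError):
            # No copy_file_range on this platform/filesystem: plain copy instead.
            src.seek(0)
            snap.seek(0)
            snap.truncate()
            shutil.copyfileobj(src, snap)
    snap.seek(0)
    return snap
```

The caller reads from the returned file object instead of the live file and closes it afterwards; closing the unnamed file is what discards the snapshot.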

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Half of my backups fail.

Include any warning/errors/backtraces from the system logs


ThomasWaldmann · 2022-04-19T13:29:38Z

reflink: Guess we could only use that as an optional feature and fall back to normal open if it is not present.

But I am not really sure whether it could be implemented like that / how much effort that would be. We use OS-level FDs, Python file objects and for some stuff also "by filename" access to files. If somebody wants to try, just make a PR.

fff7d1bc · 2022-04-19T13:40:50Z

Would you be more keen to accept a PR with retry on file changed? If so I could look into adding it.

ThomasWaldmann · 2022-04-19T13:48:51Z

Well, all methods have their pros and cons.

Doing nothing is the easiest, people wanting to have stable files can still use fs or block level snapshotting (IF they have it).

reflink has the platform and fs support issue, people could use it if we support that and IF they have it.

Retrying might work, but we can't be sure (if there are many ongoing writes, it can well be that all the retries fail).

If a PR would introduce a rather simple change that works for most cases, I guess it could be acceptable.

ThomasWaldmann · 2022-04-19T14:04:37Z

BTW, we must not create multiple borg metadata stream items for the same fs item when "retrying".

ThomasWaldmann · 2022-04-19T14:07:39Z

Well, the world is not black and white. A lot of software only differentiates between all-good and error.

borg has all-good (0), warning (1) and definitely an error (2). A warning means that you have to look at the logs.

It might be something harmless or something in a file you do not really care about, but it might also be your important sqlite db which has changed while we backed it up.

mikabytes · 2022-06-15T10:35:02Z

I'd just like to pitch in on this feature request. I have some ~200TiB of data that is backed up every 5 minutes. 2 out of 3 attempts result in RC 1. I want to be sure that I got the whole file, so I set it up to retry the borg create command on any warning. Every backup takes roughly 1 minute, meaning I'm spending 3 out of 5 minutes doing backups.

No biggie. But it is some extra load on the system. Having a retry-if-changed on the file level would be great to solve this.

This is CephFS. I don't think there's a "reflink" concept.

ThomasWaldmann · 2023-02-11T18:59:39Z

https://gitlab.com/rubdos/pyreflink separate pyreflink library, but seems not much maintained recently.

python/cpython#81338 reflink support ticket in Python's issue tracker.

os.copy_file_range (Linux only) in Python 3.8+. See comment there: python/cpython#81338 (comment)

BTW, we have this consistency/snapshot problem not only within the file contents, but also in the file metadata (like xattrs, ACLs, etc.).

ThomasWaldmann · 2023-02-13T23:07:45Z

Guess pyreflink is a bit problematic:

only some fs support it
no support for it in CPython yet
not sure if pyreflink is still alive

Can we close this in favour of #7346?

gellnerm · 2023-02-21T09:53:04Z

Until we have something else, can we have a simple flag "--no-file-changed-warning" that exits with code 0 instead of a warning? Because now scripts like

borg create || handle_error

will not work anymore.

ThomasWaldmann · 2023-02-21T23:05:15Z

@gellnerm if you want to handle errors, you need to check for rc 2. rc 1 is warnings, rc 0 is "all ok".

gellnerm · 2023-02-22T08:38:40Z

Thanks, now I'm using

borg create ...
test $? -eq 2 && handle_err
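For wrapper scripts that are not written in shell, the same distinction can be sketched in Python. Only the meanings of 0, 1 and 2 come from this thread; `classify_borg_rc` and `run_and_classify` are hypothetical helper names, and treating every other return code as an error is an assumption.

```python
import subprocess


def classify_borg_rc(rc):
    """Map a borg return code to 'ok', 'warning' or 'error'."""
    if rc == 0:
        return "ok"
    if rc == 1:
        return "warning"  # e.g. a file changed while it was being backed up
    return "error"  # rc 2; anything else is treated as an error (assumption)


def run_and_classify(cmd):
    """Run `cmd` (a list such as ['borg', 'create', ...]) and classify its result."""
    return classify_borg_rc(subprocess.run(cmd).returncode)
```

A caller can then log on "warning" and only invoke its error handler on "error".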

ThomasWaldmann · 2023-02-23T00:49:39Z

Superseded by #7346.

sophie-h mentioned this issue Oct 9, 2022

file changed warning: Add message id and path #7080

Closed

theAkito mentioned this issue Dec 4, 2022

file changed while we backed it up Message a bit broken #7183

Closed

ThomasWaldmann added this to the 2.0.0b5 milestone Feb 10, 2023

ThomasWaldmann mentioned this issue Feb 11, 2023

File changed while we backed it up #6989

Closed

ThomasWaldmann changed the title ~~create: better handling "file changed" errors, mitigation by retries and/or reflinks~~ create: better handling "file changed" warnings, mitigation by retries and/or reflinks Feb 11, 2023

borgbackup deleted a comment from fff7d1bc Feb 11, 2023

ThomasWaldmann self-assigned this Feb 19, 2023

ThomasWaldmann closed this as completed Feb 23, 2023

tux2linux added a commit to Blunix-GmbH/role-borgbackup-client that referenced this issue Feb 16, 2024

ignore exit code 1 (changed files) borgbackup/borg#6622

93f0ec2
