"borg extract" removes empty btrfs subvolumes prior to extraction #4233

Open
kakra opened this issue Dec 29, 2018 · 5 comments

kakra commented Dec 29, 2018

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Bug

System information. For client/server mode post info for both machines.

btrfs destination, xfs source, local usage

Your borg version (borg -V).

1.1.8

Operating system (distribution) and version.

Gentoo Linux 64-bit

Full borg command line that led to the problem (leave out excludes and passwords)

borg extract -v --list path/to/repo::archive -e something/i/dont/want

Describe the problem you're observing.

Prior to extracting the backup to an empty btrfs, I recreate my subvolume structure, including nocow and compression flags (chattr), from a script generated during backup. This step is needed because borg doesn't restore btrfs subvolumes as subvolumes but only as directories. Since some kernel version, it has been possible to remove empty subvolumes with a plain "rmdir". Borg seems to do exactly that: it removes empty directories before restoring content into them. I didn't expect that behavior and there's no command line option to work around it. The result is a btrfs with a lot of subvolumes killed (unless they contain nested subvolumes).

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

My script currently runs touch .keep-subvolume in each subvolume it creates. This seems to work around the issue: borg no longer kills the subvolumes because it only removes empty directories.
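The workaround can be sketched in Python as follows. This is a minimal illustration, not the reporter's actual script: the function name `mark_subvolumes` is hypothetical, and it assumes the subvolumes have already been created at the given paths.

```python
import os

MARKER = ".keep-subvolume"  # marker file name taken from the workaround above


def mark_subvolumes(subvolume_paths, marker=MARKER):
    """Create an empty marker file in each directory so that rmdir fails on it."""
    created = []
    for path in subvolume_paths:
        marker_path = os.path.join(path, marker)
        # Append mode creates the file if it is missing and never truncates it.
        with open(marker_path, "a"):
            pass
        created.append(marker_path)
    return created
```

Because rmdir only succeeds on empty directories, a single marker file is enough to keep the directory (and therefore the subvolume) in place.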

It would be good if borg didn't kill empty subvolumes. Even better would be if it restored subvolumes as subvolumes instead of plain directories when restoring to a btrfs volume. Maybe it could even store the nocow/compress attributes (lsattr) for subvolumes, though it could make sense to store these per file as well (but I don't use them per file).

@ThomasWaldmann
Member

can you reproduce this with normal mount points without using btrfs special features?

@kakra
Author

kakra commented Dec 30, 2018

If the subvolume is a mount point, this won't happen, because borg cannot remove a mounted directory. But there's nothing in btrfs that forces a subvolume to be a mount point. Subvolumes are ordinary directories, with the difference that internally they use their own set of inode numbers, so such a directory exposes a different fs-id to user space. It looks like a mount point but really it isn't. Borg doesn't cross that boundary because the fs-id differs, not because it's a "mount point".
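The fs-id boundary described above is visible from user space as a change in `st_dev`. The following sketch walks a tree and lists every directory whose device number differs from its parent's. The function name `find_fs_boundaries` is hypothetical, and the check cannot tell a btrfs subvolume from a real mount point: both are reported.

```python
import os


def find_fs_boundaries(root):
    """Return directories below root whose st_dev differs from their parent's."""
    boundaries = []
    for dirpath, dirnames, _filenames in os.walk(root):
        parent_dev = os.lstat(dirpath).st_dev
        for name in dirnames:
            child = os.path.join(dirpath, name)
            # A different device number marks a subvolume or a mount point.
            if os.lstat(child).st_dev != parent_dev:
                boundaries.append(child)
    return sorted(boundaries)
```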

The essence is: you cannot remove a directory that has something mounted on it, even if it is empty. But you can do so with btrfs subvolumes unless they are explicitly mounted. In earlier kernel versions, you couldn't rmdir a subvolume, no matter whether it resulted from a mount or from special behavior. But subvolumes have always been exposed as ordinary directories in the first place. Mounting them somewhere else is more like a bind mount. It's perfectly fine to have them in your normal directory structure, and that is how I'm using them. It's one way of keeping snapshots from including those subdirectories.

Reproducible by:

  1. Create a backup of a btrfs root subvolume
  2. Take note of the subvolumes that exist: btrfs sub list --sort=path /mnt/btrfs-subvol-0
  3. Format the btrfs
  4. Re-create the subvolumes with btrfs sub create ..., mirroring the subvolume list of the original volume
  5. Restore the backup
  6. Compare btrfs sub list --sort=path to the list of step 2
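
The comparison in step 6 amounts to a set difference. A minimal sketch, assuming the subvolume paths from both listings have already been collected into Python lists (the helper name `missing_subvolumes` is hypothetical):

```python
def missing_subvolumes(before, after):
    """Return subvolume paths present before the restore but absent afterwards."""
    return sorted(set(before) - set(after))
```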

You'll notice the following behavior:

All subvolumes created in step 4 will be gone if they were leaf subvolumes in a tree of subvolumes.

Now redo the above steps, but this time insert a step 4.1 which creates an empty file within each subvolume, i.e. touch .keep-subvolume

Comparing step 6 again will now show that the subvolumes created in step 4 are still there.

Conclusion: borg runs rmdir before extracting a directory entry from the archive. However, rmdir fails if the directory is a mount point or contains data. Btrfs subvolumes are not mount points, so they can be removed with rmdir if empty. Borg should skip removing the directory if the fs-id differs. Maybe it should not remove directories at all: doing so discards hidden information, which is unexpected, e.g. when some attributes have been set on purpose with chattr prior to extraction. Using chattr on subvolumes/directories in btrfs has some desired side effects, and with the current behavior one can no longer use that easily.
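
The suggested behavior could look roughly like the sketch below. It is not borg's code: `prepare_directory` is a hypothetical helper that only handles the case where the existing entry is a directory, and it treats a differing `st_dev` as the fs-id boundary.

```python
import errno
import os


def prepare_directory(path):
    """Ensure path exists as a directory without removing fs boundaries.

    Returns "created", "recreated" or "kept".
    """
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        os.mkdir(path)
        return "created"
    parent = os.path.dirname(os.path.abspath(path))
    if st.st_dev != os.lstat(parent).st_dev:
        # Subvolume or mount point: leave it alone.
        return "kept"
    try:
        os.rmdir(path)
    except OSError as exc:
        # Non-empty or busy directories stay in place; anything else is an error.
        if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST, errno.EBUSY):
            raise
        return "kept"
    os.mkdir(path)
    return "recreated"
```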

Applying chattr or recreating subvolumes after extraction is cumbersome: You cannot chattr on non-empty files to change nocow or compression. And you cannot convert existing directories into subvolumes without rewriting all the data contained.

@ThomasWaldmann
Member

ThomasWaldmann commented Jan 1, 2019

relevant code (1.1.8 release / tag):

borg.archiver.do_extract (lines 749, 760)
borg.archive.extract_item (lines **575**, 629)

so yes, it first kills the dir if it is already there, then re-creates it. not sure why it is done like that, didn't write that code. it also kills existing other non-directory fs items first, before extracting them.

setting of all dir attributes is delayed until after all content is restored (using the dirs stack), to be able to restore correct timestamps after all dir modifications have happened.
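
a rough sketch of that idea, not the actual borg code: directories are created first, their timestamps are pushed onto a stack, and the stack is only unwound once all file content has been written. the function name and the dict-based inputs are made up for illustration.

```python
import os


def restore_tree(dest, dirs, files):
    """dirs maps relative dir -> mtime in ns; files maps relative file -> bytes."""
    pending = []
    for rel in sorted(dirs):  # a parent path always sorts before its children
        path = os.path.join(dest, rel)
        os.makedirs(path, exist_ok=True)
        pending.append((path, dirs[rel]))
    for rel, data in files.items():
        # Writing a new file updates the mtime of the containing directory.
        with open(os.path.join(dest, rel), "wb") as fh:
            fh.write(data)
    while pending:
        # Unwind the stack: children are finalized before their parents.
        path, mtime_ns = pending.pop()
        os.utime(path, ns=(mtime_ns, mtime_ns))
```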

@ThomasWaldmann
Member

ThomasWaldmann commented Jan 1, 2019

maybe it is to avoid the unfortunate case of not having write permissions to the existing directory and also to have a clean start when adding ACLs and xattrs later.

for non-dir items, it is also to get the file type right and for regular files to start the content from 0.

@kakra
Author

kakra commented Jan 2, 2019

Gaining write access to the directory is one obvious requirement. But the implemented solution is quite limited: it doesn't rmdir if the directory already has contents, so it runs into the exact same problem in that case. If it needs access to the directory and doesn't have it, it should maybe just fail instead of doing something unexpected. The user can then fix this. If the user doesn't have write access to the directory, there may be reasons for that, and borg should not try to sneak around them. But setting (possibly restricted) access rights at the end of working in the directory is correct behavior. It should create new directories with minimal rights (maybe u=wx) while extracting and only then apply the final access rights. But it should not try to sneak around explicitly set permissions.
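
A minimal sketch of that approach for newly created directories, assuming the final mode is known from the archive. The helper name and the `populate` callback are hypothetical; the working mode 0o300 corresponds to the u=wx suggestion above.

```python
import os

WORKING_MODE = 0o300  # u=wx: enough for the owner to create entries inside


def extract_dir_with_deferred_mode(path, final_mode, populate):
    """Create a directory with minimal rights, fill it, then set the final mode."""
    os.mkdir(path, WORKING_MODE)  # the umask can only clear further bits
    populate(path)                # write the directory's contents
    os.chmod(path, final_mode)    # apply the archived mode last
```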

For files this is probably okay... But I'm not sure whether it does, or whether it should, delete the destination first and only then write. It should write to a temp file, then rename it into place. Otherwise I'm left with no file or an incomplete one if I stop execution. I didn't look at the code, though. However, starting from a fresh file is important if one wants to use btrfs' capability of inherited attributes: if I mark a directory/subvol as nocow, all (and only) new files and directories will inherit that attribute, which is actually why I don't want it to kill empty subvolumes. I created them before extraction on purpose, the purpose being inheriting attributes.
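
The temp-file-then-rename pattern can be sketched as follows. This is a generic illustration rather than anything borg does today; the helper name `write_atomically` is hypothetical. The temporary file is created in the destination directory so the rename stays on one filesystem.

```python
import os
import tempfile


def write_atomically(dest_path, data):
    """Write data to a temp file in the target directory, then rename into place."""
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        # Atomic on POSIX: readers see either the old file or the new one.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the partial temp file; the old destination stays untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Since the temporary file is a new file in the same directory, it starts out empty, which fits the requirement of starting from a fresh file.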

Setting nocow/compression in btrfs only works on empty files that never had content. Thus, if borg ever implements restoring attributes, it needs to set those before writing the contents. You may be able to set those attributes later but they're not going to have any effect.

The current implementation prevents proper usage of btrfs attributes at the highest cost possible due to how borg behaves.

My restore contained files for which it is critically important to have them restored with the correct attributes because otherwise it exposes buggy behavior and very bad performance when working with those files.

Maybe borg should check whether it has the CAP_DAC_OVERRIDE capability: if it does, it has full filesystem access rights anyway and could skip that "sneaky" behavior. It could also skip working around corner cases if it has the CAP_CHOWN capability.
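
On Linux, such a check could read the effective capability mask from /proc/self/status. The sketch below is only an illustration with hypothetical helper names; the bit numbers are the ones defined by the Linux capability headers (CAP_CHOWN is 0, CAP_DAC_OVERRIDE is 1).

```python
CAP_CHOWN = 0
CAP_DAC_OVERRIDE = 1


def has_capability(cap_eff_hex, cap_bit):
    """Test one bit in a hexadecimal capability mask such as the CapEff value."""
    return bool((int(cap_eff_hex, 16) >> cap_bit) & 1)


def effective_capabilities(status_path="/proc/self/status"):
    """Return the CapEff hex string of the current process, or None if absent."""
    with open(status_path) as fh:
        for line in fh:
            if line.startswith("CapEff:"):
                return line.split()[1]
    return None
```

For example, `has_capability(effective_capabilities(), CAP_DAC_OVERRIDE)` would tell whether the running process can bypass file permission checks.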
