Consider holes in sparse files when reading #1354

Closed
enkore opened this issue Jul 21, 2016 · 3 comments

@enkore
Contributor

enkore commented Jul 21, 2016

Currently, borg has only simple sparse file support: it does nothing special on "create", but at "extract" time it offers two ways to deal with all-zero chunks: a) write zeros to disk (default), or b) just "seek" in the output file, creating a hole in a sparse file (--sparse).

(#14)

The scope of this ticket is to avoid reading holes when archives are created. I.e. use SEEK_HOLE/DATA to detect holes and skip over them entirely, adding the correct amount of zero-chunks and feeding just enough data into the chunker at the end of the hole that it retains the same state as if we'd read all the zeroes.

This approach avoids feeding all the zeroes of a hole through the chunker and HMAC'ing them (for the chunk id).
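To illustrate the hole detection part only, here is a minimal sketch using `os.lseek` with `os.SEEK_DATA` / `os.SEEK_HOLE`. It is not borg code; the function name `iter_sparse_ranges` and the tuple layout are made up for this example, and it does nothing about keeping the buzhash chunker state consistent. On filesystems without hole support, the whole file is simply reported as one data range.

```python
import errno
import os


def iter_sparse_ranges(fd, size):
    """Yield (is_data, start, length) tuples covering [0, size).

    Hypothetical helper, not part of borg. Requires a platform that
    provides os.SEEK_DATA / os.SEEK_HOLE (e.g. Linux).
    """
    offset = 0
    while offset < size:
        try:
            # Find the next offset >= offset that contains data.
            data_start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno != errno.ENXIO:
                raise
            # ENXIO: no data after offset, the rest of the file is a hole.
            data_start = size
        if data_start > offset:
            yield (False, offset, data_start - offset)
        if data_start >= size:
            break
        # Find the end of this data range (there is an implicit hole at EOF).
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        yield (True, data_start, hole_start - data_start)
        offset = hole_start
```

A reader would then only `read()` the ranges flagged as data and account for the hole ranges without touching the disk.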

Note:

  • buzhash chunker: the length of the all-zero chunks the chunker creates from a long stream of zeroes is constant (beyond the first 2 chunks), but depends on the chunker seed.
  • fixed blocksize chunker: always generates the same chunk length (except at the end of a data or hole range).
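The fixed blocksize case from the note above can be sketched as follows. This is only an illustration under assumptions, not the actual borg chunker: `fixed_chunks` and the `(is_data, start, length)` range tuples are hypothetical names, and the ranges are assumed to come from some hole detection step.

```python
def fixed_chunks(ranges, block_size):
    """Split data/hole ranges into fixed-size chunk descriptors.

    ranges: iterable of (is_data, start, length) tuples (hypothetical layout).
    Yields (is_data, offset, length); every chunk is block_size long,
    except possibly the last chunk of each data or hole range.
    """
    for is_data, start, length in ranges:
        end = start + length
        offset = start
        while offset < end:
            n = min(block_size, end - offset)
            yield (is_data, offset, n)
            offset += n
```

For chunks with `is_data` set to `False`, a caller would emit an all-zero chunk of that length without reading anything from the file.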
@ThomasWaldmann
Member

ThomasWaldmann commented Dec 2, 2016

The reason we do not have this yet (nor #14) is that the buzhash reader/chunker is already rather complex low-level code. Adding this would make it even more complex. If we can solve this, we can also go straight to #14, the rest is easy.

The fixed blocksize chunker was added to master a while ago (it is much simpler and written in Cython) and just got sparse file support in #5561.

@ThomasWaldmann
Member

#5620 slightly improves the processing of all-zero chunks in borg create with the buzhash chunker: it does not support seek_hole/seek_data, but it detects all-zero chunks after reading them and then optimizes hashing a lot (with the LRUcache), which already gives quite a speedup (although not as spectacular as with seek_hole/seek_data).
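Roughly, the idea of detecting all-zero chunks after reading and caching their hash could look like the sketch below. This is not the code from #5620: `functools.lru_cache` stands in for borg's LRUcache, HMAC-SHA256 stands in for the id hash, and `KEY`, `chunk_id`, `zero_chunk_id` and `id_for` are hypothetical names.

```python
import hashlib
import hmac
from functools import lru_cache

KEY = b"hypothetical-id-key"  # placeholder, not a real borg key


def chunk_id(data):
    """Compute the (stand-in) chunk id by hashing the full chunk content."""
    return hmac.new(KEY, data, hashlib.sha256).digest()


@lru_cache(maxsize=16)
def zero_chunk_id(length):
    """Id of an all-zero chunk; computed once per length, then cached."""
    return chunk_id(bytes(length))


def id_for(data):
    """Use the cached id for all-zero chunks, hash everything else normally."""
    if data == bytes(len(data)):  # cheap all-zero check
        return zero_chunk_id(len(data))
    return chunk_id(data)
```

The zeroes still have to be read from disk here, so this only saves the hashing work, not the I/O.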

@ThomasWaldmann
Member

The fixed chunker actually supports holes in sparse files and does not read them.

So, while the buzhash chunker still is missing this feature, I will close this as the fixed chunker is preferred anyway for huge stuff like raw disk images.
