
sparse map / file map support for fixed size chunker #5561

Merged
merged 3 commits into from
Dec 28, 2020


@ThomasWaldmann (Member) commented Dec 11, 2020

Fixes #5565.

  • create a map of a file as a list of (offset, size, is_data) ranges; is_data == False means a hole (zeros only)
  • use the map in the fixed chunker to read the data ranges and seek over the sparse (hole) ranges
  • use the same code with a "there is no hole" map if sparse processing is unwanted or not possible
  • either create the map internally inside chunkify or give a ready-made map to chunkify
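A minimal sketch of the chunking loop the bullets describe, assuming the map is a list of (offset, size, is_data) tuples. `chunkify_fixed`, `no_hole_map` and `FileMap` are hypothetical names for illustration, not borg's actual API. Data ranges are read, hole ranges are seeked over and replaced by generated zeros, and chunk boundaries restart at every range start (consistent with the IDEAS note that the last chunk of a range likely has a different size).

```python
import io
from typing import BinaryIO, Iterator, List, Optional, Tuple

# Hypothetical type: (offset, size, is_data); is_data == False marks a hole.
FileMap = List[Tuple[int, int, bool]]


def no_hole_map(size: int) -> FileMap:
    """A "there is no hole" map: the whole file is one data range."""
    return [(0, size, True)] if size > 0 else []


def chunkify_fixed(f: BinaryIO, block_size: int,
                   fmap: Optional[FileMap] = None) -> Iterator[bytes]:
    """Cut each map range into block_size chunks (the last one may be shorter)."""
    if fmap is None:
        # No map given: fall back to the "no hole" map for the whole file.
        fmap = no_hole_map(f.seek(0, io.SEEK_END))
    for offset, size, is_data in fmap:
        if is_data:
            f.seek(offset)  # position at the start of the data range
        else:
            f.seek(offset + size)  # skip the hole without reading it
        remaining = size
        while remaining > 0:
            n = min(block_size, remaining)
            if is_data:
                data = f.read(n)
                if not data:  # unexpected EOF, e.g. the file shrank
                    break
            else:
                data = bytes(n)  # zeros are generated, not read
            yield data
            remaining -= len(data)
```

For example, a 20-byte file with a 6-byte hole at offset 10 and a block size of 4 yields chunks of sizes 4, 4, 2 (data), 4, 2 (zeros) and 4 (data); calling it without a map treats the whole file as data.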

IDEAS:

  • generalize so that such "maps" can also be used for slightly different purposes, e.g. reading only the changed parts of a VM disk image (the other parts are then not necessarily sparse, just unchanged compared to an already known state)
  • for now, this only avoids reading the sparse zeros via the fs; the zeros are still generated internally
  • the fixed blocksize chunker usually generates one and the same all-zero chunk of the desired blocksize (except that the last chunk of a range likely has a different size)
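One way the last idea could be exploited, sketched under assumptions: cache the all-zero block per size so holes reuse the same object, and hash each distinct zero block only once. `zero_chunk`, `zero_chunk_id` and `hole_chunks` are hypothetical helpers, and `hashlib.sha256` merely stands in for whatever chunk id hash the repository actually uses.

```python
import hashlib
from functools import lru_cache
from typing import Iterator


@lru_cache(maxsize=None)
def zero_chunk(size: int) -> bytes:
    """Return a cached all-zero block; one object per distinct size."""
    return bytes(size)


@lru_cache(maxsize=None)
def zero_chunk_id(size: int) -> str:
    """Hash each distinct all-zero block once (sha256 is a stand-in id hash)."""
    return hashlib.sha256(zero_chunk(size)).hexdigest()


def hole_chunks(hole_size: int, block_size: int) -> Iterator[bytes]:
    """Split a hole into fixed-size zero chunks; only the last may be shorter."""
    while hole_size > 0:
        n = min(block_size, hole_size)
        yield zero_chunk(n)
        hole_size -= n
```

Since most hole chunks have exactly the chunker blocksize, a long hole produces at most two distinct chunk ids: one for the full-size zero block and one for the shorter tail.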

@codecov-io commented Dec 11, 2020

Codecov Report

Merging #5561 (37a7436) into master (050a705) will decrease coverage by 0.24%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #5561      +/-   ##
==========================================
- Coverage   83.23%   82.98%   -0.25%     
==========================================
  Files          38       38              
  Lines       10069    10070       +1     
  Branches     1671     1671              
==========================================
- Hits         8381     8357      -24     
- Misses       1191     1214      +23     
- Partials      497      499       +2     
Impacted Files Coverage Δ
src/borg/archive.py 81.10% <100.00%> (-1.08%) ⬇️
src/borg/archiver.py 79.90% <100.00%> (-0.44%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d46395...37a7436.

@ThomasWaldmann (Member, Author)

@enkore @infectormp @milkey-mouse would you like to review this?

@ThomasWaldmann (Member, Author)

Going to merge this soon, so if somebody wants to review, please do it soon.

@ThomasWaldmann (Member, Author)

The macOS CI is failing; it looks like the filesystem used for tests there does not support sparse files.

@ThomasWaldmann (Member, Author) commented Dec 18, 2020

openindiana64 tmpfs has no sparse support.

freebsd64 tmpfs supports sparse files, apparently with a 32 KiB block size (?). It also does not like to end a file with a sparse section and inserts a data range at the end instead.

darwin64: tmpfs and also /Users (on HFS) return errno 25 (ENOTTY) for SEEK_HOLE / SEEK_DATA.

Python 3.6.6 on OpenBSD:

    openbsd64: >   whence = os.SEEK_HOLE                                                                                 
    openbsd64: E   AttributeError: module 'os' has no attribute 'SEEK_HOLE'     

It seems OpenBSD does not support this API?
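Given these platform differences, a hedged sketch of a capability probe: the constant check is done once at import time, and a per-fd probe treats the errors reported above as "no sparse support", so the caller can fall back to a "there is no hole" map. `HAS_SEEK_HOLE` and `supports_sparse_seek` are hypothetical names, and the exact errno set is an assumption based on the observations in this thread plus the Linux `lseek(2)` documentation.

```python
import errno
import os

# Checked once at import time: Python only defines these constants where the
# platform supports them (the OpenBSD build above raised AttributeError).
HAS_SEEK_HOLE = hasattr(os, "SEEK_HOLE") and hasattr(os, "SEEK_DATA")

# errno values treated as "hole seeking is not usable here":
# ENOTTY is errno 25 on Linux and macOS (as reported for darwin above),
# EINVAL means the whence value is not supported,
# ENXIO means offset 0 is already at EOF (empty file), so nothing to skip.
_NO_SPARSE_ERRNOS = {errno.ENOTTY, errno.EINVAL, errno.ENXIO}


def supports_sparse_seek(fd: int) -> bool:
    """Probe whether SEEK_HOLE works on fd; restores the file position."""
    if not HAS_SEEK_HOLE:
        return False
    pos = os.lseek(fd, 0, os.SEEK_CUR)
    try:
        os.lseek(fd, 0, os.SEEK_HOLE)
        return True
    except OSError as e:
        if e.errno in _NO_SPARSE_ERRNOS:
            return False
        raise
    finally:
        os.lseek(fd, pos, os.SEEK_SET)
```

A True result only means the call works; a filesystem without real sparse support (like the openindiana64 tmpfs above) may still accept it and simply report the whole file as data.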

@ThomasWaldmann force-pushed the sparse-file-support branch 3 times, most recently from 7c28c29 to be27b06 on December 25, 2020 22:29
…ckup#14

a file map can be:

- created internally inside chunkify by calling sparsemap, which uses
  SEEK_DATA / SEEK_HOLE to determine data and hole ranges inside a
  seekable sparse file.
  Usage: borg create --sparse --chunker-params=fixed,BLOCKSIZE ...
  BLOCKSIZE is the chunker blocksize here, not the filesystem blocksize!

- made by some other means and given to the chunkify function.
  this is not used yet, but in the future it could be used to read only
  the changed parts and seek over the (known) unchanged parts of a file.
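A sketch of how a sparsemap-like function can be built from `os.lseek` with `os.SEEK_DATA` / `os.SEEK_HOLE` (available only on platforms that support them). This is an illustrative reimplementation, not borg's code; it assumes the file is not modified while it is being mapped and treats ENXIO from SEEK_DATA as "only a trailing hole remains", per the Linux `lseek(2)` semantics. The result is a map of the kind that could also be made by other means and handed to chunkify.

```python
import errno
import os
from typing import List, Tuple


def sparsemap(fd: int) -> List[Tuple[int, int, bool]]:
    """Sketch: map a file into (offset, size, is_data) ranges."""
    end = os.lseek(fd, 0, os.SEEK_END)
    ranges: List[Tuple[int, int, bool]] = []
    curr = 0
    while curr < end:
        try:
            data_start = os.lseek(fd, curr, os.SEEK_DATA)
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            data_start = end  # no more data: the rest is a trailing hole
        if data_start > curr:
            ranges.append((curr, data_start - curr, False))  # hole
        if data_start >= end:
            break
        # SEEK_HOLE finds the next hole; EOF counts as an implicit hole.
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        ranges.append((data_start, hole_start - data_start, True))  # data
        curr = hole_start
    os.lseek(fd, 0, os.SEEK_SET)  # leave the fd at the start for reading
    return ranges
```

On a filesystem without real sparse support the same code simply returns one data range covering the whole file, which matches the "there is no hole" fallback.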

sparsemap: the generated range sizes are multiples of the fs block size.
           the tests assume 4 KiB fs block size.
also: do the os.SEEK_(HOLE|DATA) check only once
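The block-size property stated in this commit message can be expressed as a small check, e.g. for use in tests. `ranges_block_aligned` and `fs_block_size` are hypothetical helpers; `st_blksize` is the preferred I/O size reported by `fstat` and is only assumed here to approximate the granularity the filesystem uses for holes, which need not hold everywhere.

```python
import os
from typing import List, Tuple

FileMap = List[Tuple[int, int, bool]]  # (offset, size, is_data)


def fs_block_size(fd: int) -> int:
    """Approximate the fs block size via st_blksize (an assumption)."""
    return os.fstat(fd).st_blksize


def ranges_block_aligned(fmap: FileMap, block: int, file_size: int) -> bool:
    """True if each range starts on a block boundary and ends on one or at EOF."""
    for offset, size, _is_data in fmap:
        end = offset + size
        if offset % block or (end % block and end != file_size):
            return False
    return True
```

Only the final range may end off a block boundary, because a file's size need not be a multiple of the block size.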
@ThomasWaldmann ThomasWaldmann merged commit 2851a84 into borgbackup:master Dec 28, 2020
@ThomasWaldmann ThomasWaldmann deleted the sparse-file-support branch December 28, 2020 19:21
Development

Successfully merging this pull request may close these issues.

improve the fixed chunker for better sparse file support
2 participants