Skip over unallocated spaces during send #228

Open
tasket opened this issue Jan 2, 2025 · 3 comments
Labels
enhancement New feature or request optimization

Comments

@tasket
Owner

tasket commented Jan 2, 2025

Wyng send currently examines every unallocated portion of a volume under certain conditions, such as during the volume's initial send. It also examines/compares every portion that has been de-allocated since the previous send, so incremental backups are affected as well. As a result, send is slower than it could be.

Cases where this has an impact:

  • Adding large volumes to an archive
  • Deleting large amounts of data from a volume
  • Increasing a volume's size

Optimization could be achieved by creating a twin of the delta map, a zero map, during one of the early stages of the send process, such as get_delta_digest(). The zero mapping code would have to conform to each storage type, and the reflink version may be able to consume a 'tee' of the fiemap data. (An alternative would be to use SEEK_HOLE and SEEK_DATA, although they're unlikely to work with tlvm.)
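A minimal, Linux-oriented sketch of the SEEK_HOLE/SEEK_DATA alternative for file-backed (reflink) volumes. The helpers data_extents(), zero_map() and file_zero_map() are hypothetical, not Wyng's code, and the bitmap layout (one bit per chunk, LSB-first, bit set = chunk unallocated) is an assumption. A fiemap-based version could feed its extent list into zero_map() the same way.

```python
import errno
import os


def data_extents(fd, size):
    """Yield (start, end) byte ranges holding allocated data via SEEK_DATA/SEEK_HOLE.

    If the filesystem can't report holes, the whole file comes back as data,
    which is the safe fallback: those chunks are simply scanned as they are now.
    """
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:    # no data at or beyond pos
                return
            if e.errno == errno.EINVAL:   # SEEK_DATA unsupported: treat the rest as data
                yield pos, size
                return
            raise
        end = min(os.lseek(fd, start, os.SEEK_HOLE), size)
        if end <= start:
            return
        yield start, end
        pos = end


def zero_map(extents, vol_size, chunk_size):
    """Hypothetical zero map: one bit per chunk (LSB-first), set when no extent touches it."""
    nchunks = (vol_size + chunk_size - 1) // chunk_size
    zmap = bytearray(b"\xff" * ((nchunks + 7) // 8))
    if nchunks % 8:
        zmap[-1] = (1 << (nchunks % 8)) - 1   # clear padding bits past the last chunk
    for start, end in extents:
        if end <= start:
            continue
        last = min((end - 1) // chunk_size, nchunks - 1)
        for c in range(start // chunk_size, last + 1):
            zmap[c // 8] &= ~(1 << (c % 8)) & 0xFF   # chunk holds data: not zero
    return zmap


def file_zero_map(path, chunk_size):
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        return zero_map(data_extents(f.fileno(), size), size, chunk_size)
```

Any chunk that shares even one byte with an allocated extent is treated as data, so, assuming the filesystem reports holes accurately, the map can only err toward scanning more chunks, never toward skipping real data.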

The tlvm version might collect any "left-only" references in the case of an incremental send, or else do an extra metadata extraction step using a tlvm command other than thin_delta.
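For the tlvm case, a hedged sketch of collecting "left-only" ranges from thin_delta's XML output. It assumes the output contains left_only elements whose `begin` and `length` attributes count pool data blocks (check this against the installed thin-provisioning-tools); left_only_zero_chunks() is a hypothetical helper. Only chunks lying entirely inside the merged left-only ranges are reported, so partially de-allocated chunks still get a normal comparison.

```python
import xml.etree.ElementTree as ET


def left_only_zero_chunks(delta_xml, block_size, chunk_size):
    """Return sorted chunk indices that lie entirely inside 'left_only' ranges.

    Assumption: <left_only begin=".." length=".."/> marks pool blocks mapped in
    the previous snapshot but not in the current one, i.e. space de-allocated
    since the last send. block_size is the pool data block size in bytes.
    """
    root = ET.fromstring(delta_xml)
    ranges = []
    for el in root.iter("left_only"):
        begin = int(el.get("begin")) * block_size
        ranges.append((begin, begin + int(el.get("length")) * block_size))
    ranges.sort()

    # Merge adjacent/overlapping byte ranges so chunks spanning two of them count.
    merged = []
    for s, e in ranges:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])

    zero = []
    for s, e in merged:
        first = -(-s // chunk_size)            # first chunk starting at or after s
        zero.extend(range(first, e // chunk_size))  # chunks ending at or before e
    return zero
```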

Assuming the result of zero mapping is a per-chunk bitmap like the delta map, the send_volume() function could attempt to skip through 8-bit or larger segments similar to how it handles the delta bmap_list.
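A sketch of the skipping idea, assuming a zero map stored as a bytearray with one bit per chunk (LSB-first, bit set = unallocated). send_chunks(), the read_chunk callback and the ZERO marker are hypothetical, and SHA-256 stands in for whatever hash the real manifest uses. Runs of 0xFF bytes are consumed in one step with no reads; every other chunk is read and compared as it is now.

```python
import hashlib

ZERO = "zero"   # hypothetical manifest marker for an all-zero chunk


def send_chunks(zmap, nchunks, read_chunk, chunk_size):
    """Build (chunk, entry) pairs, skipping runs of 0xFF zero-map bytes without I/O.

    read_chunk(i) is a hypothetical callback returning chunk i's bytes.
    Returns (manifest, number_of_chunks_read).
    """
    zeros = bytes(chunk_size)
    manifest, reads, i = [], 0, 0
    while i < nchunks:
        if i % 8 == 0 and zmap[i // 8] == 0xFF:
            j = i // 8
            while j < len(zmap) and zmap[j] == 0xFF:   # extend over whole 0xFF bytes
                j += 1
            end = min(j * 8, nchunks)
            manifest.extend((c, ZERO) for c in range(i, end))
            i = end
            continue
        if (zmap[i // 8] >> (i % 8)) & 1:   # single unallocated chunk
            manifest.append((i, ZERO))
        else:                                # allocated: read and compare as today
            buf = read_chunk(i)
            reads += 1
            manifest.append((i, ZERO if buf == zeros else hashlib.sha256(buf).hexdigest()))
        i += 1
    return manifest, reads
```

Allocated chunks that happen to contain only zeros are still detected by the buffer comparison, so the zero map only removes work and does not change the manifest's contents.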

One desired result would be the ability to add a mostly empty, terabyte-sized volume to an archive in seconds or a few minutes. Another would be that an incremental send of a volume from which a vast amount of data was deleted takes only a fraction of the time it does in the current worst case.


To illustrate how large a difference delta mapping makes compared with the current lack of unallocated-space mapping:

Adding a new 1TB volume containing only 1.5MB of data to an archive took over 14 minutes.

Adding 48MB to that volume and doing an incremental (mapped) send took 9 seconds. So a backup of 32X the data finished in 1/93 the time. (The incremental send didn't have to compare large amounts of zeros because data had not been deleted from the volume, only added.)
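The ratios quoted above work out as follows, using 14 minutes as a lower bound for the initial send:

```latex
\frac{48\,\mathrm{MB}}{1.5\,\mathrm{MB}} = 32,
\qquad
\frac{14 \times 60\,\mathrm{s}}{9\,\mathrm{s}} = \frac{840}{9} \approx 93
```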

@tasket tasket added enhancement New feature or request optimization labels Jan 2, 2025
@tasket
Owner Author

tasket commented Jan 3, 2025

For incremental send:

It should be possible to make a segmented bmap_list (as with the delta map) for 'zero' areas, and then break up or adjust those segments to align with the delta bmap segments. Portions of the zero bmap segments that don't align with the delta segments can be discarded/moved, so those areas are scanned normally. When an aligned zero map segment is encountered, zero-chunk entries can be emitted directly into the new session's manifest without making buffer comparisons.
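The segment alignment could be sketched as a two-pointer sweep over two sorted, non-overlapping lists of half-open (start, end) chunk ranges. This is illustrative only: align_zero_segments() and the range representation are assumptions, not Wyng's actual bmap_list format. Each returned piece is tagged with the index of the delta segment containing it, and any zero-map portion outside every delta segment is dropped.

```python
def align_zero_segments(zero_segs, delta_segs):
    """Clip zero segments to delta segments.

    Both inputs are sorted lists of half-open (start, end) chunk ranges.
    Returns (delta_index, start, end) pieces; chunks in these pieces could be
    written to the manifest as zero entries without buffer comparisons.
    """
    out = []
    zi = di = 0
    while zi < len(zero_segs) and di < len(delta_segs):
        zs, ze = zero_segs[zi]
        ds, de = delta_segs[di]
        s, e = max(zs, ds), min(ze, de)
        if s < e:                 # overlap: keep only the aligned portion
            out.append((di, s, e))
        if ze <= de:              # advance whichever segment ends first
            zi += 1
        else:
            di += 1
    return out
```

Usage: `align_zero_segments([(0, 10), (20, 40), (50, 55)], [(5, 25), (30, 35), (60, 70)])` keeps chunks 5-9 and 20-24 (delta segment 0) and 30-34 (delta segment 1); the zero range 50-54 falls outside every delta segment and is dropped.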

If a zero bmap segment starts after the start of a delta segment, it may be necessary to prepend null bits to the zero segment so the two match, and possibly to append null bits at the end as well. Null bits mark where normal buffer comparisons should occur, so doing this poses no risk to accuracy.
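The null-bit padding can be illustrated with Python integers used as bit arrays, where bit k stands for chunk start + k. align_zero_bits() is a hypothetical helper, not Wyng's representation: shifting left prepends null bits, shifting right drops zero-map bits that begin before the delta segment, and the final mask trims anything past the delta segment's end (positions beyond the zero run are implicitly null).

```python
def align_zero_bits(delta_start, delta_len, zero_start, zero_bits):
    """Re-base a zero-map bit run onto a delta segment.

    Bit k of zero_bits refers to chunk zero_start + k; bit k of the result
    refers to chunk delta_start + k. A 0 (null) bit only means "compare this
    chunk normally", so the padding never affects correctness.
    """
    offset = zero_start - delta_start
    bits = zero_bits << offset if offset >= 0 else zero_bits >> -offset
    return bits & ((1 << delta_len) - 1)   # drop anything past the delta segment
```

Usage: with a delta segment covering chunks 100-115 and a zero run covering chunks 104-107, `align_zero_bits(100, 16, 104, 0b1111)` returns `0xF0`: four prepended null bits followed by the four zero bits.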

@tlaurion
Contributor

tlaurion commented Jan 4, 2025

Parallelization would help here too, no?

@tasket
Owner Author

tasket commented Jan 4, 2025

Probably. Simple parallelization could easily saturate memory bandwidth with buffer comparisons, though. I'm sure it would parallelize better with the proposed changes.
