Currently, deduplication for `send` and `arch-deduplicate` relies on the same best-effort process, which may cap the number of archive chunks available to dedup against. The cap keeps memory usage at a reasonable level (and may be lowered further in the future).
However, a dedup pass that does not have to run on-the-fly during `send` could conceivably find every possible match for any existing data chunk while remaining reasonably efficient. One way to achieve this is to sort all manifests (including the vol/session columns) by hash and then merge them together; adjacent entries with equal hashes mark where duplicates exist.
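A minimal sketch of that sort-merge idea, assuming each manifest can be read as `(chunk_hash, vol, session)` rows; the names and row layout here are hypothetical, not the project's actual on-disk format:

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Entry = Tuple[bytes, str, str]  # hypothetical (chunk_hash, vol, session) row


def sorted_entries(manifest: Iterable[Entry]) -> List[Entry]:
    # Sort one manifest's entries by hash. For very large manifests this
    # would be an external (on-disk) sort instead of an in-memory one.
    return sorted(manifest, key=itemgetter(0))


def find_duplicates(manifests: List[Iterable[Entry]]) -> Iterator[List[Entry]]:
    # Merge the per-manifest sorted streams, then group by hash.
    # Memory stays proportional to the number of manifests plus the
    # largest single duplicate group, not the total chunk count.
    merged = heapq.merge(
        *(sorted_entries(m) for m in manifests), key=itemgetter(0)
    )
    for _hash, group in groupby(merged, key=itemgetter(0)):
        entries = list(group)
        if len(entries) > 1:  # adjacent equal hashes mark duplicates
            yield entries
```

Because the merge only ever holds one entry per input stream, this scales to far more chunks than an in-memory hash table capped for memory reasons.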
Another approach would be to use a simple on-disk key-lookup system (even plain files in a filesystem, if that turns out to be faster) to comprehensively map and look up chunks by hash value.
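For the files-in-a-filesystem variant, a sketch could look like the following; the two-level fan-out by hash prefix and the `location` string are assumptions for illustration, not a proposed design:

```python
import os
from typing import List, Optional


class FsChunkIndex:
    """Map chunk hashes to locations using one small file per hash."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, chunk_hash: str) -> str:
        # Fan out by the first two hex digits to keep directories small.
        return os.path.join(self.root, chunk_hash[:2], chunk_hash)

    def put(self, chunk_hash: str, location: str) -> None:
        path = self._path(chunk_hash)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Append, since the same hash may occur in many vols/sessions.
        with open(path, "a") as f:
            f.write(location + "\n")

    def get(self, chunk_hash: str) -> Optional[List[str]]:
        try:
            with open(self._path(chunk_hash)) as f:
                return f.read().splitlines()
        except FileNotFoundError:
            return None
```

This trades the one-time sort of the merge approach for random-access lookups, so it also supports incremental updates as new manifests appear.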
Related: #224