Make arch-deduplicate comprehensive #227

Open
tasket opened this issue Dec 23, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@tasket
Owner

tasket commented Dec 23, 2024

Currently, deduplication for both send and arch-deduplicate relies on the same best-effort process, which may cap the number of archive chunks that are available as dedup candidates. The capping is done to keep memory usage at reasonable levels (and may be reduced further in the future).

However, deduplication that does not have to work on the fly during send could conceivably find every possible match for any existing data chunk while remaining reasonably efficient. One way to achieve this is to sort all manifests by hash (carrying a vol/session column with each entry) and then merge them together; adjacent entries with equal hashes would show where duplicates exist.
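
A minimal sketch of the sort-and-merge idea, for illustration only. The manifest layout here (a dict mapping a hypothetical `(vol, session)` pair to a list of `(hash, address)` entries) and the function name `find_duplicates` are assumptions, not the actual manifest format or code. It also sorts in memory for brevity; a real implementation would presumably sort the manifest files on disk to keep memory use low.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def find_duplicates(manifests):
    """Return {hash: [(vol, session, address), ...]} for hashes seen more than once.

    `manifests` is a hypothetical structure: {(vol, session): [(hash, address), ...]}.
    """
    # Sort each manifest by hash, tagging every entry with its vol/session.
    sorted_runs = [
        sorted((h, vol, ses, addr) for h, addr in entries)
        for (vol, ses), entries in manifests.items()
    ]
    # heapq.merge lazily merges the already-sorted runs into one sorted stream,
    # so entries with equal hashes end up adjacent.
    merged = heapq.merge(*sorted_runs)
    dups = {}
    for h, group in groupby(merged, key=itemgetter(0)):
        locations = [(vol, ses, addr) for _, vol, ses, addr in group]
        if len(locations) > 1:
            dups[h] = locations
    return dups
```

Because the merge step only ever compares the heads of the sorted runs, memory use during merging grows with the number of manifests rather than the number of chunks.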

Another approach would be to use a simple on-disk key-lookup system (even plain files in a filesystem, if that is faster) to comprehensively map and find chunks by hash value.
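
As a rough sketch of the on-disk lookup idea, the example below uses SQLite from the Python standard library as the key-lookup store. This is just one possible backend chosen for illustration; the table layout and the helper names `open_index`, `add_entries` and `lookup` are hypothetical and not part of the existing code.

```python
import sqlite3

def open_index(path):
    """Open (or create) an on-disk chunk index at `path`."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(hash TEXT, vol TEXT, session TEXT, address INTEGER)"
    )
    # The index on hash is what makes lookups by hash value fast.
    db.execute("CREATE INDEX IF NOT EXISTS chunks_by_hash ON chunks (hash)")
    return db

def add_entries(db, vol, session, entries):
    """Record (hash, address) entries from one manifest (hypothetical layout)."""
    with db:  # commits the transaction on success
        db.executemany(
            "INSERT INTO chunks VALUES (?, ?, ?, ?)",
            [(h, vol, session, addr) for h, addr in entries],
        )

def lookup(db, chunk_hash):
    """Return every known (vol, session, address) holding this hash."""
    cur = db.execute(
        "SELECT vol, session, address FROM chunks "
        "WHERE hash = ? ORDER BY vol, session, address",
        (chunk_hash,),
    )
    return cur.fetchall()
```

With the index kept on disk, the set of chunks that can be matched is no longer limited by how many hashes fit in memory.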

Related: #224

@tasket tasket added the enhancement New feature or request label Dec 23, 2024