
new id hash, coexistence of id hash algos #1562

Closed
ThomasWaldmann opened this issue Sep 2, 2016 · 2 comments

Comments

@ThomasWaldmann
Member

ThomasWaldmann commented Sep 2, 2016

Could we use sha512-256 or blake2b-256 as the ID hash (after we require OpenSSL 1.1 or otherwise make sure we have them, maybe via Python 3.6)?
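A minimal sketch of the candidates, using only the standard `hashlib` module (the function name `id_candidates` is hypothetical). `hashlib.blake2b` is built in since Python 3.6; SHA-512/256 is guarded because `hashlib` only exposes it when the linked OpenSSL provides it:

```python
import hashlib


def id_candidates(data: bytes) -> dict:
    """Compute 256-bit ID candidates for the same piece of data (illustrative helper)."""
    ids = {
        # current default: plain SHA-256
        "sha256": hashlib.sha256(data).digest(),
        # BLAKE2b configured for a 32-byte digest (not a truncated 64-byte one)
        "blake2b-256": hashlib.blake2b(data, digest_size=32).digest(),
    }
    # SHA-512/256 is only available if the linked OpenSSL implements it
    if "sha512_256" in hashlib.algorithms_available:
        ids["sha512-256"] = hashlib.new("sha512_256", data).digest()
    return ids


for name, digest in id_candidates(b"example chunk").items():
    print(name, digest.hex())
```

All candidates yield 32-byte IDs, so the ID size stays the same; only the algorithm (and its speed) differs.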

Usage of ID:

  • write: id = H(data), put(id, data)
  • read: we already know the id (e.g. it is in some chunks list), data = get(id); verify(id, data)
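The write/read usage above can be sketched as a toy content-addressed store. `ChunkStore` and its methods are hypothetical names for illustration, not borg's repository API, and SHA-256 stands in for whatever ID hash is chosen:

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store illustrating id = H(data) (not borg's real API)."""

    def __init__(self) -> None:
        self._objects = {}  # maps chunk id (bytes) -> chunk data (bytes)

    @staticmethod
    def id_hash(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def write(self, data: bytes) -> bytes:
        chunk_id = self.id_hash(data)       # id = H(data)
        self._objects[chunk_id] = data      # put(id, data); identical data dedups to one entry
        return chunk_id

    def read(self, chunk_id: bytes) -> bytes:
        data = self._objects[chunk_id]      # data = get(id)
        if self.id_hash(data) != chunk_id:  # verify(id, data)
            raise ValueError("chunk integrity check failed")
        return data


store = ChunkStore()
cid = store.write(b"hello")
assert store.read(cid) == b"hello"
```

Swapping `id_hash` for sha512-256 or blake2b-256 changes nothing else in either path, which is the point made below.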

In both cases, it does not really matter whether we use sha256, sha512-256 or blake2b-256 (or even a mix of them in the same repo, as long as we know which one was used for a specific piece of data). I'd guess sha256(A)-blake2b(B) collisions should be about as likely as sha256(A)-sha256(B) collisions.

Of course, one loses dedup between chunks stored using different ID hashes. borg diff also loses some functionality, as it concludes that file contents are identical based on identical chunk ID lists (and vice versa). But that is not much different from switching the chunk size, which we also support within the same repo.
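To make the dedup loss concrete, here is a small sketch that stores the same chunks twice, once per ID hash. The `ID_HASHES` table and `store_chunks` helper are hypothetical; a plain dict stands in for the repository:

```python
import hashlib

# Two interchangeable 256-bit ID hashes (labels are illustrative).
ID_HASHES = {
    "sha256": lambda d: hashlib.sha256(d).digest(),
    "blake2b-256": lambda d: hashlib.blake2b(d, digest_size=32).digest(),
}


def store_chunks(repo: dict, chunks: list, algo: str) -> list:
    """Store chunks under IDs computed with `algo`; return the file's chunk ID list."""
    ids = []
    for chunk in chunks:
        chunk_id = ID_HASHES[algo](chunk)
        repo.setdefault(chunk_id, chunk)  # dedup only hits on an identical ID
        ids.append(chunk_id)
    return ids


repo = {}
chunks = [b"part one", b"part two"]
ids_old = store_chunks(repo, chunks, "sha256")
ids_new = store_chunks(repo, chunks, "blake2b-256")

print(len(repo))           # 4: same contents stored twice, no cross-hash dedup
print(ids_old == ids_new)  # False: comparing ID lists reports a change
```

Re-storing the same chunks with the same algorithm still dedups as usual; only the cross-algorithm case is lost.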

To choose the right hash (MAC) algorithm for verifying data integrity (authenticity), we of course need to know which algorithm was used for a specific storage object.

Old way: use the type byte of the chunk.
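A rough sketch of the "old way": a type byte prefixed to each stored object selects the ID hash used for verification. The byte values `TYPE_SHA256` and `TYPE_BLAKE2B_256` and the helpers `pack` / `unpack_and_verify` are hypothetical, not borg's actual assignments or functions:

```python
import hashlib

# Hypothetical type-byte values, for illustration only (not borg's real assignments).
TYPE_SHA256 = 0x00
TYPE_BLAKE2B_256 = 0x01

ID_HASH_BY_TYPE = {
    TYPE_SHA256: lambda d: hashlib.sha256(d).digest(),
    TYPE_BLAKE2B_256: lambda d: hashlib.blake2b(d, digest_size=32).digest(),
}


def pack(type_byte: int, data: bytes) -> tuple:
    """Compute the ID with the algorithm selected by type_byte; prefix the object with it."""
    chunk_id = ID_HASH_BY_TYPE[type_byte](data)
    return chunk_id, bytes([type_byte]) + data


def unpack_and_verify(chunk_id: bytes, stored: bytes) -> bytes:
    """Read the type byte to learn which ID hash verifies this object."""
    type_byte, data = stored[0], stored[1:]
    id_hash = ID_HASH_BY_TYPE.get(type_byte)
    if id_hash is None:
        raise ValueError(f"unknown type byte {type_byte:#04x}")
    if id_hash(data) != chunk_id:
        raise ValueError("chunk integrity check failed")
    return data


cid, obj = pack(TYPE_BLAKE2B_256, b"payload")
assert unpack_and_verify(cid, obj) == b"payload"
```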

New way: use DKID, DEKs and ciphersuites. We could add the ID hash/MAC to the ciphersuite. When using DEKs, we store the ciphersuite name together with the key material; then we could also use blake2b as an ID MAC instead of just an ID hash (even in modes that do not use encryption).
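A minimal sketch of the ciphersuite idea, assuming a DEK record that carries its ciphersuite name next to the key material. The `DEK` dataclass, the suite names `"blake2b-256"` / `"hmac-sha256"` and the helpers `id_mac` / `verify` are hypothetical; the MAC primitives are standard `hashlib` / `hmac` calls:

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DEK:
    """Hypothetical data encryption key record: ciphersuite name stored with key material."""
    ciphersuite: str
    id_key: bytes


def id_mac(dek: DEK, data: bytes) -> bytes:
    """Keyed chunk ID; the ciphersuite decides which MAC is used."""
    if dek.ciphersuite == "blake2b-256":
        # BLAKE2b has a built-in keyed mode (key up to 64 bytes), no HMAC wrapper needed
        return hashlib.blake2b(data, digest_size=32, key=dek.id_key).digest()
    if dek.ciphersuite == "hmac-sha256":
        return hmac.new(dek.id_key, data, hashlib.sha256).digest()
    raise ValueError(f"unknown ciphersuite {dek.ciphersuite!r}")


def verify(dek: DEK, chunk_id: bytes, data: bytes) -> bool:
    # constant-time comparison, since the ID is now a MAC
    return hmac.compare_digest(id_mac(dek, data), chunk_id)


# Usable without encryption too: only the key holder can compute valid IDs.
dek = DEK("blake2b-256", os.urandom(32))
cid = id_mac(dek, b"chunk data")
assert verify(dek, cid, b"chunk data")
```

Because the algorithm is looked up from the DEK rather than from a per-object byte, objects written under different DEKs can use different ID MACs in the same repo.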

@ThomasWaldmann ThomasWaldmann changed the title blake2b as id hash, coexistence of id hash algos new id hash, coexistence of id hash algos Sep 2, 2016
@enkore enkore mentioned this issue Nov 3, 2016
@enkore
Contributor

enkore commented May 24, 2017

I'd say that this adds a lot of complexity and makes deduplication even less predictable. Since the only difference between the ID hashes is their performance, just create a new repository if it matters.

Close?

@enkore enkore closed this as completed May 24, 2017
@ThomasWaldmann
Member Author

Well, one result of the thought experiment was that it does not add a lot of complexity: one just has to know which hash/MAC algorithm was used, so the data can be verified.

But I agree just starting a new repo is simpler.
