Could we use sha512-256 or blake2b-256 as the ID hash (once we require OpenSSL 1.1 or otherwise make sure we have it - maybe via Python 3.6)?
Usage of ID:
write: id = H(data), put(id, data)
read: we already know the id (e.g. is in some chunks list), data = get(id); verify(id, data)
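The write and read paths above can be sketched in a few lines of Python. This is only an illustration, not borg's actual code: the `ChunkStore` class, the `ID_HASHES` table and its algorithm names are hypothetical, and the store is an in-memory dict that records which hash was used for each object.

```python
import hashlib

# Hypothetical table of id hash functions; each returns a 256-bit id.
ID_HASHES = {
    "sha256": lambda data: hashlib.sha256(data).digest(),
    "blake2b-256": lambda data: hashlib.blake2b(data, digest_size=32).digest(),
}


class ChunkStore:
    """Toy content-addressed store (assumption: in-memory, no encryption)."""

    def __init__(self):
        self._objects = {}  # id -> (algorithm name, data)

    def put(self, data, algo="sha256"):
        # write: id = H(data), put(id, data)
        chunk_id = ID_HASHES[algo](data)
        self._objects[chunk_id] = (algo, data)
        return chunk_id

    def get(self, chunk_id):
        # read: data = get(id); verify(id, data) using the recorded algorithm
        algo, data = self._objects[chunk_id]
        if ID_HASHES[algo](data) != chunk_id:
            raise ValueError("integrity check failed")
        return data
```

Storing the same data under two different id hashes gives two different ids, which is exactly the dedup loss discussed in this issue.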
In both cases, it does not really matter whether we use sha256, sha512-256 or blake2b-256 (or a mix of them in the same repo, as long as we know which was used for some specific data). I'd guess sha256(A)-blake2b(B) collisions should be about as likely as sha256(A)-sha256(B) collisions.
Of course one loses dedup between chunks stored using different id hashes. borg diff also loses some functionality, as it asserts identical file contents based on identical chunk id lists (and vice versa). But that is not much different from switching the chunk size, which we also support in the same repo.
For choosing the right hash (MAC) algorithm to verify data integrity (authenticity), we of course need to know which algorithm was used for some specific storage object.
Old way: use the type byte of the chunk.
New way: use DKID, DEKs, ciphersuites. We could add the id hash/MAC to the ciphersuite. When using DEKs, we store the ciphersuite name together with the key material - then we could also use blake2b as an id MAC instead of just an id HASH (even in modes that do not use encryption).
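A minimal sketch of the "id function selected by ciphersuite name" idea, assuming the suite name is stored next to the key material. The suite names and the `compute_id` / `verify` helpers are hypothetical and only for illustration; the keyed variant uses the `key` parameter of Python's `hashlib.blake2b` (available since Python 3.6).

```python
import hashlib
import hmac


def _sha256_id(key, data):
    # Plain id HASH: the key is ignored.
    return hashlib.sha256(data).digest()


def _hmac_sha256_id(key, data):
    # Keyed id MAC built from sha256.
    return hmac.new(key, data, hashlib.sha256).digest()


def _blake2b_mac_id(key, data):
    # Keyed id MAC using blake2b's built-in keying (key must be <= 64 bytes).
    return hashlib.blake2b(data, key=key, digest_size=32).digest()


# Hypothetical suite names mapped to their id functions.
ID_FUNCS = {
    "sha256": _sha256_id,
    "hmac-sha256": _hmac_sha256_id,
    "blake2b-mac": _blake2b_mac_id,
}


def compute_id(suite, key, data):
    return ID_FUNCS[suite](key, data)


def verify(suite, key, chunk_id, data):
    # Constant-time comparison, since the id may be a MAC.
    return hmac.compare_digest(compute_id(suite, key, data), chunk_id)
```

Because the suite name travels with the key material, the reader always knows which function to call in `verify`, even when several id algorithms coexist in one repo.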
ThomasWaldmann changed the title from "blake2b as id hash, coexistence of id hash algos" to "new id hash, coexistence of id hash algos" on Sep 2, 2016
I'd say that this adds a lot of complexity and makes deduplication even less predictable. Since the only difference between the id hashes is their performance, just create a new repository if it's important.
Well, one result of the thought experiment was that it does not add a lot of complexity - to verify data, one just has to know which hash/MAC algorithm was used for it.