add flexible encryption chunk header #1031
Mmmhh, as I mentioned in that thing, it imho makes more sense to think in terms of cipher suites that provide AEAD in a black-box fashion, without depending on having separable MACs and ciphers. That also removes possible issues about what is MACed and what is not. For example, in the example you give the IV looks un-MACed, which is very problematic for CBC. I don't know if I wrote it correctly in the above link, but in a format like the one sketched there.
Yes, so why not push all that (IV, envelope MAC) out of the chunk format into blackbox-ciphersuite-do-it-yourself? ;)

Adding a field (not necessarily using it right now) to identify different keys also makes sense (because otherwise we wouldn't really change anything); each key would be associated with a ciphersuite, since reusing a key for different ciphersuites is potentially dangerous. [1]

[1] Using the "master" key for any other crypto than what's there right now is IMHO unsafe. So we actually really need separate keys to go with crypto flexibility. So, yeah.
Cipher and HMAC type could be stored with the key, right, but the IV needs to be stored with the chunk, as it is a different starting value per chunk (either a counter or random). We could maybe also generate a new random "chunk key" per chunk, always start with IV=0 at chunk start and store the key with the chunk (encrypted by the master key); no need to store the IV in this case. No counter/IV management problems, even with multithreading.
Absolutely. I'm just saying that that should be the responsibility of the ciphersuite, not some metadata prepended to the encrypted payload (output of the ciphersuite), i.e. only have the ciphersuite's output there.

Regarding a new key per chunk: that would need lots of high-quality randomness, so I think it would wander very quickly into unsafe territory. It would only work with ciphersuites whose key length == master cipher block length; otherwise we would have to keep the IV for that encrypted block as well. It would also need to be authenticated. I also don't really see a good reason to do this; even the most "demanding" ideas for crypto don't want per-chunk key management.

I don't see any problems with IV management for multithreading. If you are able to use multiple keys, then each thread would allocate one key (no problems); if you can't, only one thread can encrypt (or you do some API gymnastics with a shared CTR that you update before encrypting some data, which won't work and is probably utterly unsafe).
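To make the "black box" idea concrete, here is a minimal sketch of what such a ciphersuite interface could look like. The class names and the use of the `cryptography` package are illustrative assumptions, not borg's actual API; nonce handling in a real suite would need more care than shown here.

```python
# Minimal sketch of the "ciphersuite as a black box" idea (illustrative only).
import os
from abc import ABC, abstractmethod

from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305


class CipherSuite(ABC):
    """Owns key, IV/nonce handling and authentication internally.

    Callers only ever see opaque, authenticated blobs; nothing (IV, MAC, ...)
    is prepended to the payload outside of the suite.
    """

    @abstractmethod
    def encrypt(self, data: bytes) -> bytes:
        """Return an opaque blob containing everything needed to decrypt and verify."""

    @abstractmethod
    def decrypt(self, blob: bytes) -> bytes:
        """Verify authenticity first, then return the plaintext (or raise)."""


class ChaCha20Poly1305Suite(CipherSuite):
    """Hypothetical AEAD suite built on the 'cryptography' package."""

    def __init__(self, key: bytes):
        self._aead = ChaCha20Poly1305(key)   # 32-byte key

    def encrypt(self, data: bytes) -> bytes:
        nonce = os.urandom(12)               # nonce management stays inside the suite
        return nonce + self._aead.encrypt(nonce, data, None)

    def decrypt(self, blob: bytes) -> bytes:
        nonce, ct = blob[:12], blob[12:]
        return self._aead.decrypt(nonce, ct, None)   # raises InvalidTag on tampering
```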
I updated the link above with some ideas from here.
w.r.t. algorithms (off-topic to the proposal): ...
Hmm, not sure if I like msgpack for the keys - something readable feels better there. JSON?

Key IDs: yes, random IDs are easier than "finding the next free number", esp. when dealing with multiple clients. Alternatively, it could be a function of the key, like crc32 (maybe we want something short?) or md5. One could use that to find bitflips in keys.

I agree about the algos.
Since the "DKID" is in plaintext I don't feel too comfy deriving it directly from the crypto keys. Alternatively: ID = HMAC_of_master_key(packed_dek). We use that function all the time anyway.

msgpack vs. readable: for the keyblob it doesn't matter, since it's an encrypted|authenticated blob anyway. When we (later) want to be able to import/export them, JSON makes sense (most of it would still be a blob, but the ID + cipher suite would be readily readable, and it's much easier to integrate with higher-level tools). While JSON is not a canonical encoding (the property that there is exactly one encoding for any given piece of data), our usual way of packing things with msgpack is, so an imported DEK can still be validated (pack the imported JSON with msgpack, HMAC_of_master_key, done).
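A rough sketch of that DKID derivation (ID = HMAC_of_master_key(packed_dek)), assuming msgpack packing of the DEK and SHA-256 as the HMAC hash; the function and field names are made up for illustration.

```python
# Sketch: deterministic key ID derived from the packed DEK, keyed by the master key.
import hashlib
import hmac

import msgpack


def derive_dkid(master_key: bytes, dek: dict) -> bytes:
    """Return a DKID that can appear in plaintext without exposing key material.

    Since msgpack packing (as used here) is canonical, an imported/exported DEK
    can be re-packed and re-validated against its DKID at any time.
    """
    packed_dek = msgpack.packb(dek, use_bin_type=True)
    return hmac.new(master_key, packed_dek, hashlib.sha256).digest()


# usage (all values hypothetical):
# dkid = derive_dkid(master_key, {"suite": "chacha20-poly1305", "key": dek_bytes})
```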
btw, with a "none" ciphersuite (no encryption, no MAC), plaintext mode could be integrated into this mechanism.
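Reusing the CipherSuite interface sketched above, plaintext mode would then just be another (trivial) suite; again only an illustration:

```python
class NoneCipherSuite(CipherSuite):   # CipherSuite as sketched earlier
    """No encryption, no MAC: plaintext mode as just another ciphersuite."""

    def encrypt(self, data: bytes) -> bytes:
        return data    # pass through unchanged

    def decrypt(self, blob: bytes) -> bytes:
        return blob
```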
w.r.t. asymmetric crypto I think that the approach I took in the link is sufficient, i.e. have Borg not support it directly, but have creatable and exportable keys that can be used with any higher-level key management (GPG, walletd, whatever).
Yes, I think that makes sense. In the meantime it shouldn't be used by default, though, keeping compatibility with old versions. I'd also say that by default the archive meta chunk and the items chunk stream should be encrypted with the master key unless explicitly told not to do that, since that also keeps compatibility. For 2.0 we can then remove the "legacy" way.
There could be some predefined DKIDs, like: ...
Hm. Having deterministic, repo-independent DKIDs can have some nice uses (e.g. fast copies between repos sharing the same set of keys). So, let's say we make them deterministic. We don't set the scheme, but have CipherSuite.create (referring to the draft spec) do it. I don't see any problems with it; when sharing keys it can be seen in the data, but that's okay I'd say. Contract:
AES-CTR...HMAC... could then do something like I mentioned above: pack the DEK, do the normal payload encryption with IV=very_very_large (remember that the IV has 16 bytes, but only 8 are stored in normal chunks -- at least currently, and I see no reason to change that -- so using a very large, well-known IV deters IV reuse attacks, if they would even be possible somehow here), then DKID = envelope HMAC. For other cipher suites this might look totally different (AES-GCM and ChaCha+Poly are pretty much the same story as AES+HMAC, though).
That would also solve the "predefined DEKs" problem. For encrypted repos the DKID of a (converted) master key would be random (no problem there), while things like none/plaintext would have the same DKID everywhere (since it's the same key). Bingo.
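A very rough sketch of the deterministic-DKID idea for an AES-CTR + HMAC style suite, as described above: encrypt the packed DEK with a fixed, well-known, very large IV and use the envelope HMAC over the result as the DKID. Key sizes, the hash choice and the exact "very large" IV value are assumptions for illustration only.

```python
# Illustrative only: DKID for an AES-CTR + HMAC suite = HMAC over the DEK
# encrypted under a well-known, very large IV.
import hashlib
import hmac

import msgpack
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

WELL_KNOWN_IV = (2 ** 128 - 2 ** 64).to_bytes(16, "big")   # "very_very_large"


def dkid_aes_ctr_hmac(enc_key: bytes, mac_key: bytes, dek: dict) -> bytes:
    packed_dek = msgpack.packb(dek, use_bin_type=True)
    encryptor = Cipher(algorithms.AES(enc_key), modes.CTR(WELL_KNOWN_IV)).encryptor()
    ciphertext = encryptor.update(packed_dek) + encryptor.finalize()
    # the envelope MAC over the ciphertext doubles as the (deterministic) key ID
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
```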
I believe storing changing data like last_iv in the keyblob is a very bad idea. Users are supposed to have backups of the keyblobs to be able to restore. While an outdated last_iv will not prevent a restore, this still makes the keyblob a mutable data structure, which means users can't even reliably check whether their offsite keyblob backups are OK.
https://gist.github.com/enkore/d16849b9e2eecdab0903bcd37bd0ee27 Requirements (2016-09-02), R1...R5
Thus
However:
I think only ChaCha20-Poly1305 should be added (for now): ...
It's not just about max. possible throughput, but also CPU load. I'd guess hw-accelerated AES-GCM/OCB produces much less load than (non-accelerated) chacha/poly.
It's the same metric. [1] Which is simply not possible for Borg with the anticipated 1.2 design with any of them. Therein lies the rub: all the other stuff, be it ID hashes, compressors or buzhash, has only a fraction of the performance of any of these AEAD constructions. The overall CPU% spent on them will be very minor, not very different from the AES CPU% we see now. And that percentage is tiny (with AES-NI; bonus points for chapoly for being faster without AES-NI).
AES: 2.4%, HMAC: 7.8%, total: 10.2%.
If we'd use chapoly, that would change to approx. chapoly: 4.8%, HMAC: 0.0%, total: 4.8%.
In 1.2 we'll have an entire thread for any of these.
(btw, in 1.1 we already reduced the 5% crc32 block to <1%.)
If I understand this right, the consensus here is that the best option is to introduce a compatibility-breaking change? If yes, I would recommend also changing the ... Also, I would recommend removing the ...
Would be cool if PR #6463 could get some review.
I've looked at the experimental branch, thinking about what to get from there and how.
Compression
The flexible compression first done there was meanwhile implemented in borg, a bit differently. The idea here was to add a 2-byte header in front of the compressed data that identifies the (de)compression algorithm. Plus the small hack that we can keep existing zlib chunks without a header, because zlib can also be detected from its first 2 output bytes. We just must not use zlib-looking type bytes for any other compression algorithm.
This is nice for layering: we just give the decompressor the whole bytestring and it can figure out everything from that (choosing the right algorithm internally). The compressor just prepends the right 2 type bytes when returning its output.
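A small sketch of that dispatch; the type-byte values and the zlib header check shown here are illustrative, not necessarily the exact ones borg uses:

```python
# Sketch of the 2-byte compression type header with the zlib legacy special case.
import zlib

CNONE_TYPE = b"\x00\x00"   # hypothetical type bytes for "no compression";
                           # chosen so they can never look like a zlib header


def looks_like_zlib(data: bytes) -> bool:
    # a zlib stream starts with CMF/FLG: the low nibble of CMF is 8 (deflate)
    # and CMF*256 + FLG is a multiple of 31, so legacy zlib chunks need no header
    return len(data) >= 2 and (data[0] & 0x0F) == 8 and (data[0] * 256 + data[1]) % 31 == 0


def compress_none(data: bytes) -> bytes:
    # every non-zlib compressor prepends its own 2 type bytes to its output
    return CNONE_TYPE + data


def decompress(blob: bytes) -> bytes:
    # the decompressor gets the whole bytestring and picks the algorithm itself
    if looks_like_zlib(blob):
        return zlib.decompress(blob)          # legacy zlib chunk, no header
    if blob[:2] == CNONE_TYPE:
        return blob[2:]
    raise ValueError("unknown compression type header: %r" % blob[:2])
```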
Encryption
I'd like to add flexible encryption in a similar way - see the link in the next post.
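For illustration, such a header in front of the encrypted payload might carry a ciphersuite type byte plus (per the discussion above) a key ID, followed by the opaque ciphersuite output. All IDs, sizes and names here are assumptions, not a spec:

```python
# Hypothetical layout: 1 type byte (ciphersuite) + 32-byte DKID + opaque suite output.
SUITES = {
    0x00: "none (plaintext, no MAC)",
    0x02: "AES-CTR + HMAC-SHA256",
    0x03: "ChaCha20-Poly1305",
}


def parse_encrypted_chunk(chunk: bytes):
    """Split an encrypted chunk into (suite id, key id, opaque ciphersuite blob)."""
    suite_id = chunk[0]
    if suite_id not in SUITES:
        raise ValueError("unknown ciphersuite id: %#04x" % suite_id)
    dkid, payload = chunk[1:33], chunk[33:]
    return suite_id, dkid, payload
```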