Use randomized key derivation instead of relying on stateful nonces #3814
This is likely a CVE-worthy security vulnerability, but as it was already documented, I am reporting it publicly. This must be fixed before borg can safely be used with QubesOS. |
How hard would this be to fix? Is there a version number in the backup format?
…On Mon, May 14, 2018, 5:29 PM enkore ***@***.***> wrote:
It's a fairly well-known bug, though (e.g. #1044 / #1031). But it is of course correct, encryption in attic/borg is *conceptually* broken.
|
Any plan to fix this? It looks like the security of borg is broken. |
It depends on how you use it: using one client per repo is fine (see the docs for more details). |
@DemiMarie, just to clarify: by "random key", this issue means "random nonce", right? Otherwise it's not clear how to decrypt the data later. If I'm correct, then
Or do I misunderstand your idea completely? |
@kroki I do in fact mean "random key". The random number (salt) and personality string would be stored (unencrypted) along with the ciphertext, so the key can be recomputed for decryption. They need not (and must not) be kept secret. The personality string can be shared amongst all users of Borg, while the random salt must be unpredictable, but is safe to publish. Of course, the master key must be kept secret.

Random nonces are not okay because they are usually only 96 bits, which is not enough to ensure that collisions will not happen. An AES-256 key is 256 bits, which is sufficient. If the key derivation function (KDF) provides enough output (which both HKDF and Blake2b do), additional data from the KDF can be used for the nonce. The salt and personality string must, along with their lengths, be included in the computation of the authenticator. If the KDF output is used for authenticated encryption, this is sufficient. MAC-then-encrypt and MAC-and-encrypt are not good choices, as neither is secure in general. They can be secure, but that requires significant extra care. Encrypt-then-MAC just works.

AES is not a good choice for key derivation because it is vulnerable to related-key attacks. A true hash function is a far, far better choice. I suggest using libsodium's Blake2b implementation, which is both portable and very heavily optimized. Alternatively, HKDF can be used. I strongly recommend using libsodium's crypto_secretstream API for bulk data encryption. |
^ on point |
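A minimal sketch of the scheme DemiMarie describes above, assuming Python with the stdlib `hashlib` BLAKE2b and the `cryptography` package's AES-GCM. This is not Borg's actual code; `PERSON`, `derive_chunk_key`, and `encrypt_chunk` are illustrative names:

```python
# Sketch only: per-chunk key derived via keyed BLAKE2b from a master
# key, a shared personality string, and a fresh 192-bit random salt.
import os
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PERSON = b"borg chunk key"   # shared personality string, <= 16 bytes
SALT_LEN = 24                # 192-bit random salt, stored with each chunk

def derive_chunk_key(master_key: bytes, salt: bytes) -> tuple[bytes, bytes]:
    # Keyed BLAKE2b over the salt. hashlib's own 'salt' parameter is
    # capped at 16 bytes, so the 24-byte salt goes in as message input.
    out = hashlib.blake2b(salt, digest_size=44, key=master_key,
                          person=PERSON).digest()
    # The KDF output is long enough to supply the nonce as well.
    return out[:32], out[32:44]   # 256-bit AES key + 96-bit GCM nonce

def encrypt_chunk(master_key: bytes, plaintext: bytes) -> bytes:
    salt = os.urandom(SALT_LEN)
    key, nonce = derive_chunk_key(master_key, salt)
    # AES-GCM is an AEAD; passing the salt as associated data binds it
    # (and its length) into the authenticator, as required above.
    return salt + AESGCM(key).encrypt(nonce, plaintext, salt)
```

Because the salt is stored in the clear in front of the ciphertext, decryption can recompute the same key; no nonce state has to be tracked anywhere.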
@DemiMarie I had a look at libsodium a while ago, back then they did not support accelerated AES256 (something that borg needs). I felt that depending on openssl (for that) and on libsodium (for other stuff) is not desirable. |
Based on the ideas from #1031 / @enkore: I would remove the id_key from the HKDF input. With a SHA512 HKDF this should not increase the number of required hash calculations and at the same time reduce the storage overhead. The only case where there could be key & IV reuse would be if the same plaintext is re-encrypted with a different compression (or different compression parameters), but as far as I understand, this should not happen in normal operation. Is there any case where this might happen? Or are there other (cryptographic) problems that I'm not seeing right now? |
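For illustration, a sketch of how a single SHA-512 HKDF call could yield both key and IV, assuming the `cryptography` package. The inputs shown here (a per-session key plus a chunk counter in the HKDF `info`) are assumptions for illustration, not the exact scheme from #1031:

```python
# Sketch only: one HKDF-SHA512 invocation produces 48 bytes, which are
# split into an AES-256 key and a 128-bit IV, so no extra hash calls
# and no per-chunk key storage are needed.
import struct
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_key_and_iv(session_key: bytes, chunk_counter: int) -> tuple[bytes, bytes]:
    okm = HKDF(
        algorithm=hashes.SHA512(),
        length=48,
        salt=None,
        info=b"borg chunk" + struct.pack("<Q", chunk_counter),
    ).derive(session_key)
    return okm[:32], okm[32:48]
```

Requesting a longer output and slicing it, as suggested later in the thread, works the same way: just increase `length` and split the result.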
Sadly, there are, although they are easy to miss. Consider the case where a system is snapshotted and then restored from the snapshot more than once. Both restored systems will likely perform different operations, so when they run Borg again, they will encrypt different data. However, Borg is not aware of the snapshot, so nonces will be reused. Using sufficient randomness (at least 192 bits) in the key derivation prevents this. The session ID should be increased to 192 bits to prevent such attacks, but it does not need to be stored separately in each chunk. |
@DemiMarie And I don't see a good way to avoid storing the session ID with every chunk. There has to be a way to know which chunk was encrypted with which session ID, and the only way to save some space would be some level of indirection, which is difficult to handle and introduces additional points of failure in case of data corruption. |
There is the --recompress argument for borg recreate that allows recompressing existing archives with a different algorithm and parameters. |
@pgerber thank you for the input! @DemiMarie To cope with the recreate issue, it makes sense to drop my approach of using the chunk ID in the key generation and move back to something closer to the initial approach I was referring to: As a side note: I already have two ideas for how to ensure a semi-random nonce without bookkeeping overhead |
@ozppupbg Using a keyed MAC as a nonce is known as SIV (Synthetic Initialization Vector) mode, and is considered secure provided the nonce size is large enough. XChaCha20-Poly1305 with random nonces is fine. However, for this security proof to hold, the following must be true:
The biggest reason for using long, random nonces is simplicity. Complexity is the enemy of security, especially in cryptography. Using long, random nonces is simple and correct. Yes, it adds overhead, but compared to the size of a chunk (probably >10 KiB) the overhead of a 24-byte nonce is negligible. |
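A sketch of this long-random-nonce approach, assuming PyNaCl's libsodium bindings for XChaCha20-Poly1305 (`seal` and `open_` are illustrative names, not Borg code):

```python
# Sketch only: a fresh 24-byte nonce from the OS RNG is stored in front
# of each ciphertext; no counters or session state to keep in sync.
from nacl import bindings, utils

NONCE_LEN = bindings.crypto_aead_xchacha20poly1305_ietf_NPUBBYTES  # 24

def seal(key: bytes, plaintext: bytes, aad: bytes = b"") -> bytes:
    nonce = utils.random(NONCE_LEN)
    ct = bindings.crypto_aead_xchacha20poly1305_ietf_encrypt(
        plaintext, aad, nonce, key)
    # 24 bytes of overhead per chunk, negligible next to >10 KiB chunks.
    return nonce + ct

def open_(key: bytes, blob: bytes, aad: bytes = b"") -> bytes:
    nonce, ct = blob[:NONCE_LEN], blob[NONCE_LEN:]
    # Raises CryptoError before releasing plaintext if the tag is bad.
    return bindings.crypto_aead_xchacha20poly1305_ietf_decrypt(
        ct, aad, nonce, key)
```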
The broken crypto seems to be the main reason why multi-client on a single repo is discouraged in the docs [1]. I guess this should get more priority? |
+1, this is the only reason that holds me back from using borg. |
@DemiMarie After re-reading this and #1031, I think the only question left open here is: do we want to derive a random key for each chunk (necessitating a sufficiently long random nonce, >192 bits), or do we want to derive a random session key, which wouldn't need as long a nonce (because there are many fewer sessions than there are chunks) and is faster (no per-chunk KDF)? I don't see anything immediately wrong with #1031 (comment) (though I would remove the ID key as a KDF input), except that one might feel a random 128-bit session key is too short. On the other hand, Borg has pretty high per-session costs, so even a small session tends to take a few seconds to complete. So even when doing small transactions on a repo 24/7, it'd take a hundred years or so to do 2**32 transactions, at which point the chances of colliding in a 128-bit space are negligible. |
@enkore A random session key is perfectly fine. I would go with a 256 bit session key out of an abundance of caution: the performance and space overheads are negligible, and it fits the general rule that a key should be twice the security parameter. |
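For reference, a back-of-the-envelope check of the collision argument above, using the birthday-bound approximation p ≈ n² / 2^(k+1) for n random k-bit values:

```python
# Sketch: probability that any two of n random k-bit session keys
# collide, per the standard birthday-bound approximation.
def collision_bound(n_sessions: int, key_bits: int) -> float:
    return n_sessions**2 / 2 ** (key_bits + 1)

print(collision_bound(2**32, 128))  # ~2.7e-20: negligible, as argued above
print(collision_bound(2**32, 256))  # ~8.0e-59 for the 256-bit variant
```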
Just stumbled over https://soatok.blog/2021/11/17/understanding-hkdf/. Huh?
Took @enkore's diagram and re-did it:
Unclear to me:
|
From @enkore in #1031: "Therefore, adjusting the design slightly from 'very compatible [but conditionally blows up with old versions that don't understand new formats, and maybe it's hard to tell which things are affected by that]' to 'explicitly enable it, locking older versions out'. In that case we can add some dedicated IKM to the keyblob, which would allow replication. It also has the advantage that it's completely clear for the entire repository whether it's backwards-compatible, and it allows using this scheme for everything, incl. the manifest." Yeah, I guess we want to explicitly enable this, add the extra IKM key and lock out old clients. Maybe the #747-related changes will require that anyway when switching over to the better crypto. |
I suggest using 64 bits or more. |
@ThomasWaldmann: no objections from my side.
Why is that even necessary? In the worst case, the ID could be reconstructed from the decrypted data.
I don't see any issues there. If, for some reason, we need more keys/randomness, there is also the option (which I think was not mentioned in the article) to generate a longer HKDF output and split the result into different parts as needed.
Why? Yes, it could overflow, but we are in total control of the counter:
Which "replication" is mentioned here? |
This is a really bad idea. Authentication should be done first, then decryption. (Yes, that means that streaming AEAD APIs are bad!) If unauthenticated plaintext is disclosed to an attacker, there is a security vulnerability. While using it to compute a message ID is probably safe, on the grounds that the message ID leaks no information about the data, it is still highly ill-advised. Just stick the message ID in front of the data and treat the AEAD algorithm as a black box. |
@DemiMarie I was not recommending this, just pointing out options. |
@ozppupbg for better efficiency, Also, when compacting the repo (which is also done server-side), the index must be updated with the new chunks' locations.
32-bit overflow: correct, we could detect that and start a new session. Or use a somewhat longer counter and have simpler code.
About replication: I guess what he meant is being able to efficiently transfer archives (or just chunks) from one repo to another.
@DemiMarie yeah, 64 bits would solve it for sure, but I guess that would be a bit overkill. But while thinking about the layout, alignment came to my mind, and maybe also keeping some bits reserved for later. How about:
About the small-chunks case: there are other limits related to this, e.g. all chunks have an entry in the in-memory chunks index hashtable. So, while running into the 32-bit boundary is somewhat conceivable, that would already use 4G * sizeof(hashtable_entry) of RAM (== at least 160 GB for 40 B entries). And you'd need 4G small chunks / small files for that. |
That’s fine, but be sure to check for overflow. |
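A tiny sketch of the "detect overflow, start a new session" idea from the exchange above (hypothetical names and counter width, not Borg code):

```python
# Sketch only: never reuse a (session_id, counter) pair; when the
# counter would overflow, roll over into a fresh random session.
import os

MAX_COUNTER = 2**32 - 1  # whatever width the final layout reserves

class Session:
    def __init__(self) -> None:
        self.session_id = os.urandom(24)  # 192-bit random session ID
        self.counter = 0

    def next_ids(self) -> tuple[bytes, int]:
        if self.counter >= MAX_COUNTER:
            # Counter exhausted: start a new session rather than wrap.
            self.session_id = os.urandom(24)
            self.counter = 0
        n = self.counter
        self.counter += 1
        return self.session_id, n
```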
Hmm, just noticed: we don't have a single bit there which server-side code could use, because everything is protected by the auth tag and that stuff can only be changed client-side. |
That’s a good thing 😆. Server-side metadata should be separate. |
I agree, what would this be good for? |
Was just a vague idea, like maybe some flags for bookkeeping of long-running operations. OTOH, that would require modifying the segment files in-place, which is something we intentionally do not do (we only append at the end and delete unused stuff, but never modify). |
Would be cool if PR #6463 could get some review. |
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
Bug
System information. For client/server mode post info for both machines.
N/A
Describe the problem you're observing.
Borg currently uses the same AES key for all encryption. This is an anti-pattern and leads to Borg's weaknesses when it comes to nonces.
A much better solution is to generate a fresh encryption key every time as (say) `blake2b(my_personality_string, random_number, key)`. This has the following advantages: there is no stateful nonce counter to track (or get wrong), and multiple clients can safely write to the same repository.
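To make the "key can be recomputed for decryption" property concrete, here is the matching decryption side of the hedged sketch posted earlier in the thread (same hypothetical `PERSON` and `SALT_LEN`, assuming stdlib `hashlib` and the `cryptography` package):

```python
# Sketch only: the salt stored in the clear is all that is needed
# (besides the master key) to recompute the per-chunk key and nonce.
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PERSON = b"borg chunk key"
SALT_LEN = 24

def decrypt_chunk(master_key: bytes, blob: bytes) -> bytes:
    salt, ct = blob[:SALT_LEN], blob[SALT_LEN:]
    out = hashlib.blake2b(salt, digest_size=44, key=master_key,
                          person=PERSON).digest()
    key, nonce = out[:32], out[32:44]
    # AESGCM.decrypt verifies the tag (with the salt as associated
    # data) before returning plaintext; raises InvalidTag on failure.
    return AESGCM(key).decrypt(nonce, ct, salt)
```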