borgbackup · ThomasWaldmann · Mar 26, 2022 · Mar 17, 2022 · Mar 17, 2022 · Mar 18, 2022
diff --git a/docs/faq.rst b/docs/faq.rst
@@ -89,7 +89,7 @@ Also, you must not run borg against multiple instances of the same repo
   (which is an issue if they happen to be not the same).
   See :issue:`4272` for an example.
 - Encryption security issues if you would update repo and copy-of-repo
-  independently, due to AES counter reuse.
+  independently, due to AES counter reuse (when using legacy encryption modes).
 
 See also: :ref:`faq_corrupt_repo`
 
@@ -246,6 +246,8 @@ then use ``tar`` to perform the comparison:
 My repository is corrupt, how can I restore from an older copy of it?
 ---------------------------------------------------------------------
 
+Note: this is only required for repos using legacy encryption modes.
+
 If your repositories are encrypted and have the same ID, the recommended method
 is to delete the corrupted repository, but keep its security info, and then copy
 the working repository to the same location:
@@ -473,8 +475,11 @@ Security
 
 .. _borg_security_critique:
 
-Isn't BorgBackup's AES-CTR crypto broken?
------------------------------------------
+Isn't BorgBackup's legacy AES-CTR-based crypto broken?
+------------------------------------------------------
+
+Note: in borg 1.3 new AEAD cipher based modes with session keys were added,
+solving the issues of the legacy modes.
 
 If a nonce (counter) value is reused, AES-CTR mode crypto is broken.
 
@@ -713,6 +718,8 @@ Please disclose security issues responsibly.
 How important are the nonce files?
 ------------------------------------
 
+This only applies to repositories using legacy encryption modes.
+
 Borg uses :ref:`AES-CTR encryption <borg_security_critique>`. An
 essential part of AES-CTR is a sequential counter that must **never**
 repeat. If the same value of the counter is used twice in the same repository,
@@ -881,24 +888,24 @@ What's the expected backup performance?
 ---------------------------------------
 
 Compared to simply copying files (e.g. with ``rsync``), Borg has more work to do.
-This can make creation of the first archive slower, but saves time 
+This can make creation of the first archive slower, but saves time
 and disk space on subsequent runs. Here is what Borg does when you run ``borg create``:
 
 - Borg chunks the file (using the relatively expensive buzhash algorithm)
-- It then computes the "id" of the chunk (hmac-sha256 (often slow, except 
+- It then computes the "id" of the chunk (hmac-sha256 (often slow, except
   if your CPU has sha256 acceleration) or blake2b (fast, in software))
-- Then it checks whether this chunk is already in the repo (local hashtable lookup, 
-  fast). If so, the processing of the chunk is completed here. Otherwise it needs to 
+- Then it checks whether this chunk is already in the repo (local hashtable lookup,
+  fast). If so, the processing of the chunk is completed here. Otherwise it needs to
   process the chunk:
 - Compresses (the default lz4 is super fast)
 - Encrypts (AES, usually fast if your CPU has AES acceleration as usual
   since about 10y)
 - Authenticates ("signs") using hmac-sha256 or blake2b (see above),
 - Transmits to repo. If the repo is remote, this usually involves an SSH connection
   (does its own encryption / authentication).
-- Stores the chunk into a key/value store (the key is the chunk id, the value 
+- Stores the chunk into a key/value store (the key is the chunk id, the value
   is the data). While doing that, it computes a CRC32 of the data (repo low-level
-  checksum, used by borg check --repository) and also updates the repo index 
+  checksum, used by borg check --repository) and also updates the repo index
   (another hashtable).
 
 Subsequent backups are usually very fast if most files are unchanged and only
@@ -928,14 +935,14 @@ If you feel your Borg backup is too slow somehow, here is what you can do:
 
 - Make sure Borg has enough RAM (depends on how big your repo is / how many
   files you have)
-- Use one of the blake2 modes for --encryption except if you positively know 
+- Use one of the blake2 modes for --encryption except if you positively know
   your CPU (and openssl) accelerates sha256 (then stay with hmac-sha256).
 - Don't use any expensive compression. The default is lz4 and super fast.
   Uncompressed is often slower than lz4.
 - Just wait. You can also interrupt it and start it again as often as you like,
   it will converge against a valid "completed" state (see ``--checkpoint-interval``,
   maybe use the default, but in any case don't make it too short). It is starting
-  from the beginning each time, but it is still faster then as it does not store 
+  from the beginning each time, but it is still faster then as it does not store
   data into the repo which it already has there from last checkpoint.
 - If you don’t need additional file attributes, you can disable them with ``--noflags``,
   ``--noacls``, ``--noxattrs``. This can lead to noticeable performance improvements
@@ -945,12 +952,12 @@ If you feel that Borg "freezes" on a file, it could be in the middle of processi
 large file (like ISOs or VM images). Borg < 1.2 announces file names *after* finishing
 with the file. This can lead to displaying the name of a small file, while processing the
 next (larger) file. For very big files this can lead to the progress display showing some
-previous short file for a long time while it processes the big one. With Borg 1.2 this 
+previous short file for a long time while it processes the big one. With Borg 1.2 this
 was changed to announcing the filename before starting to process it.
 
 To see what files have changed and take more time processing, you can also add
-``--list --filter=AME --stats`` to your ``borg create`` call to produce more log output, 
-including a file list (with file status characters) and also some statistics at 
+``--list --filter=AME --stats`` to your ``borg create`` call to produce more log output,
+including a file list (with file status characters) and also some statistics at
 the end of the backup.
 
 Then you do the backup and look at the log output:
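
The ``borg create`` steps listed in the FAQ hunk above can be pictured with a toy deduplicating store. This is only a sketch under stated assumptions: ``id_key``, ``store`` and ``add_chunk`` are hypothetical names, ``zlib`` stands in for lz4, and encryption and authentication are left out entirely. It is not Borg's implementation.

```python
import hashlib
import hmac
import zlib

id_key = b"\x00" * 32   # placeholder key, for illustration only
store = {}              # chunk id -> (crc32, stored bytes); toy key/value store


def add_chunk(chunk: bytes) -> tuple[bytes, bool]:
    """Store a chunk unless its id is already known; return (id, was_new)."""
    chunk_id = hmac.new(id_key, chunk, hashlib.sha256).digest()
    if chunk_id in store:              # hashtable lookup: duplicate, nothing more to do
        return chunk_id, False
    compressed = zlib.compress(chunk)  # stand-in for lz4
    # Encryption and authentication would happen here in a real pipeline.
    store[chunk_id] = (zlib.crc32(compressed), compressed)
    return chunk_id, True


first = add_chunk(b"same data" * 1000)
second = add_chunk(b"same data" * 1000)
print(first[1], second[1], len(store))  # True False 1
```

The second call returns early after the id lookup, which is why subsequent backups of unchanged data are cheap.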

diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
@@ -865,6 +865,31 @@ Encryption
 
 .. seealso:: The :ref:`borgcrypto` section for an in-depth review.
 
+AEAD modes
+~~~~~~~~~~
+
+Uses modern AEAD ciphers: AES-OCB or CHACHA20-POLY1305.
+For each borg invocation, a new session key is derived from the borg key material
+and the 48-bit IV starts from 0 again (both ciphers internally add a 32-bit counter
+to our IV, so we just count up by 1 per chunk).
+
+The chunk layout is best seen at the bottom of this diagram:
+
+.. figure:: encryption-aead.png
+    :figwidth: 100%
+    :width: 100%
+
+No special IV/counter management is needed here due to the use of session keys.
+
+A 48-bit IV is far more than needed: if you only backed up 4 KiB chunks (2^12 B),
+the IV would "limit" the data encrypted in one session to 2^(12+48) B == 1 EiB
+(about 1.15 exabytes), meaning you would run into other limitations (RAM, storage,
+time) way before that.
+In practice, chunks are usually bigger, for big files even much bigger, giving an
+even higher limit.
+
+Legacy modes
+~~~~~~~~~~~~
+
 AES_-256 is used in CTR mode (so no need for padding). A 64 bit initialization
 vector is used, a MAC is computed on the encrypted chunk
 and both are stored in the chunk. Encryption and MAC use two different keys.
@@ -884,6 +909,9 @@ To reduce payload size, only 8 bytes of the 16 bytes nonce is saved in the
 payload, the first 8 bytes are always zeros. This does not affect security but
 limits the maximum repository capacity to only 295 exabytes (2**64 * 16 bytes).
 
+Both modes
+~~~~~~~~~~
+
 Encryption keys (and other secrets) are kept either in a key file on the client
 ('keyfile' mode) or in the repository config on the server ('repokey' mode).
 In both cases, the secrets are generated from random and then encrypted by a
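
The capacity figures in the data-structures hunk above can be checked with a few lines of arithmetic. The constant names are illustrative and not taken from borg.

```python
# Upper bound on the data encryptable in one session with a 48-bit per-chunk IV,
# assuming (pessimistically) that every chunk is only 4 KiB.
IV_BITS = 48
CHUNK_SIZE = 2**12                    # 4 KiB, assumed worst-case chunk size

aead_limit = 2**IV_BITS * CHUNK_SIZE  # bytes per session (AEAD modes)
legacy_limit = 2**64 * 16             # bytes per repo (legacy AES-CTR, 16-byte blocks)

print(aead_limit == 2**60)            # True, i.e. exactly 1 EiB
print(aead_limit / 10**18)            # roughly 1.15 decimal exabytes
print(legacy_limit / 10**18)          # roughly 295 decimal exabytes
```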

diff --git a/docs/internals/encryption-aead.odg b/docs/internals/encryption-aead.odg
diff --git a/docs/internals/encryption-aead.png b/docs/internals/encryption-aead.png
diff --git a/docs/internals/security.rst b/docs/internals/security.rst
@@ -124,7 +124,88 @@ prompt is a set BORG_PASSPHRASE. See issue :issue:`2169` for details.
 Encryption
 ----------
 
-Encryption is currently based on the Encrypt-then-MAC construction,
+AEAD modes
+~~~~~~~~~~
+
+Modes: --encryption (repokey|keyfile)-[blake2-](aes-ocb|chacha20-poly1305)
+
+Supported: borg 1.3+
+
+Encryption with these modes is based on AEAD ciphers (authenticated encryption
+with associated data) and session keys.
+
+Depending on the chosen mode (see :ref:`borg_init`) different AEAD ciphers are used:
+
+- AES-256-OCB - super fast, single-pass algorithm IF you have hw accelerated AES.
+- chacha20-poly1305 - very fast, purely software based AEAD cipher.
+
+The chunk ID is derived via a MAC over the plaintext (mac key taken from borg key):
+
+- HMAC-SHA256 - super fast IF you have hw accelerated SHA256.
+- Blake2b - very fast, purely software based algorithm.
+
+For each borg invocation, a new session id is generated by `os.urandom`_.
+
+From that session id, the initial key material (ikm, taken from the borg key)
+and an application and cipher specific salt, borg derives a session key via HKDF.
+
+For each session key, IVs (nonces) are generated by a counter which increments for
+each encrypted message.
+
+Session::
+
+    sessionid = os.urandom(24)
+    ikm = enc_key || enc_hmac_key
+    salt = "borg-session-key-CIPHERNAME"
+    sessionkey = HKDF(ikm, sessionid, salt)
+    message_iv = 0
+
+Encryption::
+
+    id = MAC(id_key, data)
+    compressed = compress(data)
+
+    header = type-byte || 00h || message_iv || sessionid
+    aad = id || header
+    encrypted, auth_tag = AEAD_encrypt(sessionkey, message_iv, compressed, aad)
+    message_iv++    # increment only after use, so the IV in the header is the one used
+    authenticated = header || auth_tag || encrypted
+
+Decryption::
+
+    # Given: input *authenticated* data and a *chunk-id* to assert
+    type-byte, past_message_iv, past_sessionid, auth_tag, encrypted = SPLIT(authenticated)
+
+    ASSERT(type-byte is correct)
+
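+    past_key = HKDF(ikm, past_sessionid, salt)
+    header = type-byte || 00h || past_message_iv || past_sessionid
+    aad = chunk-id || header
+    decrypted = AEAD_decrypt(past_key, past_message_iv, encrypted, auth_tag, aad)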
+
+    decompressed = decompress(decrypted)
+
+    ASSERT( CONSTANT-TIME-COMPARISON( chunk-id, MAC(id_key, decompressed) ) )
+
+Notable:
+
+- More modern and often faster AEAD ciphers instead of self-assembled stuff.
+- Due to the usage of session keys, IVs (nonces) do not need special care here as
+  they did for the legacy encryption modes.
+- The id is now also input into the authentication tag computation.
+  This strongly associates the id with the written data (== associates the key with
+  the value). When later reading the data for some id, authentication will only
+  succeed if what we get was really written by us for that id.
+
+
+Legacy modes
+~~~~~~~~~~~~
+
+Modes: --encryption (repokey|keyfile)[-blake2]
+
+Supported: all borg versions, blake2 since 1.1
+
+DEPRECATED. We strongly suggest you use the safer AEAD modes, see above.
+
+Encryption with these modes is based on the Encrypt-then-MAC construction,
 which is generally seen as the most robust way to create an authenticated
 encryption scheme from encryption and message authentication primitives.
 
@@ -137,7 +218,7 @@ in the future.
 
 Depending on the chosen mode (see :ref:`borg_init`) different primitives are used:
 
-- The actual encryption is currently always AES-256 in CTR mode. The
+- Legacy encryption modes use AES-256 in CTR mode. The
   counter is added in plaintext, since it is needed for decryption,
   and is also tracked locally on the client to avoid counter reuse.
 
@@ -253,7 +334,7 @@ Implementations used
 We do not implement cryptographic primitives ourselves, but rely
 on widely used libraries providing them:
 
-- AES-CTR and HMAC-SHA-256 from OpenSSL 1.0 / 1.1 are used,
+- AES-CTR, AES-OCB, CHACHA20-POLY1305 and HMAC-SHA-256 from OpenSSL 1.1 are used,
   which is also linked into the static binaries we provide.
   We think this is not an additional risk, since we don't ever
   use OpenSSL's networking, TLS or X.509 code, but only their
@@ -268,7 +349,8 @@ on widely used libraries providing them:
 
 Implemented cryptographic constructions are:
 
-- Encrypt-then-MAC based on AES-256-CTR and either HMAC-SHA-256
+- AEAD modes: AES-OCB and CHACHA20-POLY1305 are straight from OpenSSL.
+- Legacy modes: Encrypt-then-MAC based on AES-256-CTR and either HMAC-SHA-256
   or keyed BLAKE2b256 as described above under Encryption_.
 - Encrypt-and-MAC based on AES-256-CTR and HMAC-SHA-256
   as described above under `Offline key security`_.
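
The "Session" pseudocode in the security.rst hunk above can be sketched with the standard library alone. Several details here are assumptions rather than facts from the patch: HMAC-SHA512 as the HKDF hash, a 32-byte output, the session id passed as the HKDF salt with the ``borg-session-key-CIPHERNAME`` label as HKDF info, a big-endian IV, and the type byte ``0x01``. The function names are hypothetical.

```python
import hashlib
import hmac
import os


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF as described in RFC 5869, here with HMAC-SHA512 (assumed hash)."""
    prk = hmac.new(salt, ikm, hashlib.sha512).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                            # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_key(ikm: bytes, sessionid: bytes, ciphername: str) -> bytes:
    # Assumption: session id is the HKDF salt, the label string is the HKDF info.
    return hkdf(ikm, salt=sessionid, info=b"borg-session-key-" + ciphername.encode())


def make_header(type_byte: int, message_iv: int, sessionid: bytes) -> bytes:
    # type-byte || 00h || 48-bit IV (big-endian assumed) || 24-byte session id
    return bytes([type_byte, 0]) + message_iv.to_bytes(6, "big") + sessionid


ikm = os.urandom(64)        # stand-in for enc_key || enc_hmac_key
sessionid = os.urandom(24)  # fresh per invocation
key = derive_session_key(ikm, sessionid, "CHACHA20-POLY1305")
header = make_header(0x01, 0, sessionid)  # 0x01 is a made-up type byte
print(len(key), len(header))  # 32 32
```

Because the session id is stored in every header, a reader can re-derive the same session key later from the long-term key material, while each new invocation starts its IV counter at 0 under a different key.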

diff --git a/docs/quickstart.rst b/docs/quickstart.rst
@@ -387,6 +387,7 @@ For automated backups the passphrase can be specified using the
     A backup inside of the backup that is encrypted with that key/passphrase
     won't help you with that, of course.
 
+    Only applies to repos using legacy encryption modes:
     In case you lose your repository and the security information, but have an
     older copy of it to restore from, don't use that later for creating new
     backups – you would run into security issues (reuse of nonce counter

diff --git a/docs/usage/init.rst b/docs/usage/init.rst
@@ -4,16 +4,19 @@ Examples
 ~~~~~~~~
 ::
 
-    # Local repository, repokey encryption, BLAKE2b (often faster, since Borg 1.1)
-    $ borg init --encryption=repokey-blake2 /path/to/repo
+    # Local repository, recommended repokey AEAD crypto modes
+    $ borg init --encryption=repokey-aes-ocb /path/to/repo
+    $ borg init --encryption=repokey-chacha20-poly1305 /path/to/repo
+    $ borg init --encryption=repokey-blake2-aes-ocb /path/to/repo
+    $ borg init --encryption=repokey-blake2-chacha20-poly1305 /path/to/repo
 
-    # Local repository (no encryption)
+    # Local repository (no encryption), not recommended
     $ borg init --encryption=none /path/to/repo
 
     # Remote repository (accesses a remote borg via ssh)
     # repokey: stores the (encrypted) key into <REPO_DIR>/config
-    $ borg init --encryption=repokey-blake2 user@hostname:backup
+    $ borg init --encryption=repokey-aes-ocb user@hostname:backup
 
     # Remote repository (accesses a remote borg via ssh)
     # keyfile: stores the (encrypted) key into ~/.config/borg/keys/
-    $ borg init --encryption=keyfile user@hostname:backup
+    $ borg init --encryption=keyfile-aes-ocb user@hostname:backup
diff --git a/src/borg/archive.py b/src/borg/archive.py
@@ -1789,7 +1789,7 @@ def mark_as_possibly_superseded(id_):
 
         def add_callback(chunk):
             id_ = self.key.id_hash(chunk)
-            cdata = self.key.encrypt(chunk)
+            cdata = self.key.encrypt(id_, chunk)
             add_reference(id_, len(chunk), len(cdata), cdata)
             return id_
 
@@ -1811,7 +1811,7 @@ def verify_file_chunks(archive_name, item):
             def replacement_chunk(size):
                 chunk = Chunk(None, allocation=CH_ALLOC, size=size)
                 chunk_id, data = cached_hash(chunk, self.key.id_hash)
-                cdata = self.key.encrypt(data)
+                cdata = self.key.encrypt(chunk_id, data)
                 csize = len(cdata)
                 return chunk_id, size, csize, cdata
 
@@ -1998,7 +1998,7 @@ def valid_item(obj):
                 archive.items = items_buffer.chunks
                 data = msgpack.packb(archive.as_dict())
                 new_archive_id = self.key.id_hash(data)
-                cdata = self.key.encrypt(data)
+                cdata = self.key.encrypt(new_archive_id, data)
                 add_reference(new_archive_id, len(data), len(cdata), cdata)
                 self.manifest.archives[info.name] = (new_archive_id, info.ts)
             pi.finish()
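
The new ``encrypt(id, data)`` signature exists so the chunk id can be fed into the authentication tag. The effect can be illustrated without an AEAD library by using a plain HMAC as a stand-in for the tag. This sketch does not encrypt anything, and ``seal`` / ``open_`` are hypothetical helpers, not borg APIs. It assumes fixed-length 32-byte ids, so that concatenating id and data is unambiguous.

```python
import hashlib
import hmac


def seal(key: bytes, chunk_id: bytes, data: bytes) -> bytes:
    """Return tag || data, where the tag also covers chunk_id (associated data)."""
    tag = hmac.new(key, chunk_id + data, hashlib.sha256).digest()
    return tag + data


def open_(key: bytes, chunk_id: bytes, blob: bytes) -> bytes:
    """Verify the tag against the *expected* chunk_id, then return the data."""
    tag, data = blob[:32], blob[32:]
    expected = hmac.new(key, chunk_id + data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed: data was not written for this id")
    return data


key = b"k" * 32
id_a = hashlib.sha256(b"A").digest()
id_b = hashlib.sha256(b"B").digest()
blob = seal(key, id_a, b"chunk A")

print(open_(key, id_a, blob))      # b'chunk A'
try:
    open_(key, id_b, blob)         # same blob served under a different id
except ValueError:
    print("rejected")              # rejected
```

A store that returns the value for the wrong key is caught at authentication time, before any decompression or parsing of the payload.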