new AEAD crypto with session keys #6463

Merged
merged 34 commits into from
Mar 26, 2022
34 commits
d3f069c
crypto: fix/update borg version comments
ThomasWaldmann Mar 17, 2022
aff6261
crypto: cleanup, remove references to AES-GCM
ThomasWaldmann Mar 17, 2022
e647360
crypto: better raise NotImplementedError if we have no id_hash
ThomasWaldmann Mar 18, 2022
57479fb
crypto: put the IV into the header, at the end of it
ThomasWaldmann Mar 17, 2022
3473b17
crypto: improve attr naming
ThomasWaldmann Mar 17, 2022
9633273
crypto: simplify api for new crypto, AEAD only needs 1 key
ThomasWaldmann Mar 17, 2022
0f6f278
crypto: AEAD key classes
ThomasWaldmann Mar 18, 2022
5c66fa4
crypto: layout updates, low-level does not deal with IV
ThomasWaldmann Mar 18, 2022
c010800
header_len=0 fits header=b'' default
ThomasWaldmann Mar 18, 2022
bb949b2
EVP_DecryptFinal_ex: fix check for return value
ThomasWaldmann Mar 18, 2022
6c7b499
set aead auth tag directly before EVP_DecryptFinal_ev
ThomasWaldmann Mar 18, 2022
6f2c587
tests: consistently give iv_int to ciphersuite
ThomasWaldmann Mar 18, 2022
41082f5
crypto: add some tests for new key types
ThomasWaldmann Mar 20, 2022
6d6d3ca
avoid losing the key
ThomasWaldmann Mar 20, 2022
0b5a212
avoid losing the key (old crypto)
ThomasWaldmann Mar 20, 2022
74ecb63
fix new crypto benchmarks for api change
ThomasWaldmann Mar 20, 2022
41b8a04
use faster hmac.digest api
ThomasWaldmann Mar 20, 2022
d3b78a6
minor key.encrypt api change/cleanup
ThomasWaldmann Mar 21, 2022
8bd9477
add aad parameter to borg.crypto.low_level api
ThomasWaldmann Mar 21, 2022
c50e112
also authenticate the chunkid when using the AEAD ciphers (AES-OCB/CH…
ThomasWaldmann Mar 21, 2022
f4a6ad0
docs: add new AEAD modes to security docs
ThomasWaldmann Mar 21, 2022
948d67e
crypto.low_level: simplify return code checks (AEAD)
ThomasWaldmann Mar 21, 2022
e1313cc
crypto.low_level: simplify return code checks (legacy)
ThomasWaldmann Mar 21, 2022
ccf0875
EVP_DecryptFinal_ex: fix check for return value
ThomasWaldmann Mar 21, 2022
b3383a4
update borg init docs
ThomasWaldmann Mar 21, 2022
298c5ee
docs: security infos only applying to legacy encryption
ThomasWaldmann Mar 22, 2022
ce24752
docs: update borg init examples
ThomasWaldmann Mar 22, 2022
900a812
crypto: bump API_VERSION to 1.3_01
ThomasWaldmann Mar 22, 2022
e4b65de
crypto: add IV overflow check
ThomasWaldmann Mar 22, 2022
3a0e1a1
crypto: low_level: reduce class inheritance depth
ThomasWaldmann Mar 22, 2022
dd2a054
crypto: key: reduce class inheritance depth
ThomasWaldmann Mar 22, 2022
af26835
delete pointless assert
ThomasWaldmann Mar 22, 2022
10cbdcc
add encryption-aead diagram
ThomasWaldmann Mar 22, 2022
c668265
init olen to avoid some (false positive) compiler warnings
ThomasWaldmann Mar 22, 2022
35 changes: 21 additions & 14 deletions docs/faq.rst
@@ -89,7 +89,7 @@ Also, you must not run borg against multiple instances of the same repo
(which is an issue if they happen to be not the same).
See :issue:`4272` for an example.
- Encryption security issues if you would update repo and copy-of-repo
-independently, due to AES counter reuse.
+independently, due to AES counter reuse (when using legacy encryption modes).

See also: :ref:`faq_corrupt_repo`

@@ -246,6 +246,8 @@ then use ``tar`` to perform the comparison:
My repository is corrupt, how can I restore from an older copy of it?
---------------------------------------------------------------------

Note: this is only required for repos using legacy encryption modes.

If your repositories are encrypted and have the same ID, the recommended method
is to delete the corrupted repository, but keep its security info, and then copy
the working repository to the same location:
@@ -473,8 +475,11 @@ Security

.. _borg_security_critique:

-Isn't BorgBackup's AES-CTR crypto broken?
------------------------------------------
+Isn't BorgBackup's legacy AES-CTR-based crypto broken?
+------------------------------------------------------

Note: in borg 1.3 new AEAD cipher based modes with session keys were added,
solving the issues of the legacy modes.

If a nonce (counter) value is reused, AES-CTR mode crypto is broken.

@@ -713,6 +718,8 @@ Please disclose security issues responsibly.
How important are the nonce files?
------------------------------------

This only applies to repositories using legacy encryption modes.

Borg uses :ref:`AES-CTR encryption <borg_security_critique>`. An
essential part of AES-CTR is a sequential counter that must **never**
repeat. If the same value of the counter is used twice in the same repository,
@@ -881,24 +888,24 @@ What's the expected backup performance?
---------------------------------------

Compared to simply copying files (e.g. with ``rsync``), Borg has more work to do.
This can make creation of the first archive slower, but saves time
and disk space on subsequent runs. Here is what Borg does when you run ``borg create``:

- Borg chunks the file (using the relatively expensive buzhash algorithm)
- It then computes the "id" of the chunk (hmac-sha256, which is often slow unless
your CPU has sha256 acceleration, or blake2b, which is fast in software)
- Then it checks whether this chunk is already in the repo (local hashtable lookup,
fast). If so, the processing of the chunk is completed here. Otherwise it needs to
process the chunk:
- Compresses (the default lz4 is super fast)
- Encrypts (AES, usually fast if your CPU has AES acceleration, which has been
common for about 10 years)
- Authenticates ("signs") using hmac-sha256 or blake2b (see above),
- Transmits to repo. If the repo is remote, this usually involves an SSH connection
(does its own encryption / authentication).
- Stores the chunk into a key/value store (the key is the chunk id, the value
is the data). While doing that, it computes a CRC32 of the data (repo low-level
checksum, used by borg check --repository) and also updates the repo index
(another hashtable).
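A minimal Python sketch of the per-chunk steps listed above, for illustration only. It is not Borg's code: `store_chunks`, the dict-based repo and the separate `crcs` dict are hypothetical, zlib stands in for lz4 (which is not in the standard library), and the encryption/authentication step is left out.

```python
import hmac
import zlib


def store_chunks(chunks, id_key, repo, crcs):
    """Hypothetical sketch of the per-chunk steps; returns the chunk ids in order.

    repo: dict standing in for the key/value store (chunk id -> stored bytes)
    crcs: dict standing in for the repo's low-level CRC32 checksums
    """
    ids = []
    for chunk in chunks:
        # compute the chunk "id" (here: HMAC-SHA256 keyed with id_key)
        chunk_id = hmac.digest(id_key, chunk, "sha256")
        ids.append(chunk_id)
        # dedup: a local hashtable lookup; known chunks need no further work
        if chunk_id in repo:
            continue
        # compress (zlib stands in for lz4, which is not in the standard library)
        stored = zlib.compress(chunk)
        # encryption and authentication would happen here; omitted in this sketch
        repo[chunk_id] = stored
        crcs[chunk_id] = zlib.crc32(stored)  # low-level checksum of what is stored
    return ids


repo, crcs = {}, {}
ids = store_chunks([b"a" * 100, b"b" * 100, b"a" * 100], bytes(32), repo, crcs)
print(len(ids), len(repo))  # 3 2: the repeated chunk is stored only once
```

The early `continue` is where the savings on subsequent runs come from: an unchanged chunk costs only the id computation and one lookup.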

Subsequent backups are usually very fast if most files are unchanged and only
@@ -928,14 +935,14 @@ If you feel your Borg backup is too slow somehow, here is what you can do:

- Make sure Borg has enough RAM (depends on how big your repo is / how many
files you have)
- Use one of the blake2 modes for --encryption unless you know for sure that
your CPU (and openssl) accelerates sha256 (then stay with hmac-sha256).
- Don't use any expensive compression. The default is lz4 and super fast.
Uncompressed is often slower than lz4.
- Just wait. You can also interrupt it and start it again as often as you like;
it will converge to a valid "completed" state (see ``--checkpoint-interval``,
maybe use the default, but in any case don't make it too short). It starts
from the beginning each time, but later runs are still faster, because it does
not store data that the repo already has from the last checkpoint.
- If you don’t need additional file attributes, you can disable them with ``--noflags``,
``--noacls``, ``--noxattrs``. This can lead to noticeable performance improvements
@@ -945,12 +952,12 @@ If you feel that Borg "freezes" on a file, it could be in the middle of processi
large file (like ISOs or VM images). Borg < 1.2 announces file names *after* finishing
with the file. This can lead to displaying the name of a small file while processing the
next (larger) file. For very big files, this can lead to the progress display showing some
previous short file for a long time while it processes the big one. With Borg 1.2, this
was changed to announcing the filename before starting to process it.

To see which files have changed and take more time to process, you can also add
``--list --filter=AME --stats`` to your ``borg create`` call to produce more log output,
including a file list (with file status characters) and also some statistics at
the end of the backup.

Then you do the backup and look at the log output:
28 changes: 28 additions & 0 deletions docs/internals/data-structures.rst
@@ -865,6 +865,31 @@ Encryption

.. seealso:: The :ref:`borgcrypto` section for an in-depth review.

AEAD modes
~~~~~~~~~~

Uses modern AEAD ciphers: AES-OCB or CHACHA20-POLY1305.
For each borg invocation, a new session key is derived from the borg key material,
and the 48-bit IV starts from 0 again (both ciphers internally add a 32-bit counter
to our IV, so we just count up by 1 per chunk).
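The per-session counting can be sketched as follows. `SessionIVCounter` and `next_iv` are hypothetical names, not borg's API, and the big-endian 6-byte encoding is an assumption; the overflow check reflects the rule that a 48-bit counter must never wrap around under one session key.

```python
class SessionIVCounter:
    """Hypothetical per-session IV counter (48-bit IVs, starting at 0)."""

    IV_BITS = 48

    def __init__(self):
        # a new session key is derived for each borg invocation, so counting restarts at 0
        self._next = 0

    def next_iv(self) -> bytes:
        if self._next >= 1 << self.IV_BITS:
            # wrapping around would reuse an IV under the same session key
            raise OverflowError("IV space of this session key is exhausted")
        iv = self._next.to_bytes(self.IV_BITS // 8, "big")
        self._next += 1  # count up by 1 per chunk
        return iv


counter = SessionIVCounter()
print(counter.next_iv().hex(), counter.next_iv().hex())  # 000000000000 000000000001
```

Because the session key is fresh for every invocation, the counter needs no persistent state between runs.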

The chunk layout is best seen at the bottom of this diagram:

.. figure:: encryption-aead.png
    :figwidth: 100%
    :width: 100%

No special IV/counter management is needed here due to the use of session keys.

A 48-bit IV is far more than needed: if you only backed up 4 KiB chunks (2^12 B),
the IV would "limit" the data encrypted in one session to 2^(12+48) B = 2^60 B
(1 EiB, about 1.15 exabytes), meaning you would run into other limitations (RAM,
storage, time) long before that. In practice, chunks are usually bigger, for big
files even much bigger, giving an even higher limit.
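The bound in the paragraph above is plain arithmetic and can be checked directly (assuming one IV per 4 KiB chunk):

```python
chunk_size = 2**12     # 4 KiB chunks: the pessimistic case from the text
iv_values = 2**48      # one IV per chunk within one session

limit = chunk_size * iv_values
print(limit == 2**60)                  # True
print(limit / 2**60, "EiB")            # 1.0 EiB
print(round(limit / 10**18, 2), "EB")  # 1.15 EB
```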

Legacy modes
~~~~~~~~~~~~

AES_-256 is used in CTR mode (so no need for padding). A 64 bit initialization
vector is used, a MAC is computed on the encrypted chunk
and both are stored in the chunk. Encryption and MAC use two different keys.
@@ -884,6 +909,9 @@ To reduce payload size, only 8 bytes of the 16 bytes nonce is saved in the
payload, the first 8 bytes are always zeros. This does not affect security but
limits the maximum repository capacity to only 295 exabytes (2**64 * 16 bytes).
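A small illustration of the nonce handling and the capacity figure. The counter value 12345 is an arbitrary example, and reading the 16-byte counter block as big-endian with the zero bytes first is an assumption made for this sketch:

```python
stored = (12345).to_bytes(8, "big")    # example value: the 8 nonce bytes kept in the payload
nonce = bytes(8) + stored              # full 16-byte counter block: 8 zero bytes first
assert len(nonce) == 16 and int.from_bytes(nonce, "big") == 12345

capacity = 2**64 * 16                  # 2**64 counter values, 16 bytes per AES block
print(capacity == 2**68)               # True
print(round(capacity / 10**18), "EB")  # 295 EB
```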

Both modes
~~~~~~~~~~

Encryption keys (and other secrets) are kept either in a key file on the client
('keyfile' mode) or in the repository config on the server ('repokey' mode).
In both cases, the secrets are generated from random and then encrypted by a
Binary file added docs/internals/encryption-aead.odg
Binary file not shown.
Binary file added docs/internals/encryption-aead.png
90 changes: 86 additions & 4 deletions docs/internals/security.rst
@@ -124,7 +124,88 @@ prompt is a set BORG_PASSPHRASE. See issue :issue:`2169` for details.
Encryption
----------

-Encryption is currently based on the Encrypt-then-MAC construction,
AEAD modes
~~~~~~~~~~

Modes: --encryption (repokey|keyfile)-[blake2-](aes-ocb|chacha20-poly1305)

Supported: borg 1.3+
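A sketch that checks a mode name against the pattern above and reports which pieces it selects. `parse_aead_mode` is a hypothetical helper, not part of borg's CLI code; mapping the optional `blake2-` part to the BLAKE2b ID MAC (and its absence to HMAC-SHA256) follows the description of the chunk ID MAC in this section.

```python
import re

# Regex transcription of (repokey|keyfile)-[blake2-](aes-ocb|chacha20-poly1305)
AEAD_MODE = re.compile(
    r"(?P<storage>repokey|keyfile)-(?P<blake2>blake2-)?(?P<cipher>aes-ocb|chacha20-poly1305)"
)


def parse_aead_mode(mode: str) -> dict:
    """Hypothetical helper: split an AEAD --encryption mode name into its parts."""
    m = AEAD_MODE.fullmatch(mode)
    if m is None:
        raise ValueError(f"not an AEAD mode: {mode!r}")
    return {
        "key_storage": m["storage"],  # repokey: key in repo config; keyfile: key on client
        "id_mac": "blake2b" if m["blake2"] else "hmac-sha256",
        "cipher": m["cipher"],
    }


print(parse_aead_mode("keyfile-aes-ocb"))
# {'key_storage': 'keyfile', 'id_mac': 'hmac-sha256', 'cipher': 'aes-ocb'}
```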

Encryption with these modes is based on AEAD ciphers (authenticated encryption
with associated data) and session keys.

Depending on the chosen mode (see :ref:`borg_init`) different AEAD ciphers are used:

- AES-256-OCB - super fast, single-pass algorithm IF you have hw accelerated AES.
- chacha20-poly1305 - very fast, purely software based AEAD cipher.

The chunk ID is derived via a MAC over the plaintext (mac key taken from borg key):

- HMAC-SHA256 - super fast IF you have hw accelerated SHA256.
- Blake2b - very fast, purely software based algorithm.
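A sketch of the chunk ID computation using only the Python standard library. `chunk_id` is a hypothetical helper; using BLAKE2b's built-in keyed mode with a 32-byte digest is an assumption and may differ in detail from how borg keys BLAKE2b. `hmac.digest` is the one-shot HMAC API (Python 3.7+).

```python
import hashlib
import hmac


def chunk_id(id_key: bytes, data: bytes, blake2: bool) -> bytes:
    """Hypothetical helper: 32-byte chunk id = MAC(id_key, plaintext)."""
    if blake2:
        # keyed BLAKE2b with a 256-bit digest (exact keying in borg is an assumption)
        return hashlib.blake2b(data, key=id_key, digest_size=32).digest()
    # HMAC-SHA256 via the one-shot hmac.digest API (Python 3.7+)
    return hmac.digest(id_key, data, "sha256")


id_key = bytes(range(32))  # example key; borg takes the id key from the borg key
print(chunk_id(id_key, b"some chunk", blake2=False).hex())
print(chunk_id(id_key, b"some chunk", blake2=True).hex())
```

Since the MAC is keyed, identical plaintext yields identical ids only within repositories sharing the same id key.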

For each borg invocation, a new session id is generated by `os.urandom`_.

From that session id, the initial key material (ikm, taken from the borg key)
and an application- and cipher-specific salt, borg derives a session key via HKDF.

For each session key, IVs (nonces) are generated by a counter which increments for
each encrypted message.

Session::

    sessionid = os.urandom(24)
    ikm = enc_key || enc_hmac_key
    salt = "borg-session-key-CIPHERNAME"
    sessionkey = HKDF(ikm, sessionid, salt)
    message_iv = 0
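The session key derivation can be sketched with an RFC 5869 HKDF built from the standard library. Two assumptions are made here: HMAC-SHA512 is the HKDF hash, and the pseudocode's `sessionid` becomes the HKDF salt while the `"borg-session-key-CIPHERNAME"` string becomes the HKDF info parameter (the pseudocode does not spell out that mapping). `hkdf` and `derive_session_key` are hypothetical names.

```python
import hashlib
import hmac
import os


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int, hash_name: str = "sha512") -> bytes:
    """HKDF (RFC 5869): extract a pseudorandom key, then expand it to `length` bytes."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    # extract: PRK = HMAC(salt, IKM); an empty salt means hash_len zero bytes
    prk = hmac.digest(salt or bytes(hash_len), ikm, hash_name)
    # expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated
    okm, block = b"", b""
    for i in range(1, -(-length // hash_len) + 1):
        block = hmac.digest(prk, block + info + bytes([i]), hash_name)
        okm += block
    return okm[:length]


def derive_session_key(enc_key: bytes, enc_hmac_key: bytes, sessionid: bytes, cipher_name: str) -> bytes:
    """Hypothetical: sessionid as HKDF salt, the domain string as HKDF info (assumed mapping)."""
    ikm = enc_key + enc_hmac_key
    info = b"borg-session-key-" + cipher_name.encode()
    return hkdf(ikm, salt=sessionid, info=info, length=32)


sessionid = os.urandom(24)  # fresh for every invocation
sessionkey = derive_session_key(os.urandom(32), os.urandom(32), sessionid, "CIPHERNAME")
print(len(sessionkey))  # 32
```

Putting the cipher name into the derivation input keeps keys for different ciphers apart even when the same key material and session id are used.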

Encryption::

    id = MAC(id_key, data)
    compressed = compress(data)

    header = type-byte || 00h || message_iv || sessionid
    aad = id || header
    encrypted, auth_tag = AEAD_encrypt(sessionkey, message_iv, compressed, aad)
    authenticated = header || auth_tag || encrypted
    message_iv++
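The header and AAD construction from the pseudocode, as a Python sketch. The field sizes follow the text (1 type byte, 1 reserved 00h byte, 48-bit IV, 24-byte session id); the big-endian IV encoding, the example type byte and the helper names are assumptions.

```python
def make_header(type_byte: int, message_iv: int, sessionid: bytes) -> bytes:
    """header = type-byte || 00h || message_iv (48 bits) || sessionid (24 bytes)."""
    if not 0 <= message_iv < 2**48:
        raise OverflowError("message_iv does not fit into 48 bits")
    if len(sessionid) != 24:
        raise ValueError("sessionid must be 24 bytes")
    return bytes([type_byte, 0x00]) + message_iv.to_bytes(6, "big") + sessionid


def make_aad(chunk_id: bytes, header: bytes) -> bytes:
    """aad = id || header: authenticated by the AEAD cipher, but not encrypted."""
    return chunk_id + header


header = make_header(0x42, 0, bytes(24))  # 0x42 is an arbitrary example type byte
aad = make_aad(bytes(32), header)
print(len(header), len(aad))  # 32 64
```

The header travels in plaintext because the reader needs the IV and session id before it can decrypt anything; the AEAD tag still protects it.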

Decryption::

    # Given: input *authenticated* data and a *chunk-id* to assert
    type-byte, past_message_iv, past_sessionid, auth_tag, encrypted = SPLIT(authenticated)

    ASSERT(type-byte is correct)

    past_key = HKDF(ikm, past_sessionid, salt)
    header = type-byte || 00h || past_message_iv || past_sessionid
    aad = chunk-id || header
    decrypted = AEAD_decrypt(past_key, past_message_iv, encrypted, auth_tag, aad)

    decompressed = decompress(decrypted)

    ASSERT( CONSTANT-TIME-COMPARISON( chunk-id, MAC(id_key, decompressed) ) )
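A matching sketch of SPLIT for the layout header || auth_tag || encrypted. The tag length is assumed to be 16 bytes (the usual 128-bit tag for both ciphers); `split` and `Parsed` are hypothetical names and the big-endian IV is an assumption.

```python
from typing import NamedTuple

HEADER_LEN = 1 + 1 + 6 + 24  # type byte, reserved 00h, 48-bit IV, 24-byte session id
TAG_LEN = 16                 # assumed 128-bit authentication tag


class Parsed(NamedTuple):
    type_byte: int
    message_iv: int
    sessionid: bytes
    header: bytes
    auth_tag: bytes
    encrypted: bytes


def split(authenticated: bytes) -> Parsed:
    """Hypothetical SPLIT for the layout header || auth_tag || encrypted."""
    if len(authenticated) < HEADER_LEN + TAG_LEN:
        raise ValueError("input too short")
    header = authenticated[:HEADER_LEN]
    if header[1] != 0x00:
        raise ValueError("unexpected value in reserved byte")
    return Parsed(
        type_byte=header[0],
        message_iv=int.from_bytes(header[2:8], "big"),
        sessionid=header[8:HEADER_LEN],
        header=header,  # kept so the caller can rebuild aad = chunk-id || header
        auth_tag=authenticated[HEADER_LEN:HEADER_LEN + TAG_LEN],
        encrypted=authenticated[HEADER_LEN + TAG_LEN:],
    )


blob = bytes([0x42, 0x00]) + (7).to_bytes(6, "big") + bytes(24) + bytes(16) + b"ciphertext"
print(split(blob).message_iv, split(blob).encrypted)  # 7 b'ciphertext'
```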

Notable:

- More modern and often faster AEAD ciphers instead of a self-assembled Encrypt-then-MAC construction.
- Due to the usage of session keys, IVs (nonces) do not need special care here as
they did for the legacy encryption modes.
- The id is now also input into the authentication tag computation.
This strongly associates the id with the written data (== associates the key with
the value). When later reading the data for some id, authentication will only
succeed if what we get was really written by us for that id.
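To illustrate the last point, the following sketch uses a deliberately simple, standard-library-only stand-in for an AEAD cipher (HMAC-SHA256 as keystream generator and MAC). It is NOT AES-OCB or ChaCha20-Poly1305 and is not meant for real data; all function names are hypothetical. It only shows that data authenticated with `id || header` as AAD is rejected when read back under a different id.

```python
import hmac

# TOY AEAD stand-in built from HMAC-SHA256 only, so the example runs with the
# standard library. NOT AES-OCB or ChaCha20-Poly1305 and NOT for real data.


def _keystream(key: bytes, iv: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hmac.digest(key, b"enc" + iv + counter.to_bytes(4, "big"), "sha256")
        counter += 1
    return out[:n]


def _tag(key: bytes, iv: bytes, aad: bytes, ct: bytes) -> bytes:
    # the tag covers the IV, the AAD (length-prefixed) and the ciphertext
    msg = b"mac" + iv + len(aad).to_bytes(8, "big") + aad + ct
    return hmac.digest(key, msg, "sha256")[:16]


def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes, aad: bytes):
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, iv, len(plaintext))))
    return ct, _tag(key, iv, aad, ct)


def toy_decrypt(key: bytes, iv: bytes, ct: bytes, tag: bytes, aad: bytes) -> bytes:
    if not hmac.compare_digest(tag, _tag(key, iv, aad, ct)):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in zip(ct, _keystream(key, iv, len(ct))))


key, iv, header = bytes(32), bytes(12), b"example-header"
ct, tag = toy_encrypt(key, iv, b"chunk data", b"id-A" + header)   # stored under id-A
print(toy_decrypt(key, iv, ct, tag, b"id-A" + header))            # b'chunk data'
try:
    toy_decrypt(key, iv, ct, tag, b"id-B" + header)  # same bytes, requested as id-B
except ValueError as exc:
    print("rejected:", exc)                          # rejected: authentication failed
```

The intact ciphertext and tag fail verification purely because the requested id differs, which is the key-to-value binding described above.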
Comment on lines +193 to +196


(Curious user here who happens to be a cryptographer.)

What gains do you expect from this?

So if I read this correctly, then the chunk id is an additional MAC over the uncompressed and unencrypted data. But the AEAD already authenticates the unencrypted data (no matter whether it's encrypted or not), so I fail to see the purpose of adding the id to the AAD.

Does the id have a purpose in other parts of borg or is this just for the encryption?

I may be able to provide a more detailed review but I realize I'm quite late to the party... Do you think this would still be helpful?

ThomasWaldmann (Member Author), Jul 19, 2022

Hi Tim,

thanks for looking at our crypto, review there is very welcome (and not too late, borg2 is still in alpha, soon beta, so we could still change things)!

If we do not consider time needed for computing the MAC (chunkid) from plaintext, there is no advantage. But we would have to compute the MAC to verify if it is the same as the chunkid we wanted when we asked the repo for that chunk. borg 1.x does it like that. Without that, the self-made AEAD construction we used in 1.x (AES-CTR + HMAC-SHA256 and not including the chunkid) would just be able to tell that the content was written by us (authentic), but we could not be sure if we wrote that for that chunkid (could have been also some other chunkid).

If we feed the chunkid into the AAD computation when storing the chunk and also when requesting the chunk, the authentication will fail if:

  • the content is corrupted / tampered
  • the content is ok, but not for the chunkid we wanted

Thus, we do not need to compute the MAC to verify if the chunk is really for the chunkid we wanted - saves some CPU time.


Ok, this makes a lot of sense now! Adding the chunkid as AAD binds the key (as in "key-value store") to the value.

> Thus, we do not need to compute the MAC to verify if the chunk is really for the chunkid we wanted - saves some CPU time.

Indeed yes, I see that this would save CPU time. But then I'm confused whether it's really implemented like this? AFAIU the code still recomputes the MAC:

self.assert_id(id, data)

> thanks for looking at our crypto, review there is very welcome (and not too late, borg2 is still in alpha, soon beta, so we could still change things)!

Thanks for the warm welcome. I'll keep looking at the PR and docs here. I'll probably continue to ask some very basic questions, so please bear with me. :)

ThomasWaldmann (Member Author)

good catch, fixed by #6880 (and added a test to make sure it works as expected).



Legacy modes
~~~~~~~~~~~~

Modes: --encryption (repokey|keyfile)[-blake2]

Supported: all borg versions, blake2 since 1.1

DEPRECATED. We strongly suggest you use the safer AEAD modes, see above.

Encryption with these modes is based on the Encrypt-then-MAC construction,
which is generally seen as the most robust way to create an authenticated
encryption scheme from encryption and message authentication primitives.

@@ -137,7 +218,7 @@ in the future.

Depending on the chosen mode (see :ref:`borg_init`) different primitives are used:

-- The actual encryption is currently always AES-256 in CTR mode. The
+- Legacy encryption modes use AES-256 in CTR mode. The
counter is added in plaintext, since it is needed for decryption,
and is also tracked locally on the client to avoid counter reuse.

@@ -253,7 +334,7 @@ Implementations used
We do not implement cryptographic primitives ourselves, but rely
on widely used libraries providing them:

-- AES-CTR and HMAC-SHA-256 from OpenSSL 1.0 / 1.1 are used,
+- AES-CTR, AES-OCB, CHACHA20-POLY1305 and HMAC-SHA-256 from OpenSSL 1.1 are used,
which is also linked into the static binaries we provide.
We think this is not an additional risk, since we don't ever
use OpenSSL's networking, TLS or X.509 code, but only their
@@ -268,7 +349,8 @@ on widely used libraries providing them:

Implemented cryptographic constructions are:

-- Encrypt-then-MAC based on AES-256-CTR and either HMAC-SHA-256
+- AEAD modes: AES-OCB and CHACHA20-POLY1305 are straight from OpenSSL.
+- Legacy modes: Encrypt-then-MAC based on AES-256-CTR and either HMAC-SHA-256
or keyed BLAKE2b256 as described above under Encryption_.
- Encrypt-and-MAC based on AES-256-CTR and HMAC-SHA-256
as described above under `Offline key security`_.
1 change: 1 addition & 0 deletions docs/quickstart.rst
@@ -387,6 +387,7 @@ For automated backups the passphrase can be specified using the
A backup inside of the backup that is encrypted with that key/passphrase
won't help you with that, of course.

Only applies to repos using legacy encryption modes:
In case you lose your repository and the security information, but have an
older copy of it to restore from, don't use that later for creating new
backups – you would run into security issues (reuse of nonce counter
13 changes: 8 additions & 5 deletions docs/usage/init.rst
@@ -4,16 +4,19 @@ Examples
~~~~~~~~
::

-# Local repository, repokey encryption, BLAKE2b (often faster, since Borg 1.1)
-$ borg init --encryption=repokey-blake2 /path/to/repo
+# Local repository, recommended repokey AEAD crypto modes
+$ borg init --encryption=repokey-aes-ocb /path/to/repo
+$ borg init --encryption=repokey-chacha20-poly1305 /path/to/repo
+$ borg init --encryption=repokey-blake2-aes-ocb /path/to/repo
+$ borg init --encryption=repokey-blake2-chacha20-poly1305 /path/to/repo

-# Local repository (no encryption)
+# Local repository (no encryption), not recommended
$ borg init --encryption=none /path/to/repo

# Remote repository (accesses a remote borg via ssh)
# repokey: stores the (encrypted) key into <REPO_DIR>/config
-$ borg init --encryption=repokey-blake2 user@hostname:backup
+$ borg init --encryption=repokey-aes-ocb user@hostname:backup

# Remote repository (accesses a remote borg via ssh)
# keyfile: stores the (encrypted) key into ~/.config/borg/keys/
-$ borg init --encryption=keyfile user@hostname:backup
+$ borg init --encryption=keyfile-aes-ocb user@hostname:backup
6 changes: 3 additions & 3 deletions src/borg/archive.py
@@ -1789,7 +1789,7 @@ def mark_as_possibly_superseded(id_):

def add_callback(chunk):
id_ = self.key.id_hash(chunk)
-cdata = self.key.encrypt(chunk)
+cdata = self.key.encrypt(id_, chunk)
add_reference(id_, len(chunk), len(cdata), cdata)
return id_

@@ -1811,7 +1811,7 @@ def verify_file_chunks(archive_name, item):
def replacement_chunk(size):
chunk = Chunk(None, allocation=CH_ALLOC, size=size)
chunk_id, data = cached_hash(chunk, self.key.id_hash)
-cdata = self.key.encrypt(data)
+cdata = self.key.encrypt(chunk_id, data)
csize = len(cdata)
return chunk_id, size, csize, cdata

@@ -1998,7 +1998,7 @@ def valid_item(obj):
archive.items = items_buffer.chunks
data = msgpack.packb(archive.as_dict())
new_archive_id = self.key.id_hash(data)
-cdata = self.key.encrypt(data)
+cdata = self.key.encrypt(new_archive_id, data)
add_reference(new_archive_id, len(data), len(cdata), cdata)
self.manifest.archives[info.name] = (new_archive_id, info.ts)
pi.finish()