Using a more modern KDF than PBKDF2 #747

Closed
enkore opened this issue Mar 14, 2016 · 40 comments

@enkore
Contributor

enkore commented Mar 14, 2016

With repokey the encrypted keys are stored in the repo itself. So in almost any attack scenario the attacker has access to that (and if we store a repo in essentially untrusted storage, e.g. "the cloud", we must assume that the repokey blob is essentially 'public knowledge'). Which makes me think that PBKDF2 (with its susceptibility to GPU/FPGA and ASIC based attacks) might not be the best choice here.

Upgrading the repokey can be done safely and transparently to the user when accessing a repo. Increasing just the iterations of PBKDF2 would not make the problem go away.

There are some recent developments into key derivation functions which are very hard to speed up with GPUs or dedicated hardware, e.g. scrypt, argon2d/argon2i. These can be tuned to derive a key in a reasonable time frame on commodity hardware (0.1 s < t < 1 s) while off-line attacks remain essentially inefficient.

I think we are in a very favourable position here, since it is not an issue if the KDF takes one second per key.
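
To make the contrast concrete, here is a minimal sketch using only Python's hashlib. The parameter values (200,000 iterations, n=2**14, r=8, p=1) and the function names are illustrative assumptions, not Borg's settings or API. The point is that scrypt forces every guess to touch roughly 128 * n * r bytes of memory, which is what makes massively parallel GPU/ASIC guessing expensive, while PBKDF2 needs almost no memory per guess.

```python
import hashlib
import os
import time


def derive_pbkdf2(passphrase: bytes, salt: bytes, iterations: int = 200_000) -> bytes:
    # CPU-bound only: each guess needs almost no memory.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)


def derive_scrypt(passphrase: bytes, salt: bytes, n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    # Memory-hard: each guess needs roughly 128 * n * r bytes (16 MiB here).
    # maxmem is raised explicitly so the call does not depend on the OpenSSL default.
    return hashlib.scrypt(passphrase, salt=salt, n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=32)


def timed(fn, *args):
    # Return the derived key together with the wall-clock time it took.
    start = time.perf_counter()
    key = fn(*args)
    return key, time.perf_counter() - start


salt = os.urandom(32)
for name, fn in (("pbkdf2", derive_pbkdf2), ("scrypt", derive_scrypt)):
    key, seconds = timed(fn, b"correct horse battery staple", salt)
    print(f"{name}: {len(key)} byte key in {seconds:.3f} s")
```

Both calls can be tuned into the 0.1 s to 1 s window mentioned above; only the scrypt one also charges the attacker memory per guess.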

EDIT: This of course also affects the other key storage methods, repokey just being the most handy of them.

@ThomasWaldmann
Member

There was already a discussion about pbkdf2, iterations, etc. in another ticket.

One paper linked from there pointed out that increasing iterations is not the best method to increase security; one should rather increase passphrase length.

So, do you still think we need another kdf even if people choose long passphrases?

The problem with these other kdfs is that they often add another dependency. If we add a dependency that is not present in some linux distribution, borg can't be packaged for it (or is way more effort to package all the stuff needed). Even worse, if we add a conflicting requirement, it might even block packaging.

@enkore
Contributor Author

enkore commented Mar 15, 2016

another ticket

I should have searched before creating this one: #77

Yes, the dependency hell is an issue, especially since soft-dependencies are IMHO not an option for borg core functionality. Vendoring might be an option, since they're small, but meh. That's not a clean solution either.

I don't think that there is "immediate need for action". There is however a point to being "truly and completely paranoid safe than sorry just a little paranoid safe". Of course a very long, high entropy pass phrase is an absolute must, independent of KDF.

Specifically re. argon there are even two bindings floating around, both of which are maintained. Huh?

In summary: maybe increase them a little, or use calibration (on borg create / KeyfileKeyBase.create), but either one is really a "nice to have". Other KDFs are more a long-term thing until they have hit common libraries and distros.
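
A rough sketch of what such a calibration could look like, assuming it runs once at key creation time. The function name, the 0.5 s target and the 100,000 floor are assumptions for illustration and not taken from Borg's code:

```python
import hashlib
import os
import time


def calibrate_pbkdf2(target_seconds: float = 0.5, probe_iterations: int = 20_000,
                     minimum: int = 100_000) -> int:
    """Estimate an iteration count that takes about target_seconds on this machine."""
    salt = os.urandom(32)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"calibration passphrase", salt, probe_iterations)
    elapsed = time.perf_counter() - start
    # PBKDF2 cost is linear in the iteration count, so scale the probe result.
    estimate = int(probe_iterations * target_seconds / max(elapsed, 1e-9))
    # Never go below a fixed floor, even on a very slow or heavily loaded machine.
    return max(estimate, minimum)


iterations = calibrate_pbkdf2()
print(f"calibrated PBKDF2 iterations: {iterations}")
```

The chosen count would have to be stored next to the salt in the key file, since a key created on a fast desktop must still be readable on a slow device.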

Leave this open for now and tag later?

@ThomasWaldmann
Member

Yes, tagging it "later". Maybe edit ticket title to "use more modern kdf?"?

@enkore enkore changed the title Repokey and long(ish)-term security Long(er)-term security: Using a more modern KDF than PBKDF2? Mar 15, 2016
@ssd63

ssd63 commented Mar 22, 2016

Of course a very long, high entropy pass phrase is an absolute must, independent of KDF.

I object. In a perfect world, where everybody used secure passwords and chose a different password every time, nobody would care about the speed of a KDF, as cracking the key itself would cost as much time as cracking the password.
Indeed, weak passwords (only 16 alphanumeric chars) or password reuse are the reason for all those modern KDFs.

@ghost

ghost commented Oct 29, 2016

The problem with these other kdfs is that they often add another dependency. If we add a dependency that is not present in some linux distribution, borg can't be packaged for it (or is way more effort to package all the stuff needed). Even worse, if we add a conflicting requirement, it might even block packaging.

Today we use Docker or similar tools to split a project into many microservices. Anyway, you could add this support the way Django does and make it optional (for example, if you have the skills, you can install Argon2i or Bcrypt or something like that, which requires a third-party app; if not, you can use the default settings).

@enkore
Contributor Author

enkore commented Oct 29, 2016

We don't want optional or "it depends" features in core functionality like the crypto, because it means that some packaged versions of Borg can't read what Borg from another distro or just another release of a distro wrote, even if they're exactly the same version. For example to be able to restore data one would need to consider what packages are available on the distro and what the maintainer decided he would compile in etc. and that's bad.

While PBKDF2 isn't exactly state of the art anymore, there are no known flaws in it (only that there are better alternatives); and if the password has enough entropy in it then it will withstand attacks of significant scale*. A better KDF shouldn't be a reason to step down passphrase entropy.

* Borg uses PBKDF2-SHA256 so it has 256 bits of internal state you can stuff with entropy via the passphrase. A passphrase with more than 256 bits of entropy wouldn't make it harder anymore. However, attacking the AES encryption of the key directly probably becomes more economic before that... not that anyone ever managed to do that.
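
The arithmetic behind this footnote can be sketched as follows. This assumes every character is chosen uniformly at random, which human-chosen passphrases rarely satisfy; the helper names and the 100,000 iteration figure are illustrative assumptions:

```python
import math


def passphrase_entropy_bits(length: int, alphabet_size: int) -> float:
    # Only valid if each character is chosen uniformly at random.
    return length * math.log2(alphabet_size)


def effective_bits(entropy_bits: float, state_bits: int = 256) -> float:
    # Entropy beyond the 256 bits of internal state cannot add strength.
    return min(entropy_bits, state_bits)


def added_work_bits(iterations: int) -> float:
    # Key stretching multiplies the attacker's cost per guess by `iterations`.
    return math.log2(iterations)


alnum16 = passphrase_entropy_bits(16, 62)
print(f"16 random alphanumeric chars: {alnum16:.1f} bits")
print(f"100000 iterations add about {added_work_bits(100_000):.1f} bits of work")
print(f"64 random alphanumeric chars count as "
      f"{effective_bits(passphrase_entropy_bits(64, 62)):.0f} bits")
```

Sixteen truly random alphanumeric characters already come to about 95 bits, so stretching contributes comparatively little there; it matters most for passphrases far below that.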

@ghost

ghost commented Oct 29, 2016

But it's not possible to choose easy and secure together.

If I need security, it means I can rebuild the kernel with my custom config options, split one big project into many small microservices, limit their access rights (put all this code in different containers or KVM VPSes), resources etc., and use disk encryption. If we talk about the client machine, it means https://tails.boum.org/ , which uses Tor for everything...

I can spend a week learning such things.

In the end the solution will be secure, but it will not be possible to reuse it on another OS or maybe even on another CPU... And that's an acceptable solution for people who need to do things securely.

I am not talking about default settings (I know a lot of newbies who come to Linux and know nothing more than apt-get install), but as an option we should do something for people who need ultimate security settings. Otherwise this feature will not be usable for them and they will not use encryption and add it later.

@enkore
Contributor Author

enkore commented Oct 29, 2016

As @ssd63 and I pointed out above a better KDF doesn't do anything for security if your passphrase is of high quality.

Using argon2 would mainly benefit people whose passphrases are not of high quality. Users with high quality passphrases receive no benefit.

@ghost

ghost commented Oct 29, 2016

Just 2 words about compatibility problems.

There is no problem to use borgbackup from Docker container with any even exotic dependencies.

-1. Just follow guide https://docs.docker.com/engine/installation/linux/ubuntulinux/ and install Docker on server and client machine (Docker for Mac or Docker for Windows, for example). Just copy-paste guide as usual.

-2. On client machine create just 1 file with name Dockerfile and contents:

FROM buildpack-deps:jessie
# It's Debian Jessie + some packages , more here https://hub.docker.com/_/buildpack-deps/

RUN set -ex \
    && echo 'deb http://ftp.debian.org/debian jessie-backports main' >> /etc/apt/sources.list \
    && apt-get update \
    && apt-get -y dist-upgrade \
    && apt-get -y autoremove \
    && apt-get autoclean \
    && apt-get -y -t jessie-backports  install borgbackup

-3. run build command docker build -t user/repository:borgbackup-latest .
where user and repository are your user and repository on https://hub.docker.com/ (1 free private repository, in this case you can use public); this command will build an image with borgbackup. You can improve the RUN command and run the build again to get an updated version of the image.

-4. run push command docker push user/repository:borgbackup-latest to send image to repository

-5. run borgbackup on server docker run -it --rm user/repository:borgbackup-latest borg -V and you will see borg 1.0.7.

Let's assume, borg collective will create repository with Docker image [with all exotic dependencies if needed] and push it to public repository. It will mean, to use borgbackup, end user should only install Docker on server and nothing more. He/she can directly use your image from your official repository (step 5.) or copy-paste Dockerfile and build it yourself.

So I mean, it should not be problem to install borgbackup or any other software to any server. Just install Docker and use any software with any dependencies you need...

@ghost

ghost commented Oct 29, 2016

So I suggest giving users a choice: if they need easy and simple tools, they just use the default settings and it will work everywhere. If they are perfectionists and prefer step by step configuring, why not? They understand it may require some skills and they are ready... As a perfectionist I would like to choose all settings myself and select algorithms. And with Docker even newbies can use them everywhere too (just add some lines to the RUN commands)... Even if he/she uses, for example, Gentoo, he/she can use a Docker container with Debian and borgbackup inside with all dependencies.

@enkore
Contributor Author

enkore commented Feb 17, 2017

Currently key files are encrypted using AES-CTR and HMAC-SHA256 in an Encrypt-and-MAC (!= -then-) scheme. That doesn't have any known weaknesses (unlike AES-CBC, which would at least theoretically make Borg vulnerable to padding oracle attacks [at the rate of one-guess-per-borg-invocation-and-failure]), but could also be done using our usual EtM scheme, which strongly adheres to up-to-date cryptographic recommendations (since it doesn't require much head-scratching to see that it works).

So for key file format v2 we want:

  • #747: better, more paranoid KEK derivation as above
  • EtM using AES-CTR and HMAC-SHA256 or ChaPoly AEAD (perhaps even with another ...-then-MAC layer around it?), if we have it by then (probable).

Note that this is wholly incompatible with older borg versions, but if we improve the handling of unknown/unparsable keys, then we could leave a v1 borg key file locally for older clients.

(I melded this and #2173 together since they're quite close together and both break compat in the same spot)
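
For readers unfamiliar with the distinction, the following sketch contrasts Encrypt-and-MAC with Encrypt-then-MAC. It is not Borg's key file code: a toy SHA-256 keystream stands in for AES-CTR so the example needs only the standard library, and all function names are made up for illustration.

```python
import hashlib
import hmac


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stand-in for AES-CTR (NOT for real use): XOR with SHA-256(key, nonce, counter).
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt_and_mac(enc_key, mac_key, nonce, plaintext):
    # E&M: the tag is computed over the plaintext.
    ciphertext = toy_keystream_xor(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, plaintext, hashlib.sha256).digest()
    return ciphertext, tag


def decrypt_and_mac(enc_key, mac_key, nonce, ciphertext, tag):
    # The receiver has to decrypt first and can only then check the tag.
    plaintext = toy_keystream_xor(enc_key, nonce, ciphertext)
    expected = hmac.new(mac_key, plaintext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return plaintext


def encrypt_then_mac(enc_key, mac_key, nonce, plaintext):
    # EtM: the tag covers the nonce and the ciphertext.
    ciphertext = toy_keystream_xor(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return ciphertext, tag


def decrypt_then_mac(enc_key, mac_key, nonce, ciphertext, tag):
    # The tag is verified before any decryption happens.
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return toy_keystream_xor(enc_key, nonce, ciphertext)
```

In the EtM variant a tampered ciphertext is rejected before the cipher ever runs, which is why it takes less head-scratching to argue that it is safe.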

@rugk rugk mentioned this issue Jul 21, 2017
@rugk
Contributor

rugk commented Jul 21, 2017

As for KDFs even the Wikipedia article mentions some alternatives…

"Encrypt-and-MAC" Really?
(In the docs this is not mentioned at all…)

That's not nice at all. Encrypt-then-MAC should be preferred nowadays.
Quoting here I see that SSH had problems with it, the integrity of the plaintext cannot be ensured and such problems.

The most important point in favor of Encrypt-then-MAC however is this one:

The MAC does not provide any information on the plaintext

And that's the important point, which prevents any side-channel attacks using the MAC in other modes…

@rugk
Contributor

rugk commented Jul 21, 2017

But should not we rather open a new issue for this topic? Or maybe include it in #1044? @ThomasWaldmann

@enkore
Contributor Author

enkore commented Jul 21, 2017

That's documented here.

The most important point in favor of Encrypt-then-MAC however is this one:

Actually, the most important point is that modes with validated padding are susceptible to a padding oracle attack in E&M which is able to decrypt ciphertexts in a linear number of tries (~128 tries per ciphertext byte).

A MAC is not required to be a PRF, but within the standard model HMAC is a PRF if the hash is a PRF, and SHA2 is a PRF within its security margin. Therefore, no information on the plaintext is provided. (This is a bit hand-wavy since I'm preoccupied right now, but it's essentially correct)

@Fxrh

Fxrh commented Feb 9, 2022

Are there currently any plans to support Argon2i as KDF?

I think argon2i would increase the security of borg in the RepoKey mode significantly, as even decent-length passwords will always be the weakest link in such a setup. Further, regarding the dependency problem, LUKS2 (i.e., the linux disk encryption system, version 2 seems to exist since kernel 4.12) uses Argon2i as default for key derivation, so I'd assume argon2i should exist on any current Linux distribution by now.

@ThomasWaldmann
Member

ThomasWaldmann commented Feb 10, 2022

I could imagine we add argon2 within the helium milestone. Needs some careful checks for availability, usability, etc. first though.

update: python package argon2-cffi seems to be quite widespread. considering there's an rfc now for argon2, support for it seems quite safe and will improve in future. found packages for misc. linuxes, BSDs, macOS, windows, but not for openindiana.

Also, performance needs to be evaluated. Some people use devices like the raspberry pi, others have very high speed server or desktop cpus. borg needs to support a wide range of devices, except those which are problematic already due to other reasons (e.g. not enough RAM for borg's in-memory hashtables).

update: argon2-cffi supports different profiles. the high-memory one (2GiB) could be problematic for some users, the low-memory one (64MiB) should work for all borg users.

@ThomasWaldmann ThomasWaldmann added this to the helium milestone Feb 22, 2022
@ThomasWaldmann
Member

ThomasWaldmann commented Feb 22, 2022

Ideas / Plan:

  • keyfile format 2 (see #747 (comment))
  • https://github.com/hynek/argon2-cffi - considering argon2 won the pw hashing competition, guess we should use it if available.
  • maybe use scrypt - we could use this at "no cost" as a binding comes with py36 hashlib.scrypt. maybe this is interesting as a better-than-pbkdf2 alternative if there is no argon2 on some platform? this can be added later and is not part of the bounty.
  • pbkdf2 - do not change iterations (see top post), rather upgrade keys to argon2 (or scrypt).
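
The "argon2 if available, else scrypt" idea from this list could look roughly like the sketch below. It assumes the third-party argon2-cffi package (21.2.0 or newer, for argon2.profiles) and uses its low-memory RFC 9106 profile; the function name derive_kek and the scrypt parameters are assumptions for illustration, not what was merged into borg.

```python
import hashlib

try:
    # Third-party: pip install argon2-cffi (21.2.0 or newer for argon2.profiles).
    from argon2.low_level import hash_secret_raw
    from argon2.profiles import RFC_9106_LOW_MEMORY as PROFILE
    HAVE_ARGON2 = True
except ImportError:
    HAVE_ARGON2 = False


def derive_kek(passphrase: bytes, salt: bytes) -> tuple:
    """Return (kdf_name, 32-byte key encryption key)."""
    if HAVE_ARGON2:
        key = hash_secret_raw(
            secret=passphrase,
            salt=salt,
            time_cost=PROFILE.time_cost,
            memory_cost=PROFILE.memory_cost,  # in KiB; 64 MiB for the low-memory profile
            parallelism=PROFILE.parallelism,
            hash_len=32,
            type=PROFILE.type,
        )
        return "argon2", key
    # Fallback that ships with CPython's hashlib (needs OpenSSL with scrypt support).
    key = hashlib.scrypt(passphrase, salt=salt, n=2**14, r=8, p=1,
                         maxmem=64 * 1024 * 1024, dklen=32)
    return "scrypt", key


name, kek = derive_kek(b"correct horse battery staple", b"0123456789abcdef")
print(f"{name}: derived a {len(kek)} byte key")
```

The returned name would have to be recorded in the key file. A key written with argon2 still cannot be opened on a machine without argon2, which is exactly the compatibility concern raised earlier in this thread.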

Bounty: https://app.bountysource.com/issues/31864649-long-er-term-security-using-a-more-modern-kdf-than-pbkdf2

@hexagonrecursion
Contributor

B) This would immediately and automatically break mixed setups with shared repos. Key upgrade could be also done manually via borg key .... A prompt is not very helpful as a lot of stuff is scripted and there is no interactive user to prompt (so it would just hang or fail).

I am sorry. I have misunderstood your previous comment. Thanks for clarification

@hexagonrecursion
Contributor

EncryptedKey.algorithm is a confusingly ambiguous name. I did eventually find the documentation telling me that it means both the kdf algorithm and the hmac algorithm. Surprisingly there were no comments in the source. Should we rename it to message_authentication_algorithm while we are at it?

algorithm = PropDict._make_property('algorithm', str, encode=str.encode, decode=bytes.decode)

@hexagonrecursion
Contributor

hexagonrecursion commented Mar 19, 2022

There are several commands that encrypt the key

  • borg key change-passphrase - I think this one should not change the KDF - this is less magical and less surprising
  • borg key change-location - I think this one should not change the KDF either
  • borg key export - default to version 2, allow version 1 via a command line switch
  • borg key import - default to version 2
  • borg init - default to version 2
  • borg key change-kdf - not implemented yet
  • Did I forget anything?

@ThomasWaldmann
Member

About algorithm: I would avoid changing the attribute name, but for v2 keys, we can have better values, like argon2-aes256-ctr-hmac-sha256 (or whatever we end up with).

@ThomasWaldmann
Member

ThomasWaldmann commented Mar 19, 2022

commands dealing with the key:

  • i agree that change-passphrase, change-location should only change what they have in their name
  • import / export: guess they should not change anything either, but just reproduce in another form
  • init: yes, default to v2
  • maybe rather borg key change-version (like v1 -> v2).

@hexagonrecursion
Contributor

More user experience considerations: a key currently has two versions: borg.item.EncryptedKey.version == 1 and borg.item.Key.version == 1. I think we should not expose both versions separately in the user interface - just one version is enough, two would be unnecessary cognitive overhead for the end user.
I propose:

| User-visible version | EncryptedKey.version | Key.version |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 (a hypothetical future expansion) | 2 | 3 |

@ThomasWaldmann
Member

ThomasWaldmann commented Mar 22, 2022

[Encrypted]Key.version: i don't think this is for the end user or UI, but rather for the borg code to process different kinds of keys correctly.

It's not totally clear to me right now (check if there are docs). Looks like there is some overlap, both version and algorithm could do that (let borg process such a key correctly) on their own already. Problem cases like algorithm A version 1 meaning something different than algorithm A version 2 could be avoided by just choosing different algorithm names.

Update: There is a bit: https://borgbackup.readthedocs.io/en/stable/internals/data-structures.html#key-files

@ThomasWaldmann
Member

ThomasWaldmann commented Mar 22, 2022

Key: version should stay at 1 as long as the data structure is compatible with what we have in borg < 1.3.
I don't see a good reason why we should change that version number now. In my AEAD crypto PR, i just use ikm=enc_key + enc_hmac_key, so no change in the key data structure or version is needed. No "algorithm" in Key.

EncryptedKey: there we have algorithm and version, and it is all about how the inner Key is processed / encrypted and what the outer EncryptedKey looks like.

An interesting question might be if a v2 EncryptedKey (or Vn+1 in general) has a superset of attributes compared to a v1 (Vn in general) key. If that is not the case, we maybe should first peek into data (see creation of EncryptedKey(internal_dict=data)), extract the version from there and then, depending on the version we want, we create an object of class EncryptedKeyV1 or EncryptedKeyV2 - they could be totally different.
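
A minimal sketch of that "peek at the version first" idea, using plain dataclasses instead of borg's PropDict. The field sets of both classes and the loader name are hypothetical and only illustrate that the two versions need not share attributes:

```python
from dataclasses import dataclass


@dataclass
class EncryptedKeyV1:
    # Hypothetical field set for a PBKDF2-based key.
    version: int
    algorithm: str
    salt: bytes
    iterations: int
    data: bytes


@dataclass
class EncryptedKeyV2:
    # Hypothetical field set for an argon2-based key; not a superset of V1.
    version: int
    algorithm: str
    salt: bytes
    argon2_time_cost: int
    argon2_memory_cost: int
    data: bytes


KEY_CLASSES = {1: EncryptedKeyV1, 2: EncryptedKeyV2}


def load_encrypted_key(raw: dict):
    # Peek at the version before committing to a class.
    version = raw.get("version")
    try:
        cls = KEY_CLASSES[version]
    except KeyError:
        raise ValueError(f"unsupported EncryptedKey version: {version!r}") from None
    # A dict with missing or unexpected fields raises TypeError here.
    return cls(**raw)


key = load_encrypted_key({"version": 1, "algorithm": "sha256", "salt": b"s" * 32,
                          "iterations": 100_000, "data": b"..."})
print(type(key).__name__)
```

An unknown version fails early with a clear error instead of being parsed as the wrong structure.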

@hexagonrecursion
Contributor

hexagonrecursion commented Mar 23, 2022

I am sorry. I should communicate better.

I wanted to figure out what interface to present to the end user and how to document it.

## The user may want to create a key compatible with old borgs:
borg init --encrypted-key-version 1 ...
# By year 3000 this may grow to:
borg init --key-version 1 --encrypted-key-version 1 ...

## Even if --key-version 2 --encrypted-key-version 2 is the current default for `borg init`,
# `borg key change-version` will require the user to be explicit:
# Only affects Key.version:
borg key change-version --key-version 2 ...
# Only affects EncryptedKey.version:
borg key change-version --encrypted-key-version 2 ...

We could simplify and use one argument to control both:

| --key-version | Key.version | EncryptedKey.version | EncryptedKey.algorithm defaults to |
|---|---|---|---|
| 1 | 1 | 1 | sha256 |
| 2 | 1 | 2 | argon2 aes256-ctr hmac-sha256 |
| 3 | 3 (or 2 if you prefer not to skip a version) | 2 | foobar42 |

We may want to introduce a separate switch for the algorithm at some point, but I think having separate --key-version and --encrypted-key-version is unnecessary cognitive overhead for the end user.

@rugk
Contributor

rugk commented Mar 23, 2022

IMHO (just a note from an outsider), any string (like mentioning the actual used algorithm) would be better than tossing numbers around UX. Or maybe just an --upgrade-to-latest for borg key or generally a latest parameter for key or whatever version. I guess in 99% of the use cases you want to use the latest version and best security/whatever is currently recommended security-wise.

@hexagonrecursion
Contributor

Here is another thought: we could keep EncryptedKey.version at 1 and dispatch based on EncryptedKey.algorithm instead. I am also considering folding argon2 type into the algorithm: 'argon2id aes256-ctr hmac-sha256' instead of 'argon2 aes256-ctr hmac-sha256' with a separate field for type. We can then present a cleaner UI:

# --key-algorithm will eventually default to 'argon2id aes256-ctr hmac-sha256'
borg init ...
# Explicitly create a key compatible with old borgs:
# 'pbkdf2-sha256 aes256-ctr hmac-sha256' will internally
# map to 'sha256' - the magic string we use to refer to this
# algorithm in our file format
borg init --key-algorithm 'pbkdf2-sha256 aes256-ctr hmac-sha256' ...
# Upgrade the algorithm to the current best recommendation:
borg key change-algorithm ...
# Same as above (for now), but explicit
borg key change-algorithm --key-algorithm 'argon2id aes256-ctr hmac-sha256' ...
# Downgrade
borg key change-algorithm --key-algorithm 'pbkdf2-sha256 aes256-ctr hmac-sha256' ...

I think this is a cleaner interface than anything involving version numbers that imply algorithms or algorithms that imply version numbers and I think dispatch on EncryptedKey.algorithm is the most straightforward way to implement this.

@ThomasWaldmann
Member

There is no need to put the argon2 with type into the algorithm name, it would just complicate things.
IF we want the type flexible and not just always use ID, we'll have that in some argon2_type attribute in the key like for all the other variable parameters of argon2.

For the EncryptedKey, try keeping the version at 1 and implement dispatch only based on algorithm name and we'll see how that goes.
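
Dispatching on the algorithm name alone might be sketched like this. The on-disk string 'sha256' and the user-facing names come from this thread; the dictionary layout, the argon2_* attribute names and the function names are assumptions, and the argon2 branch needs the third-party argon2-cffi package:

```python
import hashlib

# User-facing name -> string stored in EncryptedKey.algorithm.
ON_DISK_NAME = {
    "pbkdf2-sha256 aes256-ctr hmac-sha256": "sha256",
    "argon2 aes256-ctr hmac-sha256": "argon2 aes256-ctr hmac-sha256",
}


def kek_pbkdf2(passphrase: bytes, key: dict) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", passphrase, key["salt"],
                               key["iterations"], dklen=32)


def kek_argon2(passphrase: bytes, key: dict) -> bytes:
    # Imported lazily so PBKDF2 keys stay readable without argon2-cffi.
    from argon2.low_level import Type, hash_secret_raw
    return hash_secret_raw(passphrase, key["salt"], key["argon2_time_cost"],
                           key["argon2_memory_cost"], key["argon2_parallelism"],
                           32, Type.ID)


KEK_DERIVATION = {
    "sha256": kek_pbkdf2,
    "argon2 aes256-ctr hmac-sha256": kek_argon2,
}


def derive_kek_for(passphrase: bytes, key: dict) -> bytes:
    # EncryptedKey.version stays at 1; only the algorithm name selects the code path.
    try:
        derive = KEK_DERIVATION[key["algorithm"]]
    except KeyError:
        raise ValueError(f"unknown key algorithm: {key['algorithm']!r}") from None
    return derive(passphrase, key)


old_key = {"version": 1,
           "algorithm": ON_DISK_NAME["pbkdf2-sha256 aes256-ctr hmac-sha256"],
           "salt": b"s" * 32, "iterations": 100_000}
print(len(derive_kek_for(b"passphrase", old_key)))
```

The argon2 type is kept out of the name here and fixed to ID, in line with the comment above; a flexible type would become one more attribute next to the other argon2 parameters.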

hexagonrecursion added a commit to hexagonrecursion/borg that referenced this issue Mar 26, 2022
@ThomasWaldmann
Member

+50 USD for the additional work caused by conflicting changes by #6463.

hexagonrecursion added a commit to hexagonrecursion/borg that referenced this issue Mar 27, 2022
1. Note: I have rebased this on top of 78f0414 and fixed the code to work with the new Passphrase.argon2 interface
2. Refactor: s/enc_key/encrypted_key/ - I intend to use the name enc_key for something else
3. Dispatch on algorithm instead of version borgbackup#747 (comment)

New: rebased on 28731c5 (current master)
@ThomasWaldmann ThomasWaldmann modified the milestones: helium, 1.3.0a1 Apr 8, 2022
@ThomasWaldmann
Member

@hexagonrecursion fixed this in these changesets:

#6468
#6469
#6549
#6552
#6556
#6560

Thanks a lot!

@ThomasWaldmann
Member

56c27a9#r70931262

dependency issue found by @bket.

@hexagonrecursion
Contributor

56c27a9#r70931262

dependency issue found by @bket.

I'll try to find time for this tomorrow
