default password hash of sha-1 is insecure #2426

jnhnum1 · 2017-04-19T04:39:29Z

see https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

would be better to use sha256, or even better something like scrypt or bcrypt (which are actually meant for password hashing)

takluyver · 2017-04-19T10:49:08Z

It's definitely not a good example for password storage, but a collision attack is not a plausible vector for stealing a password - an attacker would have to trick you into picking a specific long and unlikely password, which they would have to know, in order to generate a different password that results in the same hash. And if someone can persuade you to use a password they know, the collision is irrelevant. It's a good idea to move to a stronger hash anyway, but the recent news doesn't make that particularly urgent.

A collision attack (making two inputs which produce the same hash) is generally much easier to do than a preimage attack (making an input which produces a given hash). The former is what has happened for SHA-1, and I don't believe it makes much difference for this use case.

If authenticating users to access the notebook is important, we recommend using JupyterHub.

jnhnum1 · 2017-04-19T17:46:53Z

That's a good point; thanks.

nealmcb · 2019-09-07T22:26:22Z

The current hashing method is indeed very insecure, but not really because it uses SHA-1.

The problem is that it doesn't iterate the hash, making the hash function so blazing fast that brute-force attacks, using newly-available fast hardware, are way too tempting, even with a salt.
The fact that a collision was found just underscores the speed with which hashes can be done now, and the need for massive numbers of iterations.

Only doing a single iteration is madness. As even the hashlib documentation stated long ago:

As of 2013, at least 100,000 iterations of SHA-256 are suggested.

Since then, hardware acceleration has expanded speeds tremendously, and a homemade rig can try 26 billion trial passwords a second. Even passwords which were once considered pretty strong can be cheaply cracked these days via online services.

Offhand, pull request #3793 to use bcrypt would help with security, but requires a dependency on an external library.

It would seem better offhand to use the hashlib.pbkdf2_hmac function with a big iteration count. It is standard as of Python 3.4, and described in Hashing Passwords in Python – Useful code.

Soon I hope Python will support the much better Argon2 standard for password hashing, which also allows specifying space and parallelism complexity.

Until then, users should either be warned away from hashing passwords feature, or counseled to use really great passwords. And counseled to throw away and never reuse passwords they may have deployed in the past this way.

For more background, see Developers, its 2019, hash password accordingly

nealmcb · 2020-06-13T01:55:26Z

(...Continuing a conversation from the PR at #3793...)

With either scrypt or argon2, I guess the next question is how much memory overhead we're comfortable with. The default memory_cost is 105 MB (102400 KiB, from the handy python -m argon2 command). And the parameter tuning tips at the latest IETF internet-draft on the topic

draft-irtf-cfrg-argon2-10 - The memory-hard Argon2 password hash and proof-of-work function

currently emphasize maximizing memory (3. Figure out the maximum amount m of memory that each call can afford.)

But the initial size of Jupyter seems more like 50 MB, based on my laptop running Ubuntu 18.04, if I'm interpreting the output of smem correctly. After startup that is likely to grow, but do we want logging in to always take twice the initial size?

What is the maximum (momentary) extra memory hit you're comfortable with? I guess login most often happens at startup, but if the system is really short on memory, you'd hate to cause problems for users trying to log in to a notebook to gracefully shut it down.

So perhaps we go with a smaller value like memory_cost=10240, and a correspondingly higher number of iterations, like time_cost=18 (which takes the same time per password verification as the default parameters on my machine: ~100 ms).

We could also consider different values for the maximum number of threads to use, or pick a different maximum real time.

nealmcb · 2020-06-13T16:50:50Z

Is any synergy with JupyterHub desired? Offhand it seems that JupyterHub can leverage authentication from either a variety of external sources as noted at Authenticators, or from a native user authenticator. The latter stores password hashes locally, generated via bcrypt with default parameters, as see at nativeauthenticator.py

That seems to call for 2^12 bcrypt rounds, which isn't too bad according to Recommended # of rounds for bcrypt - Information Security Stack Exchange.

But I'll note that bcrypt isn't as memory-hard (4 KiB) as argon2 or scrypt (configurable up to MB or GB), and there is now FGPA support for bcrypt in Jack the Ripper (popular cracking software), which allows low-power cracking with significant speedups, if you can find suitable hardware.

nealmcb · 2020-06-26T21:02:22Z

I've reviewed the situation some more, and think the consensus continues to shift to Argon2id. So I think we should use argon2-cffi · PyPI with these default parameters:

memory_cost = 10240
time_cost = 18
parallelism = 8
ph = PasswordHasher(memory_cost=memory_cost, time_cost=time_cost, parallelism=parallelism)
ph.hash(password)

as discussed above. That involves using about 10 MiB of memory. Those parameters should be adjusted in future releases. See the Django Argon2 work factor section and class Argon2PasswordHasher(BasePasswordHasher) for their approach, though I think making the parameters adjustable via jupyter_notebook_config.py would be better.

remram44 · 2020-06-26T21:37:24Z

I don't think we have to go with state-of-the-art crypto here. Using something that can easily be installed and won't cause issues for most users probably should be preferred. Like @takluyver, people needing stronger authentication should be using JupyterHub already.

I'd just like to remind everyone that the current code is terribly insecure and can be improved at no cost, we should try to improve that soon.

nealmcb · 2020-06-26T22:29:45Z

I agree with @remram44 that the priority is addressing the very dangerous use of the current very fast SHA-1 approach.

If another dependency is ok, I think Argon2 is still the best bet. If not, even PBKDF2 would be a huge improvement, and it is the default (with a SHA256 hash and iterations = 260000) for that reason in Django.

I don't think bcrypt is preferable to Argon2.

nealmcb · 2020-06-29T02:24:44Z

I'm curious to know how this all fits in with JupyterLab. Does it inherit whatever solution we come up with here? Or does it have yet a different way of using passwords with notebook servers?

blink1073 · 2020-06-29T10:12:32Z

As of today, JupyterLab uses this package as its server. It is migrating to use the new jupyter_server project, so we'd have to port the change there as well. cf jupyter-server/jupyter_server#110

remram44 mentioned this issue Jul 24, 2018

Implement password hashing with argon2-cffi #3793

Merged

blink1073 closed this as completed in #3793 Jul 10, 2020

github-actions bot added the status:resolved-locked label Mar 25, 2021

github-actions bot locked as resolved and limited conversation to collaborators Mar 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

default password hash of sha-1 is insecure #2426

default password hash of sha-1 is insecure #2426

jnhnum1 commented Apr 19, 2017

takluyver commented Apr 19, 2017

jnhnum1 commented Apr 19, 2017

nealmcb commented Sep 7, 2019

nealmcb commented Jun 13, 2020 •

edited

Loading

nealmcb commented Jun 13, 2020

nealmcb commented Jun 26, 2020

remram44 commented Jun 26, 2020

nealmcb commented Jun 26, 2020 •

edited

Loading

nealmcb commented Jun 29, 2020

blink1073 commented Jun 29, 2020

default password hash of sha-1 is insecure #2426

default password hash of sha-1 is insecure #2426

Comments

jnhnum1 commented Apr 19, 2017

takluyver commented Apr 19, 2017

jnhnum1 commented Apr 19, 2017

nealmcb commented Sep 7, 2019

nealmcb commented Jun 13, 2020 • edited Loading

nealmcb commented Jun 13, 2020

nealmcb commented Jun 26, 2020

remram44 commented Jun 26, 2020

nealmcb commented Jun 26, 2020 • edited Loading

nealmcb commented Jun 29, 2020

blink1073 commented Jun 29, 2020

nealmcb commented Jun 13, 2020 •

edited

Loading

nealmcb commented Jun 26, 2020 •

edited

Loading