-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
default password hash of sha-1 is insecure #2426
Comments
It's definitely not a good example for password storage, but a collision attack is not a plausible vector for stealing a password - an attacker would have to trick you into picking a specific long and unlikely password, which they would have to know, in order to generate a different password that results in the same hash. And if someone can persuade you to use a password they know, the collision is irrelevant. It's a good idea to move to a stronger hash anyway, but the recent news doesn't make that particularly urgent. A collision attack (making two inputs which produce the same hash) is generally much easier to do than a preimage attack (making an input which produces a given hash). The former is what has happened for SHA-1, and I don't believe it makes much difference for this use case. If authenticating users to access the notebook is important, we recommend using JupyterHub. |
That's a good point; thanks. |
The current hashing method is indeed very insecure, but not really because it uses SHA-1. The problem is that it doesn't iterate the hash, making the hash function so blazing fast that brute-force attacks, using newly-available fast hardware, are way too tempting, even with a salt. Only doing a single iteration is madness. As even the hashlib documentation stated long ago:
Since then, hardware acceleration has expanded speeds tremendously, and a homemade rig can try 26 billion trial passwords a second. Even passwords which were once considered pretty strong can be cheaply cracked these days via online services. Offhand, pull request #3793 to use bcrypt would help with security, but requires a dependency on an external library. It would seem better offhand to use the Soon I hope Python will support the much better Argon2 standard for password hashing, which also allows specifying space and parallelism complexity. Until then, users should either be warned away from hashing passwords feature, or counseled to use really great passwords. And counseled to throw away and never reuse passwords they may have deployed in the past this way. For more background, see Developers, its 2019, hash password accordingly |
(...Continuing a conversation from the PR at #3793...) With either scrypt or argon2, I guess the next question is how much memory overhead we're comfortable with. The default currently emphasize maximizing memory (3. Figure out the maximum amount m of memory that each call can afford.) But the initial size of Jupyter seems more like 50 MB, based on my laptop running Ubuntu 18.04, if I'm interpreting the output of What is the maximum (momentary) extra memory hit you're comfortable with? I guess login most often happens at startup, but if the system is really short on memory, you'd hate to cause problems for users trying to log in to a notebook to gracefully shut it down. So perhaps we go with a smaller value like We could also consider different values for the maximum number of threads to use, or pick a different maximum real time. |
Is any synergy with JupyterHub desired? Offhand it seems that JupyterHub can leverage authentication from either a variety of external sources as noted at Authenticators, or from a native user authenticator. The latter stores password hashes locally, generated via bcrypt with default parameters, as see at nativeauthenticator.py That seems to call for 2^12 bcrypt rounds, which isn't too bad according to Recommended # of rounds for bcrypt - Information Security Stack Exchange. But I'll note that bcrypt isn't as memory-hard (4 KiB) as argon2 or scrypt (configurable up to MB or GB), and there is now FGPA support for bcrypt in Jack the Ripper (popular cracking software), which allows low-power cracking with significant speedups, if you can find suitable hardware. |
I've reviewed the situation some more, and think the consensus continues to shift to Argon2id. So I think we should use argon2-cffi · PyPI with these default parameters:
as discussed above. That involves using about 10 MiB of memory. Those parameters should be adjusted in future releases. See the Django Argon2 work factor section and class Argon2PasswordHasher(BasePasswordHasher) for their approach, though I think making the parameters adjustable via |
I don't think we have to go with state-of-the-art crypto here. Using something that can easily be installed and won't cause issues for most users probably should be preferred. Like @takluyver, people needing stronger authentication should be using JupyterHub already. I'd just like to remind everyone that the current code is terribly insecure and can be improved at no cost, we should try to improve that soon. |
I agree with @remram44 that the priority is addressing the very dangerous use of the current very fast SHA-1 approach. If another dependency is ok, I think Argon2 is still the best bet. If not, even PBKDF2 would be a huge improvement, and it is the default (with a SHA256 hash and I don't think bcrypt is preferable to Argon2. |
I'm curious to know how this all fits in with JupyterLab. Does it inherit whatever solution we come up with here? Or does it have yet a different way of using passwords with notebook servers? |
As of today, JupyterLab uses this package as its server. It is migrating to use the new |
see https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
would be better to use sha256, or even better something like scrypt or bcrypt (which are actually meant for password hashing)
The text was updated successfully, but these errors were encountered: