
very slow on long (base64 encoded) strings #28

Closed
smason opened this issue Mar 29, 2022 · 5 comments

Comments

@smason

smason commented Mar 29, 2022

This algorithm gets very slow on long strings, which is why the original upstream version recommends truncating long inputs to 100 characters. For downstream projects running into this, see dropbox/zxcvbn#69.

The specific issue I have is zxcvbn being used in KeePassXC, where we recently hit a problem that causes the UI to become unresponsive for a minute or so. This happened because another program instructed KeePassXC to store multiple JSON Web Tokens, and KeePassXC used zxcvbn to calculate the password quality of each one.

These secrets (not really "passwords") are large JWTs, which are essentially base64-encoded strings that take a long time for this package to check. For example, you can create such a string with the following Python code:

import base64, random
base64.b64encode(random.randbytes(num_bytes))

Running ZxcvbnMatch on this takes multiple seconds when num_bytes > 2000, with the reported entropy being a suitably large number.

@smason
Author

smason commented Mar 29, 2022

It seems better for this truncation to be done in this library, but I'm not sure where best to do it.

When running on large inputs, it seems to spend the majority of its time running the *Match() functions at the start, then again at the end using Dijkstra's algorithm to optimize the result.
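For intuition, the expensive step can be sketched as a shortest-path search over every (start, end) substring of the password, which is why the cost blows up on long inputs. This is a simplified illustration, not the library's actual code, and the entropy model passed in is a placeholder:

```python
import math

def min_entropy_segmentation(pwd, match_entropy):
    # Simplified zxcvbn-style search: find the lowest-entropy way to
    # cover the password with matches. Considering every (i, j)
    # substring is what makes long inputs expensive.
    n = len(pwd)
    best = [math.inf] * (n + 1)  # best[j] = min entropy covering pwd[:j]
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            # Entropy of treating pwd[i:j] as a single match
            e = match_entropy(pwd[i:j])
            if best[i] + e < best[j]:
                best[j] = best[i] + e
    return best[n]

# Toy model: every chunk costs 6 bits per character (purely illustrative)
# min_entropy_segmentation("abc123", lambda s: len(s) * 6.0)  # -> 36.0
```

The double loop over substrings is quadratic in the password length even before any per-substring matching work, which matches the observed slowdown on multi-thousand-character inputs.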

I'm wondering whether it's enough to just assume the start is representative, i.e. run the calculation on the first 100 bytes only, then scale the entropy up in proportion to the remaining characters, e.g.:

diff --git a/zxcvbn.c b/zxcvbn.c
index ebe9e31..d929ff6 100644
--- a/zxcvbn.c
+++ b/zxcvbn.c
@@ -52,6 +52,9 @@
 /* Minimum number of characters in a incrementing/decrementing sequence match */
 #define MIN_SEQUENCE_LEN 3
 
+/* Maximum number of characters to perform full entropy calculation over */
+#define MAX_DETAIL_LEN 100
+
 /* Year range for data matching */
 #define MIN_YEAR 1901
 #define MAX_YEAR 2050
@@ -1574,7 +1577,8 @@ double ZxcvbnMatch(const char *Pwd, const char *UserDict[], ZxcMatch_t **Info)
     ZxcMatch_t *Zp;
     Node_t *Np;
     double e;
-    int Len = strlen(Pwd);
+    const int StrLen = strlen(Pwd);
+    const int Len = StrLen < MAX_DETAIL_LEN ? StrLen : MAX_DETAIL_LEN;
     const uint8_t *Passwd = (const uint8_t *)Pwd;
     uint8_t *RevPwd;
     /* Create the paths */
@@ -1704,6 +1708,11 @@ double ZxcvbnMatch(const char *Pwd, const char *UserDict[], ZxcMatch_t **Info)
     /* Make e hold entropy result and adjust to log base 2 */
     e = Nodes[Len].Dist / log(2.0);
 
+    /* guesstimate remainder by assuming the start is representative */
+    if (Len < StrLen) {
+        e *= StrLen / (double)Len;
+    }
+
     if (Info)
     {
         /* Construct info on password parts */

@tsyrogit
Owner

tsyrogit commented Apr 7, 2022

I have implemented a character limit of 100 on the entropy calculation as you suggested, but using a different method. There doesn't seem to be one correct way of handling the excess length; I chose to include an entropy figure for the excess characters based on (password length) - 100. This allowed the test cases to pass without warnings about incorrect lengths, and it recognizes that a 500-character password has more entropy than its first 100 characters (although the figure for the last 400 characters will be low).
On my computer, the calculation time for a 4000-character password went from a few seconds to a few milliseconds.
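To make the difference between the two approaches concrete, here is a rough Python sketch of both estimation strategies discussed in this thread. The function names and the per-character tail figure are illustrative, not taken from the actual commit:

```python
MAX_DETAIL_LEN = 100  # matches the limit proposed in the diff above

def scaled_estimate(entropy_first_100, length):
    # smason's proposal: assume the first 100 characters are
    # representative and scale their entropy proportionally.
    if length <= MAX_DETAIL_LEN:
        return entropy_first_100
    return entropy_first_100 * length / MAX_DETAIL_LEN

def tail_estimate(entropy_first_100, length, tail_bits_per_char=1.0):
    # tsyrogit's approach, roughly: add a deliberately low
    # per-character figure for the characters past the limit.
    if length <= MAX_DETAIL_LEN:
        return entropy_first_100
    return entropy_first_100 + (length - MAX_DETAIL_LEN) * tail_bits_per_char
```

Both keep the expensive matching bounded to 100 characters; they differ only in how generous the entropy credit for the tail is.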

@tsyrogit tsyrogit closed this as completed Apr 7, 2022
@smason
Author

smason commented Apr 7, 2022

0745d23 looks like you're putting more effort into giving an estimate than I'd have bothered with. Any secret with more than a few hundred bits of entropy is beyond any feasible brute forcing, but I guess it's nice to give some sort of number back.

thanks!

@droidmonkey

Concur. Can you shift your PR to just incorporate this latest version of zxcvbn? Although we do "prefer" the system's version of the library, that will backfire on Linux distros (the prime target for the fix). So more thought needs to be put into whether we fully embed the code or allow the user to choose system or internal. I prefer the former.

@Tostino

Tostino commented Jan 26, 2023

Glad to see this was fixed already in your implementation. I have been issuing warnings to other implementations due to some reports of crafted-password attacks from users of my library.

There is even an exploit script written to target libraries like this: https://github.com/twosixlabs/acsploit/blob/fd5602adf9f312482b8010abf6b4691f08929bc4/acsploit/exploits/passwords/zxcvbn.py

The length limitation seems to do all right at stopping things from getting out of hand; even with specially crafted passwords, an attacker can only force on the order of hundreds of milliseconds per guess (with my implementation, at least).
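For anyone wanting to sanity-check their own build against inputs like this, a minimal timing harness along these lines is enough to confirm the check stays bounded. The check function here is a stand-in (`len` as a trivial placeholder); wire in your own binding, e.g. a ctypes wrapper around ZxcvbnMatch:

```python
import time

def time_check(check_fn, pwd):
    """Wall-clock seconds for a single strength check of pwd."""
    start = time.perf_counter()
    check_fn(pwd)
    return time.perf_counter() - start

# A crafted-style worst case: long repetition of dictionary-like tokens
crafted = "password1" * 500  # 4500 characters

# Replace `len` with your actual checker before drawing conclusions
elapsed = time_check(len, crafted)
```

With the 100-character limit in place, the measured time should stay roughly flat as the input grows, instead of scaling with the square of its length.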
