
very slow on long (base64 encoded) strings #28

Closed
smason opened this issue Mar 29, 2022 · 5 comments

Comments

@smason

smason commented Mar 29, 2022

This algorithm gets very slow on long strings, which is why the original upstream version recommends truncating long inputs to 100 characters. For downstream projects running into this, see dropbox/zxcvbn#69.

The specific issue I have is zxcvbn being used in KeePassXC, where we recently hit a problem that causes the UI to become unresponsive for a minute or so. This happened because another program instructed KeePassXC to store multiple JSON Web Tokens, and KeePassXC used zxcvbn to calculate the password quality of each one.

These secrets (not really "passwords") are large JWTs, which are essentially base64-encoded strings that take a long time for this package to check. For example, you can create such a string with the following Python code:

import base64, random
base64.b64encode(random.randbytes(num_bytes))

Running ZxcvbnMatch on this takes multiple seconds when num_bytes > 2000, with the reported entropy being a suitably large number.

@smason
Author

smason commented Mar 29, 2022

It seems better for this truncation to be done in this library, but I'm not sure where best to do it.

When running on large inputs, it seems to spend the majority of its time running the *Match() functions at the start, then again at the end using Dijkstra's algorithm to optimize the result.
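For intuition, the expensive step can be sketched as a shortest-path search over every (start, end) substring of the password, which is why the cost blows up on long inputs. This is a simplified illustration, not the library's actual code, and the entropy model passed in is a placeholder:

```python
import math

def min_entropy_segmentation(pwd, match_entropy):
    # Simplified zxcvbn-style search: find the lowest-entropy way to
    # cover the password with matches. Considering every (i, j)
    # substring is what makes long inputs expensive.
    n = len(pwd)
    best = [math.inf] * (n + 1)  # best[j] = min entropy covering pwd[:j]
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            # Entropy of treating pwd[i:j] as a single match
            e = match_entropy(pwd[i:j])
            if best[i] + e < best[j]:
                best[j] = best[i] + e
    return best[n]

# Toy model: every chunk costs 6 bits per character (purely illustrative)
# min_entropy_segmentation("abc123", lambda s: len(s) * 6.0)  # -> 36.0
```

The double loop over substrings is quadratic in the password length even before any per-substring matching work, which matches the observed slowdown on multi-thousand-character inputs.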

I'm wondering whether it's enough to just assume the start is representative, i.e. run the calculation on the first 100 bytes only, then scale the entropy up in proportion to the remaining characters, e.g.:

diff --git a/zxcvbn.c b/zxcvbn.c
index ebe9e31..d929ff6 100644
--- a/zxcvbn.c
+++ b/zxcvbn.c
@@ -52,6 +52,9 @@
 /* Minimum number of characters in a incrementing/decrementing sequence match */
 #define MIN_SEQUENCE_LEN 3
 
+/* Maximum number of characters to perform full entropy calculation over */
+#define MAX_DETAIL_LEN 100
+
 /* Year range for data matching */
 #define MIN_YEAR 1901
 #define MAX_YEAR 2050
@@ -1574,7 +1577,8 @@ double ZxcvbnMatch(const char *Pwd, const char *UserDict[], ZxcMatch_t **Info)
     ZxcMatch_t *Zp;
     Node_t *Np;
     double e;
-    int Len = strlen(Pwd);
+    const int StrLen = strlen(Pwd);
+    const int Len = StrLen < MAX_DETAIL_LEN ? StrLen : MAX_DETAIL_LEN;
     const uint8_t *Passwd = (const uint8_t *)Pwd;
     uint8_t *RevPwd;
     /* Create the paths */
@@ -1704,6 +1708,11 @@ double ZxcvbnMatch(const char *Pwd, const char *UserDict[], ZxcMatch_t **Info)
     /* Make e hold entropy result and adjust to log base 2 */
     e = Nodes[Len].Dist / log(2.0);
 
+    /* guesstimate remainder by assuming the start is representative */
+    if (Len < StrLen) {
+        e *= StrLen / (double)Len;
+    }
+
     if (Info)
     {
         /* Construct info on password parts */

@tsyrogit
Owner

tsyrogit commented Apr 7, 2022

I have implemented a character limit of 100 on the entropy calculation as you suggested, but using a different method. There doesn't seem to be one correct way of handling the excess length; I chose to include an entropy figure for the excess characters based on (password length) - 100. This allowed the test cases to pass without warnings about incorrect lengths, and it recognizes that a 500-character password has more entropy than its first 100 characters (although the figure for the last 400 characters will be low).
On my computer, the calculation time for a 4000-character password went from a few seconds to a few milliseconds.
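To make the difference between the two approaches concrete, here is a rough Python sketch of both estimation strategies discussed in this thread. The function names and the per-character tail figure are illustrative, not taken from the actual commit:

```python
MAX_DETAIL_LEN = 100  # matches the limit proposed in the diff above

def scaled_estimate(entropy_first_100, length):
    # smason's proposal: assume the first 100 characters are
    # representative and scale their entropy proportionally.
    if length <= MAX_DETAIL_LEN:
        return entropy_first_100
    return entropy_first_100 * length / MAX_DETAIL_LEN

def tail_estimate(entropy_first_100, length, tail_bits_per_char=1.0):
    # tsyrogit's approach, roughly: add a deliberately low
    # per-character figure for the characters past the limit.
    if length <= MAX_DETAIL_LEN:
        return entropy_first_100
    return entropy_first_100 + (length - MAX_DETAIL_LEN) * tail_bits_per_char
```

Both keep the expensive matching bounded to 100 characters; they differ only in how generous the entropy credit for the tail is.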

@tsyrogit tsyrogit closed this as completed Apr 7, 2022
@smason
Author

smason commented Apr 7, 2022

0745d23 looks like you're putting more effort into giving an estimate than I'd have bothered with. Any secret with more than a few hundred bits of entropy is beyond any feasible brute forcing, but I guess it's nice to give some sort of number back.

thanks!

@droidmonkey

Concur. Can you shift your PR to just incorporate this latest version of zxcvbn? Although we do "prefer" the system's version of the library, that will backfire on Linux distros (the prime target for the fix). So more thought needs to be put into whether we fully embed the code or allow the user to choose system or internal. I prefer the former.

@Tostino

Tostino commented Jan 26, 2023

Glad to see this was fixed already in your implementation. I have been issuing warnings to other implementations due to some reports of crafted-password attacks from users of my library.

There is even an exploit script written to target libraries like this: https://github.com/twosixlabs/acsploit/blob/fd5602adf9f312482b8010abf6b4691f08929bc4/acsploit/exploits/passwords/zxcvbn.py

The length limitation seems to do all right at stopping things from getting out of hand; even with specially crafted passwords, an attacker can only force on the order of hundreds of milliseconds per guess (with my implementation, at least).
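For anyone wanting to sanity-check their own build against inputs like this, a minimal timing harness along these lines is enough to confirm the check stays bounded. The check function here is a stand-in (`len` as a trivial placeholder); wire in your own binding, e.g. a ctypes wrapper around ZxcvbnMatch:

```python
import time

def time_check(check_fn, pwd):
    """Wall-clock seconds for a single strength check of pwd."""
    start = time.perf_counter()
    check_fn(pwd)
    return time.perf_counter() - start

# A crafted-style worst case: long repetition of dictionary-like tokens
crafted = "password1" * 500  # 4500 characters

# Replace `len` with your actual checker before drawing conclusions
elapsed = time_check(len, crafted)
```

With the 100-character limit in place, the measured time should stay roughly flat as the input grows, instead of scaling with the square of its length.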
