-
-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/strings: optimise hasInfix function #168175
Conversation
|
@pennae while |
@ajs124 of course, escaping them was implied. we do have an |
@pennae I've written up a second implementation using regexes. In my isolated test, the effect on CPU time is roughly the same, although when applied to nixpkgs this will be greater if the compiled regex can be reused. Memory usage is much lower now. |
Co-authored-by: pennae <[email protected]>
Nix caches compiled regexes in a lookup table.
Is this a common use case? |
So, how do we proceed here? Either approach seems worthwhile and better than what we currently have, but personally, I don't feel qualified to approve or disapprove of a specific implementation. |
based on how this function is used in nixpkgs at this day the regex approach seems better. regex caching does mean that large numbers of infixes increase memory usage, costing about 900 bytes per infix. it seems to be a reasonable tradeoff. a single regex costs about 10µs to compile while a single short substring costs about 0.4µs. the cost of regex compilation is a one-time overhead, iterating over substrings is not. would say the regex version is preferrable. |
In a coincidence I just encountered a need for lib.hasInfix and thought "hrm... i wonder if builtins.match can do better" and found this PR. Is there any reason not to merge? Possible issues:
otherwise is seems to be used fairly infrequently. @DavHau: you do a lot of parsing, string manipulation, and usage of regex, thoughts? |
since the regex input is (should be) fully escaped in this PR there shouldn't be any behavioural differences. might be worth checking that |
I use it on large amounts of small strings checking for a small number of infixes. I think a performance improvement would be great. |
Although I'm searching over a small number of large strings, I too only use a handful of infixes. So this won't cause issues for me.
Since the A small breaking change is that it's no longer possible to use |
This update breaks nixops:
|
Can you “show-trace” to help pinpoint the cause? I’d suspect this is when calling hasInfix on a derivation. Perhaps it used to be coerced during the equality check? And we should verify if
|
|
the regex-based version will also fail on paths (which are also used in user checks):
looks like we really ought to |
This should fix the issue mentioned here: NixOS/nixpkgs#168175 (comment)
This should fix the issue mentioned here: NixOS/nixpkgs#168175 (comment)
* lib/strings: optimise hasInfix function * lib/strings: optimise hasInfix further using regex * rstudio: call hasInfix with a string * lib/strings: remove let from hasInfix Co-authored-by: pennae <[email protected]> Co-authored-by: pennae <[email protected]>
This should fix the issue mentioned here: NixOS/nixpkgs#168175 (comment)
This should fix the issue mentioned here: NixOS/nixpkgs#168175 (comment)
Description of changes
Made the definition of
hasInfix
more efficient.I measured the improvement with
NIX_SHOW_STATS=1 NIX_COUNT_CALLS=1
for both the old and new definitions, searching for a 5 letter infix which was not present within a 10,000 character string. A summary of the results:Although the code is longer, I think this implementation is also easier to understand.
Things done
sandbox = true
set innix.conf
? (See Nix manual)nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)nixos/doc/manual/md-to-db.sh
to update generated release notes