-
-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib.lists.ifilter0: init #214021
lib.lists.ifilter0: init #214021
Conversation
I wonder where the name ifilter0 was chosen instead of ifilter. Is it to leave ifilter for nix builtins, or does it come from a standard naming scheme ? |
@layus Aligns with |
On PM's @balsoft also asked why not just implement this as pred: lst: map (a: a.el) (filter ({ idx, el }: pred idx el) (genList (idx: { inherit idx; el = elemAt lst idx; }) (length lst))) Arguments against this are:
Arguably this function should be added as a |
lib/lists.nix
Outdated
outputLength = foldl' (count: include: | ||
if include | ||
then count + 1 | ||
else count | ||
) 0 includeList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is indeed more efficient/otherwise preferrable to length ∘ filter
, maybe we should make it a function as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What function are you thinking of here? Because this code here counts the number of true
elements, I don't think that has a lot of applications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
count = predicate: foldl' (count: el:
if predicate el
then count + 1
else count
) 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I think that's worthwhile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out it already exists! Now using it instead :)
Thank you for cleaning this up and bringing it to a mergable state! |
lib/lists.nix
Outdated
@@ -164,6 +164,53 @@ rec { | |||
*/ | |||
imap1 = f: list: genList (n: f (n + 1) (elemAt list n)) (length list); | |||
|
|||
/* Like `filter`, but with an index. O(n) complexity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Like `filter`, but with an index. O(n) complexity. | |
/* Like `filter`, but with an index. O(n) complexity. Strict in all elements (`a`) unless the predicate function is constant with regard to the elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a stab at writing a bit more extensive docs: 7b497df
lib/lists.nix
Outdated
@@ -164,6 +164,53 @@ rec { | |||
*/ | |||
imap1 = f: list: genList (n: f (n + 1) (elemAt list n)) (length list); | |||
|
|||
/* Like `filter`, but with an index. O(n) complexity. | |||
|
|||
Type: filter :: (int -> a -> bool) -> [a] -> [a] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type: filter :: (int -> a -> bool) -> [a] -> [a] | |
Type: ifilter :: (int -> a -> bool) -> [a] -> [a] |
Did we make them upper camel case to distinguish them from module system "types" and type variables? I think we should.
Type: filter :: (int -> a -> bool) -> [a] -> [a] | |
Type: ifilter :: (Int -> a -> Bool) -> [a] -> [a] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of random, I tried to match the style from the surrounding functions though 🤷
lib/lists.nix
Outdated
# given starting index | ||
nextIncludedInputIndex = i: | ||
if elemAt includeList i then i | ||
else nextIncludedInputIndex (i + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will overflow the stack for sufficiently large sequences of skipped elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that is a good point, and while an implementation like map . filter . imap
would also stack overflow because of filter
, it wouldn't overflow as quickly. In some quick testing, the version here can already stack overflow for 100'000 elements, while a map . filter . imap
-based implementation can handle up to 8'000'000 elements.
Furthermore, doing some benchmarking, a map . filter . imap
-based implementation even uses less memory1 (though it's close), it's also much less code..
As neat as the current implementation is, unfortunately I don't think it makes sense to use it 😅
I'll change it, but as an honorary mention I'll leave the neat implementation in the parent commit!
Footnotes
-
NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=stats.json nix-instantiate --eval --strict --expr 'with import ./lib; ifilter0 (i: x: i == 0 || x == 10000 - 1) (genList (i: i) 10000)'
↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added a warning to mention that stack overflow can happen: (diff)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overflow should be fixed by NixOS/nix#7348
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If map ({ v, ... }: v) (filter ({ i, v }: ipred i v) (imap0 (i: v: { inherit i v; }) input))
benches better with huge lists and can handle larger lists (yay that the VLA problem is resolved!), I'm all for using that. What's the change in speed for a microbenchmark between that and the old ifilter0
, given lists at the maximum supported sizes for both functions (100_000 elements and 8_000_000 elements)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly comparable with different numbers, but
implementation\count | 90_000 | 8_000_000 |
---|---|---|
map . filter . imap0 |
4.9ms (40.4MB) | 4328.8ms (3.58GB) |
index-based | 5.9ms (44.7MB) | (stack overflow) |
@thufschmitt's | 4.2ms (30.3MB) | 3240.2ms (2.69GB) |
The last one is a faster version that @thufschmitt came up with on the spot!
pred: lst:
map (idx: elemAt lst idx) (
filter (idx: pred idx (elemAt lst idx)) (
genList (x: x) (length lst)
)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at that more optimized implementation, I'd be inclined to offer an ifilterMap0
of
pred: f: lst:
map (idx: f idx (elemAt lst idx)) (
filter (idx: pred idx (elemAt lst idx)) (
genList (idx: idx) (length lst)
)
)
that allows avoiding the cost of an extra map (i.e. an extra iteration) over large lists. (I don't know whether pred
or f
should be the first argument here.) From there, ifilter0
is:
pred: ifilterMap0 pred (idx: elem: elem)
If we were feeling spicy, ifilterMap0
would be the partner of iconcatMap0
:
f: lst:
concatMap (idx: f idx (elemAt lst idx)) (
genList (idx: idx) (length lst)
)
A less efficient ifilterMap0
would then be:
f: pred: iconcatMap0 (idx: elem:
if pred idx elem then [ (f idx elem) ] else [ ]
)
Though, to risk this tangent a bit further, iconcatMap0
could also be
f: lst:
concat (
genList (idx: f idx (elemAt lst idx)) (length lst)
)
due to the nature of genList
. That definition is equivalent to
f: lst:
concat (imap0 f lst))
and this overall leaves be feeling a little strange about the ifilterMap0
implementation. I believe that we're stuck with explicit genList
there due to the cheaper cost of filter
compared to concat
/ concatMap
over lists of either empty or singleton lists, but I haven't benched any of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input, nice ideas! I'd say let's keep it at the existing ifilter0
for now and only come back to this if/when we need more functionality :)
Note that NixOS/nix#9399 reverts NixOS/nix#7348, at least for now and probably until after the next stable release (Nix v2.19.0). |
NixOS/nix#9430 addressed stack overflows in primops and has shipped in Nix 2.20. |
I now updated this PR to use the implementation from #214021 (comment), removed the note on stack overflows and updated the docs to fix the new conventions. This is long overdue and has gotten enough feedback already, so I'll merge this myself fairly soon. We can make further improvements in future PRs :) |
Description of changes
Originally part of #119286, credit for the smart O(n) implementation goes to @bb010g! I only rebased, improved readability and added comments and tests.
Things done
nix-instantiate --eval --strict lib/tests/misc.nix
)