Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib.lists.ifilter0: init #214021

Merged
merged 1 commit into from
Apr 22, 2024
Merged

lib.lists.ifilter0: init #214021

merged 1 commit into from
Apr 22, 2024

Conversation

infinisil
Copy link
Member

Description of changes

Originally part of #119286, credit for the smart O(n) implementation goes to @bb010g! I only rebased, improved readability and added comments and tests.

Things done
  • Wrote and ran tests successfully (nix-instantiate --eval --strict lib/tests/misc.nix)
  • Has documentation

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Feb 1, 2023
@layus
Copy link
Member

layus commented Feb 2, 2023

I wonder where the name ifilter0 was chosen instead of ifilter. Is it to leave ifilter for nix builtins, or does it come from a standard naming scheme ?

@infinisil
Copy link
Member Author

@layus Aligns with imap0. And why is that not imap itself? Because traditionally imap existed, but it started indexing from 1, which was counterintuitive, so imap was deprecated and imap0 and imap1 were introduced instead to make it obvious. While we could ignore that legacy and just go with ifilter indexed from 0 (which is the obvious choice for Nix), I figured it wouldn't be a problem to be more explicit about it.

@infinisil
Copy link
Member Author

On PM's @balsoft also asked why not just implement this as

pred: lst: map (a: a.el) (filter ({ idx, el }: pred idx el) (genList (idx: { inherit idx; el = elemAt lst idx; }) (length lst)))

Arguments against this are:

  • As noticed by @bb010g, The Nix implementation of filter might blow the stack for big lists. This should really be fixed in Nix though
  • It's probably slower because it uses more memory, since in addition to two intermediate list allocations (same for this PR), a lot of attribute sets have to be allocated, which is not the case for this PR.

Arguably this function should be added as a builtin anyways though, it's very useful and can't efficiently be implemented in Nix.

lib/lists.nix Outdated
Comment on lines 135 to 187
outputLength = foldl' (count: include:
if include
then count + 1
else count
) 0 includeList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is indeed more efficient/otherwise preferrable to length ∘ filter, maybe we should make it a function as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What function are you thinking of here? Because this code here counts the number of true elements, I don't think that has a lot of applications

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

      count = predicate: foldl' (count: el:
        if predicate el
        then count + 1
        else count
      ) 0;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I think that's worthwhile

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out it already exists! Now using it instead :)

@roberth roberth added the 6.topic: lib The Nixpkgs function library label Jul 8, 2023
@infinisil infinisil requested a review from roberth November 14, 2023 08:26
@infinisil
Copy link
Member Author

This is very nice, I don't think there's a reason we wouldn't want this, it allows a more efficient unique, see #119286 and #239722

I'll merge this myself soon unless there's any substantial feedback

@bb010g
Copy link
Contributor

bb010g commented Nov 15, 2023

Thank you for cleaning this up and bringing it to a mergable state!

lib/lists.nix Outdated
@@ -164,6 +164,53 @@ rec {
*/
imap1 = f: list: genList (n: f (n + 1) (elemAt list n)) (length list);

/* Like `filter`, but with an index. O(n) complexity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Like `filter`, but with an index. O(n) complexity.
/* Like `filter`, but with an index. O(n) complexity. Strict in all elements (`a`) unless the predicate function is constant with regard to the elements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a stab at writing a bit more extensive docs: 7b497df

lib/lists.nix Outdated
@@ -164,6 +164,53 @@ rec {
*/
imap1 = f: list: genList (n: f (n + 1) (elemAt list n)) (length list);

/* Like `filter`, but with an index. O(n) complexity.

Type: filter :: (int -> a -> bool) -> [a] -> [a]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type: filter :: (int -> a -> bool) -> [a] -> [a]
Type: ifilter :: (int -> a -> bool) -> [a] -> [a]

Did we make them upper camel case to distinguish them from module system "types" and type variables? I think we should.

Suggested change
Type: filter :: (int -> a -> bool) -> [a] -> [a]
Type: ifilter :: (Int -> a -> Bool) -> [a] -> [a]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of random, I tried to match the style from the surrounding functions though 🤷

lib/lists.nix Outdated
# given starting index
nextIncludedInputIndex = i:
if elemAt includeList i then i
else nextIncludedInputIndex (i + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will overflow the stack for sufficiently large sequences of skipped elements.

Copy link
Member Author

@infinisil infinisil Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that is a good point, and while an implementation like map . filter . imap would also stack overflow because of filter, it wouldn't overflow as quickly. In some quick testing, the version here can already stack overflow for 100'000 elements, while a map . filter . imap-based implementation can handle up to 8'000'000 elements.

Furthermore, doing some benchmarking, a map . filter . imap-based implementation even uses less memory1 (though it's close), it's also much less code..

As neat as the current implementation is, unfortunately I don't think it makes sense to use it 😅

I'll change it, but as an honorary mention I'll leave the neat implementation in the parent commit!

Footnotes

  1. NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=stats.json nix-instantiate --eval --strict --expr 'with import ./lib; ifilter0 (i: x: i == 0 || x == 10000 - 1) (genList (i: i) 10000)'

Copy link
Member Author

@infinisil infinisil Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added a warning to mention that stack overflow can happen: (diff)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overflow should be fixed by NixOS/nix#7348

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If map ({ v, ... }: v) (filter ({ i, v }: ipred i v) (imap0 (i: v: { inherit i v; }) input)) benches better with huge lists and can handle larger lists (yay that the VLA problem is resolved!), I'm all for using that. What's the change in speed for a microbenchmark between that and the old ifilter0, given lists at the maximum supported sizes for both functions (100_000 elements and 8_000_000 elements)?

Copy link
Member Author

@infinisil infinisil Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly comparable with different numbers, but

implementation\count 90_000 8_000_000
map . filter . imap0 4.9ms (40.4MB) 4328.8ms (3.58GB)
index-based 5.9ms (44.7MB) (stack overflow)
@thufschmitt's 4.2ms (30.3MB) 3240.2ms (2.69GB)

The last one is a faster version that @thufschmitt came up with on the spot!

pred: lst:
map (idx: elemAt lst idx) (
  filter (idx: pred idx (elemAt lst idx)) (
    genList (x: x) (length lst)
  )
)

Copy link
Contributor

@bb010g bb010g Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at that more optimized implementation, I'd be inclined to offer an ifilterMap0 of

pred: f: lst:
map (idx: f idx (elemAt lst idx)) (
  filter (idx: pred idx (elemAt lst idx)) (
    genList (idx: idx) (length lst)
  )
)

that allows avoiding the cost of an extra map (i.e. an extra iteration) over large lists. (I don't know whether pred or f should be the first argument here.) From there, ifilter0 is:

pred: ifilterMap0 pred (idx: elem: elem)

If we were feeling spicy, ifilterMap0 would be the partner of iconcatMap0:

f: lst:
concatMap (idx: f idx (elemAt lst idx)) (
  genList (idx: idx) (length lst)
)

A less efficient ifilterMap0 would then be:

f: pred: iconcatMap0 (idx: elem:
  if pred idx elem then [ (f idx elem) ] else [ ]
)

Though, to risk this tangent a bit further, iconcatMap0 could also be

f: lst:
concat (
  genList (idx: f idx (elemAt lst idx)) (length lst)
)

due to the nature of genList. That definition is equivalent to

f: lst:
concat (imap0 f lst))

and this overall leaves be feeling a little strange about the ifilterMap0 implementation. I believe that we're stuck with explicit genList there due to the cheaper cost of filter compared to concat / concatMap over lists of either empty or singleton lists, but I haven't benched any of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input, nice ideas! I'd say let's keep it at the existing ifilter0 for now and only come back to this if/when we need more functionality :)

@bb010g
Copy link
Contributor

bb010g commented Nov 21, 2023

Note that NixOS/nix#9399 reverts NixOS/nix#7348, at least for now and probably until after the next stable release (Nix v2.19.0).

@bb010g
Copy link
Contributor

bb010g commented Mar 2, 2024

NixOS/nix#9430 addressed stack overflows in primops and has shipped in Nix 2.20.

@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Mar 20, 2024
@infinisil
Copy link
Member Author

I now updated this PR to use the implementation from #214021 (comment), removed the note on stack overflows and updated the docs to fix the new conventions.

This is long overdue and has gotten enough feedback already, so I'll merge this myself fairly soon. We can make further improvements in future PRs :)

@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Apr 22, 2024
@infinisil infinisil merged commit 24af4c0 into NixOS:master Apr 22, 2024
21 checks passed
@infinisil infinisil deleted the ifilter0 branch April 22, 2024 20:18
@eclairevoyant eclairevoyant mentioned this pull request May 28, 2024
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: lib The Nixpkgs function library 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants