lib/lists: introduce ifilter0, uniqBy, uniq, and fastUnique #119286

Open: wants to merge 4 commits into master

lib/default.nix: 6 changes (3 additions, 3 deletions)
@@ -80,11 +80,11 @@ let
getLib getDev getMan chooseDevOutputs zipWithNames zip
recurseIntoAttrs dontRecurseIntoAttrs cartesianProductOfSets;
inherit (self.lists) singleton forEach foldr fold foldl foldl' imap0 imap1
- concatMap flatten remove findSingle findFirst any all count
+ ifilter0 concatMap flatten remove findSingle findFirst any all count
optional optionals toList range partition zipListsWith zipLists
reverseList listDfs toposort sort naturalSort compareLists take
- drop sublist last init crossLists unique intersectLists
- subtractLists mutuallyExclusive groupBy groupBy';
+ drop sublist last init crossLists uniq uniqBy unique fastUnique
+ intersectLists subtractLists mutuallyExclusive groupBy groupBy';
inherit (self.strings) concatStrings concatMapStrings concatImapStrings
intersperse concatStringsSep concatMapStringsSep
concatImapStringsSep makeSearchPath makeSearchPathOutput

lib/lists.nix: 107 changes (101 additions, 6 deletions)
@@ -95,25 +95,84 @@ rec {
*/
foldl' = builtins.foldl' or foldl;

- /* Map with index starting from 0
+ /* Like `map`, but with an index. O(n) complexity.

Type: imap0 :: (int -> a -> b) -> [a] -> [b]

Example:
imap0 (i: v: "${v}-${toString i}") ["a" "b"]
=> [ "a-0" "b-1" ]
*/
- imap0 = f: list: genList (n: f n (elemAt list n)) (length list);
+ imap0 = f: list:
+ let eAt = elemAt list; in genList (n: f n (eAt n)) (length list);

- /* Map with index starting from 1
+ /* Same as `imap0`, but indices start from 1. O(n) complexity.

Type: imap1 :: (int -> a -> b) -> [a] -> [b]

Example:
imap1 (i: v: "${v}-${toString i}") ["a" "b"]
=> [ "a-1" "b-2" ]
*/
- imap1 = f: list: genList (n: f (n + 1) (elemAt list n)) (length list);
+ imap1 = f: list:
+ let eAt = elemAt list; in genList (n: f (n + 1) (eAt n)) (length list);

+ /* Like `filter`, but with an index. O(n) complexity.
+
+ Note that `ifilter0` does not stack allocate any intermediate lists.
+
+ Type: ifilter0 :: (int -> a -> bool) -> [a] -> [a]
+
+ Example:
+ ifilter0 (i: v: i == 0 || v > 2) [ 1 2 3 ]
+ => [ 1 3 ]
+ */
+ ifilter0 =
+ ipred:
+ list:
+ let listLength = length list; in
+ if listLength <= 0 then list else
+ /* For a function `generator :: n -> a` and variable `count :: n`,
+ list `list :: [a]` of `list = genList generator count` memoizes
+ `generator` over the domain ((domain of `list`) union `range 0 count`).
+ Function `(elemAt list) :: n -> a` provides the same interface as
+ `generator`.
+ */
+ let
+ # View `list` as a memoization.
+ # `nToE` maps `n` (a `list` index) to `e` (a `list` element). Memoized.
+ nToE = elemAt list;

Comment (Member):

Defining these elemAt functions doesn't do anything to memoize individual elements. You can inline this function to save a thunk allocation and some mental capacity. Same with nToKeep, keptNToNKept and eAt.

Comment (Contributor Author):

`elemAt list n` doesn't allocate a thunk in the middle, despite being equivalent to `(elemAt list) n`?

Comment (Member) by @infinisil, Jun 29, 2021:


Refreshing my past experience with the Nix evaluator, I created this wiki page to explain how thunks work in Nix: https://nixos.wiki/wiki/Nix_Evaluation_Performance#Thunks
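A small sketch of the two `imap0` shapes discussed in this thread, assuming only `builtins`. `imap0Inline` applies `elemAt list n` directly. `imap0Bound` is the PR's variant with a let-bound `eAt`. Both should return the same list. Whether one of them allocates an extra thunk is an evaluator detail covered by the wiki page above, so this sketch only checks that the results agree. The names `imap0Inline`, `imap0Bound` and `tag` are illustrative.

```nix
let
  inherit (builtins) elemAt genList length;

  # Applies `elemAt list n` directly at each index.
  imap0Inline = f: list: genList (n: f n (elemAt list n)) (length list);

  # The PR's variant: `eAt` names the partial application `elemAt list`.
  # It does not cache individual elements; each `eAt n` still indexes `list`.
  imap0Bound = f: list:
    let eAt = elemAt list; in genList (n: f n (eAt n)) (length list);

  tag = i: v: "${v}-${toString i}";
in
{
  inline = imap0Inline tag [ "a" "b" ];
  bound = imap0Bound tag [ "a" "b" ];
}
```

You can evaluate it with `nix-instantiate --eval --strict` on a file containing the expression.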

+ # `nToKeep` maps `n` (a `list` index) to `keep` (whether a `list` element
+ # `e` should be kept, according to `ipred`). Memoized.
+ nToKeep = elemAt nToKeepList;
+ nToKeepList = genList (n: ipred n (nToE n)) listLength;
+ # `keptNToEKept` maps `keptN` (a `keptList` index) to `eKept` (a kept
+ # `list` element). Our final result is this memoization, viewed as a list.
+ keptList = keptNToEKeptList;

Comment (Member):

You can also just inline this, no need to define a variable the same as another variable.

+ # The length of `keptList` will be the count of kept elements in `list`.
+ keptLength =
+ foldl' (count: keep: count + (if keep then 1 else 0)) 0 nToKeepList;
+ # `nToNKept` maps `n` (a `list` index) to `nKept` (the closest `list`
+ # index greater than or equal to `n` for which `keep` is true).
+ nToNKept = n: if nToKeep n then n else nToNKept (n + 1);
+ # `keptNToNKept` maps `keptN` (a `keptList` index) to `nKept` (a `list`
+ # index with a kept element). Memoized.
+ keptNToNKept = elemAt keptNToNKeptList;
+ keptNToNKeptList = let keptNToNKeptUnmemoized = keptN:
+ # Base case. Begin searching for a kept element from the beginning.
+ if keptN == 0 then nToNKept 0 else let
+ # To get `keptN`'s `nKept`, we need the previous `keptN`'s `nKept`.
+ prevKeptN = keptN - 1;
+ # This recursion is cheap thanks to memoization.
+ prevNKept = keptNToNKept prevKeptN;
+ # Find the next `nKept`, not `prevNKept` again.
+ n = prevNKept + 1;
+ in nToNKept n
+ ; in genList keptNToNKeptUnmemoized keptLength;
+ keptNToEKeptList = let keptNToEKeptUnmemoized = keptN:
+ nToE (keptNToNKept keptN)
+ ; in genList keptNToEKeptUnmemoized keptLength;

Comment (Member) by @infinisil, May 3, 2021:

I'm not sure if these variable names and comments make it easier or harder to understand!

How about renaming

  • nToKeepList -> keepOrNot
  • keptLength -> resultLength
  • nToNKept -> nextResultIndex
  • keptNToNKeptList -> resultIndices
  • keptNToEKeptList -> result

Also, how about adding some visuals:

Example: Filtering for only uppercase letters in [ "b" "A" "f" "B" "E" "c" ]

       unfiltered list indices: 0 1 2 3 4 5
      unfiltered list elements: b A f B E c
       include condition holds: - Y - Y Y -  -> result length is 3
                                               
         filtered list indices:   0   1 2
 unfiltered indices to include:   1   3 4
                                  ^   ^ ^
    traverse condition list to find   Continue traversing condition list after the
the first Y, return the index of it   previously included index to find the next Y
                                 
        filtered list elements:   A   B E

Comment (Contributor Author):


My idea with the naming was to keep it clear what type of data you were dealing with by using consistent "conversion" functions, especially when different index styles are being processed.

Visuals in the documentation are a good idea.
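The renames and the diagram above can be combined into a sketch of `ifilter0` that uses only `builtins`. This is not the PR's code. It applies the suggested names (`keepOrNot`, `resultLength`, `nextResultIndex`, `resultIndices`, `result`) to the same approach: each entry of `resultIndices` resumes the search just after the previous entry. It relies on Nix evaluating each list element at most once. `isUpper` and the result attribute names are illustrative.

```nix
let
  inherit (builtins) elemAt foldl' genList length match;

  # Sketch of ifilter0 using the names suggested in review.
  ifilter0 = ipred: list:
    let
      # keepOrNot: whether each element of `list` passes `ipred`.
      keepOrNot = genList (n: ipred n (elemAt list n)) (length list);
      # resultLength: how many elements pass.
      resultLength =
        foldl' (count: keep: if keep then count + 1 else count) 0 keepOrNot;
      # nextResultIndex: the first index >= n whose condition holds.
      nextResultIndex = n:
        if elemAt keepOrNot n then n else nextResultIndex (n + 1);
      # resultIndices: the `list` index of each kept element. Entry k resumes
      # the search just after entry k - 1, which Nix evaluates at most once.
      resultIndices = genList
        (k:
          if k == 0
          then nextResultIndex 0
          else nextResultIndex (elemAt resultIndices (k - 1) + 1))
        resultLength;
      result = map (elemAt list) resultIndices;
    in
    result;

  # builtins.match must match the whole string; it returns null on failure.
  isUpper = s: match "[A-Z]" s != null;
in
{
  # The example from the review diagram.
  uppercase = ifilter0 (i: v: isUpper v) [ "b" "A" "f" "B" "E" "c" ];
  # The example from the PR's doc comment.
  docExample = ifilter0 (i: v: i == 0 || v > 2) [ 1 2 3 ];
  empty = ifilter0 (i: v: true) [ ];
}
```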

+ in keptList;

/* Map and concatenate the result.

@@ -633,6 +692,31 @@ rec {
"lib.crossLists is deprecated, use lib.cartesianProductOfSets instead"
(f: foldl (fs: args: concatMap (f: map f args) fs) [f]);

+ /* Remove duplicate adjacent elements from the list, using the supplied
+ equality predicate. O(n) complexity.
+
+ Type: uniqBy :: (a -> a -> bool) -> [a] -> [a]
+
+ Example:
+ uniqBy (x: y: x == y) [ 1 2 2 3 1 1 1 3 1 1 2 ]
+ => [ 1 2 3 1 3 1 2 ]
+ */
+ uniqBy =
+ pred:
+ list:
+ let eAt = elemAt list; in
+ # The first element is never considered a duplicate.
+ ifilter0 (n: e: n == 0 || !(pred (eAt (n - 1)) e)) list;
+
+ /* Remove duplicate adjacent elements from the list. O(n) complexity.
+
+ Type: uniq :: [a] -> [a]
+
+ Example:
+ uniq [ 1 2 2 3 1 1 3 1 1 2 ]
+ => [ 1 2 3 1 3 1 2 ]
+ */
+ uniq = uniqBy (a: b: a == b);

/* Remove duplicate elements from the list. O(n^2) complexity.

@@ -641,8 +725,19 @@ rec {
Example:
unique [ 3 2 3 4 ]
=> [ 3 2 4 ]
- */
- unique = foldl' (acc: e: if elem e acc then acc else acc ++ [ e ]) [];
+ */
+ unique = foldl' (acc: e: if elem e acc then acc else acc ++ [ e ]) [];
+
+ /* Sort and remove duplicate elements from the list.
+ O(n) complexity on top of `sort`.
+
+ Type: fastUnique :: (a -> a -> bool) -> [a] -> [a]
+
+ Example:
+ fastUnique (a: b: a < b) [ 3 2 3 4 ]
+ => [ 2 3 4 ]
+ */
+ fastUnique = comparator: list: uniq (sort comparator list);

Comment (Member):

I think we can have a more generic form by implementing this with

comparator: list: uniqBy (a: b: ! comparator a b) (sort comparator list)

This works because `uniqBy`'s predicate is only ever called on adjacent elements of the already sorted list, so `a <= b` holds. With the additional check `!(a < b)` we can then ensure that they're equal.
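Here is a self-contained sketch of the suggested generic form, assuming only `builtins`. For brevity, `uniqBy` here filters a list of indices rather than calling the PR's `ifilter0`. The `fastUnique` body is the one proposed in the comment. The record example, which deduplicates by a `name` field, is made up. It shows a case where `==` on whole values is not the intended notion of duplicate.

```nix
let
  inherit (builtins) elemAt filter genList length lessThan sort;

  # Keep index n unless `pred` reports element n equal to element n - 1.
  # (Index filtering stands in for the PR's ifilter0 to keep this short.)
  uniqBy = pred: list:
    map (elemAt list)
      (filter (n: n == 0 || !(pred (elemAt list (n - 1)) (elemAt list n)))
        (genList (n: n) (length list)));

  # The form proposed above: adjacent sorted elements a, b already satisfy
  # !(comparator b a); also requiring !(comparator a b) treats them as equal.
  fastUnique = comparator: list:
    uniqBy (a: b: !(comparator a b)) (sort comparator list);
in
{
  numbers = fastUnique lessThan [ 3 2 3 4 ];
  # Hypothetical records, deduplicated by `name` alone.
  names = map (r: r.name) (fastUnique (a: b: a.name < b.name) [
    { name = "b"; value = 1; }
    { name = "a"; value = 2; }
    { name = "b"; value = 3; }
  ]);
}
```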

Comment (Contributor Author):


I'm reminded of Rust's `core::cmp::PartialEq` & `core::cmp::PartialOrd` traits. Using an inverted `comparator` as a `uniqBy` argument implies that, letting `a < b = comparator a b`, if `!(a < b) && !(b < a)` then `a == b`. This is not the case if `a` and `b` are partially ordered by `comparator` and not totally ordered. Nix doesn't have NaN to make its floats not totally ordered, but user-supplied data ordered by a user-supplied `comparator` might not be totally ordered. However, this total ordering requirement can be disclosed in documentation.

I'm more worried about a user-supplied `comparator` being substantially slower than a user-supplied `pred` would be. If user-supplied data doesn't follow Nix `==` equality, then forcing explicit fallback to `uniqBy pred (sort comparator list)` (which isn't much longer) makes it clearer how `uniqBy (a: b: !(comparator a b)) (sort comparator list)` would repeatedly run a potentially expensive `comparator`.

Expensive comparators might not be an issue in practice, though, and I defer here to those familiar with more of Nixpkgs.

Alternatively, `fastUnique` could be defined as `fastUnique = list: uniq (sort lessThan list);`, and we could force manual expansion for any other usage. I don't see much of a benefit to this though, given that `builtins.sort` takes a comparator. I'd like to mirror `builtins.sort` here. For comparison, `builtins.elem` does not take a `pred`, and `uniq` & `unique` reflect this. If the pattern of `uniq` → `uniqBy` was followed, there would be `fastUniqueBy = pred: comparator: list: uniqBy pred (sort comparator list);`. `fastUniqueBy` could shorten that earlier explicit expansion to `fastUniqueBy (a: b: !(comparator a b)) comparator list`, but I don't think defining this adds much?
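The hypothetical `fastUniqueBy` from the last paragraph makes both proposals instances of one helper. That also shows where they differ. The sketch assumes a comparator `byK` that orders records only by a `k` field, so it ties records that are not `==`. The reviewer's form keeps one of the tied records. The PR's `uniq`-based form keeps at least two, and can keep a repeated value if equal records do not end up adjacent after sorting. `fastUniqueBy`, `byK` and the records are illustrative, not part of the PR.

```nix
let
  inherit (builtins) elemAt filter genList length lessThan sort;

  # Keep index n unless `pred` reports element n equal to element n - 1.
  uniqBy = pred: list:
    map (elemAt list)
      (filter (n: n == 0 || !(pred (elemAt list (n - 1)) (elemAt list n)))
        (genList (n: n) (length list)));

  # Hypothetical helper from the comment above; not part of the PR.
  fastUniqueBy = pred: comparator: list: uniqBy pred (sort comparator list);

  # The PR's fastUnique (uniq = uniqBy (==)) as an instance of fastUniqueBy.
  fastUniquePR = fastUniqueBy (a: b: a == b);
  # The reviewer's comparator-only form as an instance of fastUniqueBy.
  fastUniqueReview = comparator:
    fastUniqueBy (a: b: !(comparator a b)) comparator;

  # A comparator that only looks at `k`: it ties records that are not `==`.
  byK = a: b: a.k < b.k;
  records = [
    { k = 1; v = "x"; }
    { k = 1; v = "y"; }
    { k = 1; v = "x"; }
  ];
in
{
  # For plain integers with lessThan, both forms agree.
  intsPR = fastUniquePR lessThan [ 3 2 3 4 ];
  intsReview = fastUniqueReview lessThan [ 3 2 3 4 ];
  # With the coarse comparator they disagree.
  recordsPR = fastUniquePR byK records;
  recordsReview = fastUniqueReview byK records;
}
```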


/* Intersects list 'e' and another list. O(nm) complexity.


nixos/modules/services/networking/firewall.nix: 2 changes (1 addition, 1 deletion)
@@ -253,7 +253,7 @@ let
'';

canonicalizePortList =
- ports: lib.unique (builtins.sort builtins.lessThan ports);
+ ports: lib.fastUnique builtins.lessThan ports;

commonOptions = {
allowedTCPPorts = mkOption {

pkgs/tools/typesetting/tex/texlive/combine.nix: 2 changes (1 addition, 1 deletion)
@@ -33,7 +33,7 @@ let
++ lib.optional (lib.any pkgNeedsRuby splitBin.wrong) ruby;
};

- uniqueStrings = list: lib.sort (a: b: a < b) (lib.unique list);
+ uniqueStrings = list: lib.fastUnique (a: b: a < b) list;

mkUniqueOutPaths = pkgs: uniqueStrings
(map (p: p.outPath) (builtins.filter lib.isDerivation pkgs));
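This is a quick check that the old and new bodies of `canonicalizePortList` (firewall.nix) and `uniqueStrings` (combine.nix) agree. It uses self-contained stand-ins for `lib.unique`, `uniq` and `fastUnique`. The `unique` body is copied from `lib/lists.nix`. `uniq` is a shortened index-filter version, not the PR's `ifilter0`-based one. The sample lists are made up.

```nix
let
  inherit (builtins) elem elemAt filter foldl' genList length lessThan sort;

  # `unique` copied from lib/lists.nix (quadratic membership checks).
  unique = foldl' (acc: e: if elem e acc then acc else acc ++ [ e ]) [];

  # Shortened stand-in for the PR's uniq: drop elements equal to their
  # predecessor.
  uniq = list:
    map (elemAt list)
      (filter (n: n == 0 || elemAt list (n - 1) != elemAt list n)
        (genList (n: n) (length list)));

  fastUnique = comparator: list: uniq (sort comparator list);

  # firewall.nix: canonicalizePortList, before and after.
  portsOld = ports: unique (sort lessThan ports);
  portsNew = ports: fastUnique lessThan ports;

  # combine.nix: uniqueStrings, before and after.
  stringsOld = list: sort (a: b: a < b) (unique list);
  stringsNew = list: fastUnique (a: b: a < b) list;

  samplePorts = [ 443 80 22 80 443 8080 ];
  sampleStrings = [ "b" "a" "c" "a" ];
in
{
  ports = { old = portsOld samplePorts; new = portsNew samplePorts; };
  strings = { old = stringsOld sampleStrings; new = stringsNew sampleStrings; };
}
```

For integers and strings compared with `<`, deduplicating before or after sorting gives the same list. Sorting first is what lets the single adjacent-duplicate pass replace the quadratic `unique`.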