-
-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/lists: introduce ifilter0, uniqBy, uniq, and fastUnique #119286
base: master
Are you sure you want to change the base?
Conversation
I considered using /* Filter a list using a predicate; that is, return a list containing
every element from the list for which the predicate function
returns true. */
static void prim_filter(EvalState & state, const Pos & pos, Value * * args, Value & v)
{
state.forceFunction(*args[0], pos);
state.forceList(*args[1], pos);
// FIXME: putting this on the stack is risky.
Value * vs[args[1]->listSize()];
unsigned int k = 0; I figured not relying on it would be a better idea for robustly handling huge lists, which is where you really want to use |
lib/lists.nix
Outdated
genList (n: if n == 0 then 0 else go (uniqNAt (n - 1) + 1)) uniqLength; | ||
uniqNAt = elemAt uniqNList; | ||
in | ||
genList (n: listElemAt (uniqNAt n)) uniqLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function definitely needs some comments for how it works, but this is seriously impressive how you achieved O(n) like that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost all the complexity here has been split out into the ifilter0
function, so you'll want to review the names & comments there.
fastUnique (a: b: a < b) [ 3 2 3 4 ] | ||
=> [ 2 3 4 ] | ||
*/ | ||
fastUnique = comparator: list: uniq (sort comparator list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have a more generic form by implementing this with
comparator: list: uniqBy (a: b: ! comparator a b) (sort comparator list)
This works because uniqBy
's predicate is only ever called on increasing list values, which are sorted already (aka a <= b
). With the additional check ! a < b
we can then ensure that they're equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reminded of Rust's core::cmp::PartialEq
& core::cmp::PartialOrd
traits. Using inverted comparator
as a uniqBy
argument implies that, letting a < b = comparator a b
, if a < b && !(b < a)
then a == b
. This is not the case if a
and b
are partially ordered by comparator
and not totally ordered. Nix doesn't have NaN to make its floats not totally ordered, but the user-supplied data ordered by user-supplied comparator
might not be totally ordered. However, this total ordering requirement can be disclosed in documentation.
I'm more worried about a user-supplied comparator
being substantially slower than a user-supplied pred
would be. If user-supplied data doesn't follow Nix ==
equality, then forcing explicit fallback to uniqBy pred (sort comparator list)
(which isn't much longer) makes it clearer how uniqBy (a: b: !(comparator a b)) (sort comparator list)
would repeatedly run a potentially-expensive comparator
.
Expensive comparator
s might not be an issue in practice, though, and I defer here to those familiar with more of Nixpkgs.
Alternatively, fastUnique
could be defined as fastUnique = list: uniq (sort lessThan list);
, and we could force manual expansion for any other usage. I don't see much of a benefit to this though, given that builtins.sort
takes a comparator
. I'd like to mirror builtins.sort
here. For comparison, builtins.elem
does not take a pred
, and uniq
& unique
reflect this. If the pattern of uniq
→ uniqBy
was followed, there would be fastUniqueBy = pred: comparator: list: uniqBy pred (sort comparator list);
. fastUniqueBy
could shorten that earlier explicit expansion to fastUniqueBy (a: b: !(comparator a b)) comparator list
, but I don't think defining this adds much?
An O(n) alternative to unique that only removes adjacent elements, but can also take an equality predicate. Named after the similar uniq(1).
A more efficient alternative to unique if sorted output is acceptable.
After starting to write a comment to this effect, explaining why |
let | ||
# View `list` as a memoization. | ||
# `nToE` maps `n` (a `list` index) to `e` (a `list` element). Memoized. | ||
nToE = elemAt list; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining these elemAt
functions doesn't do anything to memoize individual elements. You can inline this function to save a thunk allocation and some mental capacity. Same with nToKeep
, keptNToNKept
and eAt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elemAt list n
doesn't allocate a thunk in the middle, despite being equivalent to (elemAt list) n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refreshing my past experience with the Nix evaluator, I created this wiki page to explain how thunks work in Nix: https://nixos.wiki/wiki/Nix_Evaluation_Performance#Thunks
nToKeepList = genList (n: ipred n (nToE n)) listLength; | ||
# `keptNToEKept` maps `keptN` (a `keptList` index) to `eKept` (a kept | ||
# `list` element). Our final result is this memoization, viewed as a list. | ||
keptList = keptNToEKeptList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also just inline this, no need to define a variable the same as another variable.
; in genList keptNToNKeptUnmemoized keptLength; | ||
keptNToEKeptList = let keptNToEKeptUnmemoized = keptN: | ||
nToE (keptNToNKept keptN) | ||
; in genList keptNToEKeptUnmemoized keptLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if these variable names and comments make it easier or harder to understand!
How about renaming
nToKeepList
->keepOrNot
keptLength
->resultLength
nToNKept
->nextResultIndex
keptNToNKeptList
->resultIndices
keptNToEKeptList
->result
Also, how about adding some visuals:
Example: Filtering for only uppercase letters in [ "b" "A" "f" "B" "E" "c" ]
unfiltered list indices: 0 1 2 3 4 5
unfiltered list elements: b A f B E c
include condition holds: - Y - Y Y - -> result length is 3
filtered list indices: 0 1 2
unfiltered indices to include: 1 3 4
^ ^ ^
traverse condition list to find Continue traversing condition list after the
the first Y, return the index of it previously included index to find the next Y
filtered list elements: A B E
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea with the naming was to keep it clear what type of data you were dealing with by using consistent "conversion" functions, especially when different index styles are being processed.
Visuals in the documentation is a good idea.
I marked this as stale due to inactivity. → More info |
I would really love to have at least the |
Motivation for this change
TeXLive
combine
was really slow (at least when I first wrote this patch).uniqBy
,uniq
, andfastUnique
are significantly more efficient alternatives to current standard library functions that should be used when possible.Things done
sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)This change is