Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib.lists.allUnique: init #239722

Merged
merged 1 commit into from
Nov 15, 2023
Merged

lib.lists.allUnique: init #239722

merged 1 commit into from
Nov 15, 2023

Conversation

Stunkymonkey
Copy link
Contributor

@Stunkymonkey Stunkymonkey commented Jun 25, 2023

Description of changes

The intention is to write easier and better assert like check port-allocation, subdomains, ...

there have been a suggestions in the matrix channel to expose countValues to other functions.
I think this is not a good idea, because the key is converted to a string, which makes it not really reuseable.
Except for assert messages maybe.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@Stunkymonkey Stunkymonkey force-pushed the lib-allUnique branch 4 times, most recently from 84be328 to 314575e Compare June 25, 2023 13:21
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Jun 25, 2023
lib/lists.nix Outdated Show resolved Hide resolved
@Stunkymonkey Stunkymonkey force-pushed the lib-allUnique branch 5 times, most recently from 734f897 to ffa0fd7 Compare June 25, 2023 15:43
@piegamesde
Copy link
Member

I must admit that I'm a bit confused by what the allUniqueBy function does, and its documentation didn't help me much. Also why does allUnique use <, what happens if you call it with non-numbers?

@Stunkymonkey
Copy link
Contributor Author

@pennae are you fine in only providing the allUnique function? the allUniqueBy seems unclear. shure we could filter things, but maybe its better to to this beforehand as it becomes more clear.

@pennae
Copy link
Contributor

pennae commented Jul 2, 2023

@pennae are you fine in only providing the allUnique function?

not so much. almost all other sort-like list operations have a generic variant (the notable exception being naturalSort, which is explicitly specific about what that means), and hard-coding lessThan into the sort does nothing useful on its own--it only (unnecessarily) limits which lists the function can work on.

the allUniqueBy seems unclear. shure we could filter things, but maybe its better to to this beforehand as it becomes more clear.

not sure how filtering even figures here. allByUnique just doesn't hard-code the sort predicate?

@Stunkymonkey
Copy link
Contributor Author

@pennae
Having allUniqueBy and sort by the predicate seems a bit odd for me. Can you explain to me what you want to achieve with this?
Why do you want to sort differently and then check if there is equality in subsequent elements?

If we would filter the list and only look for uniqueness in a subset of the list this makes sense to me. But sorting based on a predicate and then check if the previous is equal does not really make sense to me.

fyi: my main motivation is to have the function allUnique.

@pennae
Copy link
Contributor

pennae commented Jul 4, 2023

Having allUniqueBy and sort by the predicate seems a bit odd for me. Can you explain to me what you want to achieve with this?

genericism.

Why do you want to sort differently and then check if there is equality in subsequent elements?

not sure we understand the question, but "list contains duplicates" and "two adjacent equal elements in sort order exist" are the same thing.

If we would filter the list and only look for uniqueness in a subset of the list this makes sense to me. But sorting based on a predicate and then check if the previous is equal does not really make sense to me.

if you filter anything at all you can no longer check uniqueness though? filtering a list literally means dropping things from it, invalidating the entire idea of checking the original for any kind of uniqueness?

fyi: my main motivation is to have the function allUnique.

and we don't want to stop that function existing. but limiting it to things that are directly <-comparable is short-sighted and will trip people up when used on eg listOf (enum [ 1 2 "open" ])

@Stunkymonkey Stunkymonkey marked this pull request as ready for review July 4, 2023 20:31
@roberth roberth added the 6.topic: lib The Nixpkgs function library label Jul 8, 2023
@Stunkymonkey Stunkymonkey requested a review from pennae July 9, 2023 21:20
@Stunkymonkey
Copy link
Contributor Author

how are we going to continue here?

@infinisil
Copy link
Member

infinisil commented Jul 24, 2023

We shouldn't merge functions that have no use case. If allUnique exists, it satisfies @Stunkymonkey's use case, and we don't know of any other use cases, so allUniqueBy would just not get used at all.

So I don't think we should have allUniqueBy here now. We can still introduce it later in case somebody asks for it with a use case.

I also want to point out that #119286 also tried to introduce a fastUnique :: (a -> a -> bool) -> [a] -> [a], though it didn't get a lot of feedback. Notably that PR's only use case for fastUnique also just used lessThan for the comparison.


I do however think that we need a better name. Currently we only have

  • lib.lists.unique :: [ a ] -> [ a ]: slow but works on items that aren't comparable

The allUnique from this PR is:

  • lib.lists.allUnique :: [ a ] -> bool: fast but doesn't work on items that aren't comparable

This would then lead to an inconsistent library where allUnique is different from unique and then checking if the result is the same.

So instead we should probably have functions like these:

  • lib.lists.unique :: [ a ] -> [ a ]: slow but works on items that aren't comparable
  • lib.lists.allUnique :: [ a ] -> bool: slow but works on items that aren't comparable
  • lib.lists.fastUnique :: [ a ] -> [ a ]: fast but doesn't work on items that aren't comparable
  • lib.lists.fastAllUnique :: [ a ] -> bool: fast but doesn't work on items that aren't comparable

We could also deprecate unique and go for something like:

  • lib.lists.slowUnique :: [ a ] -> [ a ]: slow but works on items that aren't comparable
  • lib.lists.slowAllUnique :: [ a ] -> bool: slow but works on items that aren't comparable
  • lib.lists.fastUnique :: [ a ] -> [ a ]: fast but doesn't work on items that aren't comparable
  • lib.lists.fastAllUnique :: [ a ] -> bool: fast but doesn't work on items that aren't comparable

And then you could also think of how the By variants fit nicely into this schema

@Stunkymonkey
Copy link
Contributor Author

ping @bb010g for continuing the discussion.

@infinisil I am not a huge fan of naming functions according to their runtime, because no function is named like this yet. What do you think about:

  • lib.lists.unique :: [ a ] -> [ a ]: slow but works on items that aren't comparable
  • lib.lists.allUnique :: [ a ] -> bool: slow but works on items that aren't comparable
  • lib.lists.comparableUnique :: [ a ] -> [ a ]: fast but doesn't work on items that aren't comparable
  • lib.lists.comparableAllUnique :: [ a ] -> bool: fast but doesn't work on items that aren't comparable

any thoughts?

@infinisil
Copy link
Member

I like the idea, how about:

  • lib.lists.sortUnique :: [ a ] -> [ a ]: fast but doesn't work on items that aren't comparable

  • lib.lists.sortAllUnique :: [ a ] -> bool: fast but doesn't work on items that aren't comparable

This indicates that "It first sorts it, then does makes elements unique".

Although then we run into another inconsistency: lib.sort :: (a -> a -> Bool) -> [ a ] -> [ a ]. I think it would make sense to rename lib.sort to lib.sortBy though, so that lib.sort could be used for the version using lessThan by default. This could be a future change.

So, I think sort[All]Unique would still be fine.

@Stunkymonkey Stunkymonkey changed the title lib.lists.allUnique: init lib.lists.sort[All]Unique: init Jul 25, 2023
lib/lists.nix Outdated
*/
sortUnique = lst:
let sortedList = lib.sort builtins.lessThan lst; in
(lib.foldl' (acc: e: {r = if e != acc.last then acc.r ++ [ e ] else acc.r; last = e;}) {r = []; last = (dummy: 0);} sortedList).r;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@infinisil not sure if this is the perfect way... This is the first idea that came to my mind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like findFirst id false (zipListsWith (fst: snd: fst != snd) sortedList (drop 1 sortedList)) might be nicer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, but I am unable to get this to work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm you're right this doesn't work. And refactoring your implementation I'm noticing that there's a ++ [ e ] in a loop, which actually makes this still an $\mathcal{O}(n^2)$ implementation. I think we should have #214021 first so this can be nicer. I'll push that PR a bit

@roberth
Copy link
Member

roberth commented Jul 26, 2023

All a user needs to know is which comparison method is used, and that implementations that are restricted to == are significantly slower; e.g. O(n²).
For this reason the < should be the implicit default in the naming scheme; for example:

  • sortBy : (a -> b) -> [a] -> [a] where b supports <
  • sortByEq : (a -> b) -> [a] -> [a] where b supports ==

However, these functions seem to be comparisons. allUnique seems like a good name, because of the all function. Unique is a good signal for functions that don't really sort, except perhaps as an implementation detail. If it's called sort, I'd expect a sorted list to be returned.

  • allUnique
  • allUniqueEq (perhaps not even necessary at this point)

@piegamesde
Copy link
Member

I agree with your point, however usually the more specific method with the more specific name is expected to have better performance. So maybe sortByOrd would be preferable here to counter that.

How difficult would it be to have only one function which automatically does the right thing depending on the data? I don't know how expensive a O(n) data walk to check is in practice, but maybe one could implement it such that it starts with sorting and falls back to the other as soon as it encounters non-comparable data?

@roberth
Copy link
Member

roberth commented Jul 26, 2023

I agree with your point, however usually the more specific method with the more specific name is expected to have better performance.

I have the opposite expectation, so I guess we better make it explicit then; both Eq and Ord signifiers.
I'd expect the best function to have the short name, not necessarily the most general one.

falls back to the other as soon as it encounters non-comparable data

Not sure what type would actually need == instead of <. Perhaps functions, but comparing structures with functions should be avoided because of a bad legacy behavior. (And the theoretical argument about function comparison also applies)
I expect the Eq case to be very rare.

@infinisil
Copy link
Member

If it's called sort, I'd expect a sorted list to be returned.

A version of unique that first sorts items and then makes them unique, will return a sorted result.

allUnique seems like a good name

As mentioned earlier, we already have lib.unique that only uses == (and is therefore slower). It would be odd for these two functions to behave differently. In particular the implementation of allUnique should be equivalent to

allUnique = list:
  list == unique list

Which wouldn't be the case.

@Stunkymonkey
Copy link
Contributor Author

so you all agree to the naming of the functions? or are there any objections?

@infinisil
Copy link
Member

I'd like @roberth to reconsider #239722 (comment). And in any case, it's better to have more verbose names first, we can still introduce shorter aliases later if it becomes clear how they should behave.

@Stunkymonkey
Copy link
Contributor Author

reminder @roberth .

@Stunkymonkey
Copy link
Contributor Author

any chance this will get merged into 23.11?

@infinisil
Copy link
Member

Suggestion:

  • lib.lists.sortUnique: Sort the list and limits it to unique elements.
  • lib.lists.sortedAllUnique: Whether all elements in the sorted list are unique.

That avoids @roberth's concern regarding sort meaning to return a sorted list in #239722 (comment)

lib/lists.nix Outdated Show resolved Hide resolved
lib/lists.nix Outdated Show resolved Hide resolved
lib/lists.nix Outdated Show resolved Hide resolved
lib/lists.nix Outdated Show resolved Hide resolved
lib/lists.nix Outdated
*/
sortUnique = lst:
let sortedList = lib.sort builtins.lessThan lst; in
(lib.foldl' (acc: e: {r = if e != acc.last then acc.r ++ [ e ] else acc.r; last = e;}) {r = []; last = (dummy: 0);} sortedList).r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like findFirst id false (zipListsWith (fst: snd: fst != snd) sortedList (drop 1 sortedList)) might be nicer

@Stunkymonkey Stunkymonkey force-pushed the lib-allUnique branch 2 times, most recently from acdb8e3 to 5abae15 Compare October 18, 2023 21:07
lib/lists.nix Outdated Show resolved Hide resolved
@infinisil
Copy link
Member

infinisil commented Nov 14, 2023

Sorry for all of the different opinions and the long discussions @Stunkymonkey. Since I'm not quite sure about the sort variants (see #239722 (comment)), and you really only needed allUnique anyways, how about just limiting this PR to just allUnique. That's gonna be consistent with unique as I mentioned here. I could merge it then, and we can have the discussion on the sorted variants in another PR. Would that be okay with you?

@Stunkymonkey
Copy link
Contributor Author

thanks @infinisil for pointing this out. Small steps is better then no steps at all.

I just remove the unneeded code. Please have a look.

@Stunkymonkey Stunkymonkey changed the title lib.lists.sort[All]Unique: init lib.lists.allUnique: init Nov 14, 2023
Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@infinisil infinisil merged commit b04b7d6 into NixOS:master Nov 15, 2023
7 checks passed
@Stunkymonkey Stunkymonkey deleted the lib-allUnique branch November 22, 2023 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: lib The Nixpkgs function library 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants