Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

init: lib.{collect',flattenAttrs} #221608

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

init: lib.{collect',flattenAttrs} #221608

wants to merge 2 commits into from

Conversation

tie
Copy link
Member

@tie tie commented Mar 17, 2023

Description of changes

This change adds an advanced version of lib.collect that exposes the attribute path, and a function to flatten an attribute set. These are mostly inteded for use with Nix flakes where packages are a flat set of derivations. That is, this allows flakes to define packages (or other outputs) as a tree of derivations (or other types) and flatten them before exposing.

In particular, this is useful for flakes that want to expose cross-compiled variants of packages so the user can run, for example, nix build path:.#go-linux-amd64-v3 independent of the current system (as in packages.${system}). In this case, Go is used as an example since the build system has an excellent support for cross-compilation out-of-the-box (assuming Cgo is disabled).

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@tie tie requested review from edolstra and infinisil as code owners March 17, 2023 06:45
@tie tie force-pushed the collect-path branch 2 times, most recently from 3824ed4 to 1e24d95 Compare March 17, 2023 06:59
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Mar 17, 2023
@tie
Copy link
Member Author

tie commented Mar 17, 2023

I think I can improve the implementation in a few places, so marking this as a draft for now.

@tie tie marked this pull request as draft March 17, 2023 15:49
tie added 2 commits March 17, 2023 19:47
This change adds lib.collect' function, an advanced version of
lib.collect, that exposes both attribute path and value for the
predicate.
This change adds lib.flattenAttrs function that flattens an attribute
set. Along with lib.collect', it is mostly inteded for use with Nix
flakes where packages must be a flat attribute set of derivations. That
is, this allows flakes to define packages (or other outputs) as a tree
of derivations (or other types) and flatten them before exposing.

In particular, this is useful for flakes that want to expose
cross-compiled variants of packages so the user can run, for example,
`nix build path:.#go-linux-amd64-v3` independent of the current system
(as in `packages.${system}`). In this case, Go is used as an example
since the build system has an excellent support for cross-compilation
out-of-the-box (assuming Cgo is disabled).
@tie
Copy link
Member Author

tie commented Mar 17, 2023

Switched from using foldlAttrs to concatMap, and collect'’s f argument returns the value directly instead of a list.

@tie tie marked this pull request as ready for review March 17, 2023 16:57
@tie
Copy link
Member Author

tie commented Mar 22, 2023

@roberth, you’ve recently reviewed a similar PR #218776, perhaps you can take a look at this too?

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have only really reviewed flattenAttrs so far.

collect' seems to be a tree traversal. Some of the same suggestions may apply.
Would it make sense to specify the order (pre / in / post, https://en.wikipedia.org/wiki/Tree_traversal) in the documentation?

if pred attrs then attrs
else
listToAttrs (map (x: nameValuePair (f x.path) x.value) (collect'
(_: v: pred v || !isAttrs v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the tree only contains the empty path, and the root isn't an attrset?
I think you could come up with this in the tests by thinking more about the possible inputs, besides the desired results.

Comment on lines +499 to +502
/* Flatten an attribute set `attrs` with nested attributes where leaf nodes
verify a given predicate named `pred`. That is, a flattened set will have
all leaf values at top level. Attribute paths are transformed to names
using the function `f`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Flatten an attribute set `attrs` with nested attributes where leaf nodes
verify a given predicate named `pred`. That is, a flattened set will have
all leaf values at top level. Attribute paths are transformed to names
using the function `f`.
/* Consider the final argument to be the root of a tree.
Every node that satisfies `isLeaf` or is not an attribute set is considered a leaf.
Derive an attribute name from each path using `f`, and return them as a single attribute set.

More declarative, less control flow.

Comment on lines +505 to +506
Using this function on attribute sets that refer to themselves will cause
infinite recursion if the predicate allows that.
Copy link
Member

@roberth roberth Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Using this function on attribute sets that refer to themselves will cause
infinite recursion if the predicate allows that.
If your input is cyclic or diverging, make sure that `isLeaf` only returns `true` for a finite part of your input.
The returned flat attribute set does not exhibit the laziness of the original input.
Specifically, `flattenAttrs` can not return before evaluating all leaves (except when `f` behaves as `x: true`, in which case `flattenAttrs f` is a convoluted attribute renaming function).

Diverging inputs also need a good isLeaf.
Might want to simplify my phrasing though; not sure.

@tie
Copy link
Member Author

tie commented Mar 24, 2023

Would it make sense to specify the order (pre / in / post, https://en.wikipedia.org/wiki/Tree_traversal) in the documentation?

This should be a depth-first pre-order traversal, just like the lib.attrsets.collect. Yes, makes perfect sense.

Should I also add docs for lib.attrsets.collect in a separate commit? Also, same for the infinite recursion warning.

@roberth
Copy link
Member

roberth commented Mar 24, 2023

That'd be great!

@tennox
Copy link
Contributor

tennox commented May 26, 2023

I found good use for this for my flattenVSCodeSettings lib function 😍

Would be great to have this in nixpkgs - then I might add that to the Wiki 🤔

@tie
Copy link
Member Author

tie commented Jun 3, 2023

@tennox, neat! I plan to revisit this PR on a vacation next week (June 12th).
Although I don’t want to dedicate all of my time to software development during vacation so I can’t guarantee anything 🥲

The implementation should be pretty solid, so I think what’s left are tests and docs for edge cases (see previous review comments).

@roberth
Copy link
Member

roberth commented Jun 4, 2023

I almost forgot about my PR

flattenAttrs seems similar to joinAttrsRecursive.

I've updated it to document the strictness issues, at least for derivations, where that's not insignificant. In general I wouldn't recommend creating trees if you need to flatten them anyway.
You can turn a flat attrset with separator characters in the names into a tree without evaluating any of the leaves (which are just attribute values), but if you want to do the opposite, you're going to have to evaluate the leaves because they're mixed in with your non-leaf nodes.

@roberth roberth mentioned this pull request Jun 4, 2023
7 tasks
@roberth roberth mentioned this pull request Jun 14, 2023
13 tasks
@infinisil
Copy link
Member

infinisil commented Jun 14, 2023

Another implementation of pretty much the same thing is in #234615 here

@roberth roberth added the 6.topic: lib The Nixpkgs function library label Jul 8, 2023
@wegank wegank added 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 2.status: merge conflict This PR has merge conflicts with the target branch labels Mar 19, 2024
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
@@ -564,6 +564,115 @@ runTests {
};
};

# code from the example
testCollect'Example = let
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Suggested change
testCollect'Example = let
testCollectPrimeExample = let

Since your test seems to be the first one that tests a function ending in "prime", following test contributions might the naming convention.

These are a little awkward to type:

mapAttrs'Example
overrideScope'Example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a PrimeExample of an unnecessary rule that may lead to more discomfort rather than less.

To be fair, I wouldn't actually call it a prime example, but I couldn't resist the pun, and it actually makes a point!
However, test names are a bit inconsequential.

@@ -444,6 +444,105 @@ rec {
else
[];

/* Recursively collect values produced by function `f` that verify a given
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the format to commonmark. This PR is open for so long... conventions beeing changed in master branch already.

https://github.com/NixOS/nixpkgs/blob/master/lib/attrsets.nix#L65

Suggested change
/* Recursively collect values produced by function `f` that verify a given
/** Recursively collect values produced by function `f` that verify a given

when the predicate is verified.

If `pred` verifies the given `attrs` set, the result is a list with a
single element `f [ ] attrs`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order of recursion? Looks like it traverse the leafs first then going up? Sometimes this can be important to know.

{ path = [ ]; value = attrs; }
];
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you expect:

collect' (_: v: !isAttrs v) (path: value: { inherit path value; }) { a = 1; b = {}; }

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are mostly inteded for use with Nix flakes

The motivating use case is a sneaky performance/robustness hazard, by evaluating all packages instead of just the one that's requested, so we probably shouldn't do this, and instead advise users on how to achieve the same result efficiently, for example by using a nesting of concatMapAttrs to create their packages attrsets.

I am very sorry about this, but this is actually a hard problem to solve because this is a fundamental property of "attribute trees" that people want to use, and we can hardly blame them.

@roberth
Copy link
Member

roberth commented Jun 28, 2024

Note that I'm not opposed to the addition of these functions, but we should make it very very clear that they should be used either

  • on data that is consumed in whole
  • as a last resort.

@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2024
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Nov 15, 2024
@hsjobeki
Copy link
Contributor

hsjobeki commented Nov 15, 2024

@tie can we split this up. I'd like to have separate discussions about collect' which i feel could be less controversial to merge right away.

@hsjobeki
Copy link
Contributor

Would it make sense to specify the order (pre / in / post, https://en.wikipedia.org/wiki/Tree_traversal) in the documentation?

This should be a depth-first pre-order traversal, just like the lib.attrsets.collect. Yes, makes perfect sense.

Should I also add docs for lib.attrsets.collect in a separate commit? Also, same for the infinite recursion warning.

collect' is not a depth first function. As soon as the value is collected it stops the recursion.
Which is better i think but should be correctly documented.

@roberth
Copy link
Member

roberth commented Nov 15, 2024

collect' is not a depth first function. As soon as the value is collected it stops the recursion.

A depth first algorithm can stop its traversal at any depth it deems sufficient. I don't think it implies that the whole tree or graph is always traversed in its entirety.
Depth first / breadth first would be a completely internal property if it wasn't for things like errors and side effects like builtins.trace, but I'm not sure that we need users to care about that. (iiuc you could make it breadth first by putting a (seq attrValues attrs [ (f path value) ]) instead of just [ (f path value) ], but I'm probably mistaken and it doesn't matter. We might even need more detailed terminology to describe such variations of the implementation)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 6.topic: lib The Nixpkgs function library 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants