Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib.attrsets: add flattenAttrs function #354874

Closed
wants to merge 1 commit into from

Conversation

sebaszv
Copy link

@sebaszv sebaszv commented Nov 9, 2024

Suggestions for how to better implement the functionality or what else to
name the function are welcome. If there is already an existing solution for this
use case that I've missed, please do let me know. If anything at all is missing or out of place, do mention, since this is my first time contributing to lib. Please do make sure to read Points of Consideration down below. Many thanks.

lib.attrsets.flattenAttrs: Flatten an attribute set to the bottom name and
value pairs as one singular final attribute set, stripping everything else
above.

In a few words, this does what lib.lists.flatten does, but for attribute
sets, rather than lists.

Attribute sets are useful for structuring code, with attribute names serving as
labels, rather than relying on comments.

One example would be when working defining environment.shellAliases.

environment.shellAliases =
  let
    aliases = {
      fd.f = "fd --hidden --no-ignore";

      lsd = {
        ls = {
          lc = "lsd --almost-all";

          lci = "lsd --almost-all --inode";

          lac = "lsd --all";
          lca = "lsd --all";

        };
        tree = {
          tc = "lsd --almost-all --ignore-glob .git --tree";

          tcd = "lsd --almost-all --ignore-glob .git --tree --depth";

          tci = "lsd --all --ignore-glob .git --inode --tree";
          tic = "lsd --all --ignore-glob .git --inode --tree";

          tcid = "lsd --all --ignore-glob .git --inode --tree --depth";
          ticd = "lsd --all --ignore-glob .git --inode --tree --depth";
        };
      };

      ripgrep = {
        r = "rg --glob='!.git'";
        rh = "rg --glob='!.git' --hidden";
        rhn = "rg --glob='!.git' --hidden --no-ignore";
        rhnn = "rg --hidden --no-ignore";
      };
    };
  in
    flattenAttrs aliases;

Here, the aliases are categorized by the binary that is being aliased. The lsd
set also has another subsection layer for ls and tree. Comments could be
used, but an in-code approach is better, especially when the attribute set is
large and not just a brief example like what is shown here. This allows for
inspection using Nix directly along with any other transformations that are
desired.

environment.shellAliases only takes one attribute set with direct string
name-value pairs. flattenAttrs will strip the "labels" and return only the
bottom level across the passed attribute set, being equivalent to:

environment.shellAliases = {
  # fd
  f = "fd --hidden --no-ignore";

  # lsd - ls
  lc = "lsd --almost-all";

  lci = "lsd --almost-all --inode";

  lac = "lsd --all";
  lca = "lsd --all";

  # lsd - tree
  tc = "lsd --almost-all --ignore-glob .git --tree";

  tcd = "lsd --almost-all --ignore-glob .git --tree --depth";

  tci = "lsd --all --ignore-glob .git --inode --tree";
  tic = "lsd --all --ignore-glob .git --inode --tree";

  tcid = "lsd --all --ignore-glob .git --inode --tree --depth";
  ticd = "lsd --all --ignore-glob .git --inode --tree --depth";

  # ripgrep
  r = "rg --glob='!.git'";
  rh = "rg --glob='!.git' --hidden";
  rhn = "rg --glob='!.git' --hidden --no-ignore";
  rhnn = "rg --hidden --no-ignore";
};

This brings the usefulness of lib.lists.flatten to attribute sets. This
doesn't conflict with nor modify any existing implementations, this is an
addition, so this change should be safe.

Points of Consideration

If there are duplicate names in the flattened attribute set, values will be lost since the last added value is favoured when merging the acc attribute set with x. Whatever is in x will override what was in acc. I'm not sure what the recommendation for handling this would be. Should the function be modified to compensate for this? Should unique values be enforced? Should the later added values have the name modified to preserve all additions? Should these possible variations be split into other function variations? Is a simple warning in the function definition documentation/comment to make known this behaviour sufficient generally?

I'm not sure what the best practice would be for handing this. Suggestions for ensuring this is implemented correctly in line with common practices and behaviour that other library functions follow are greatly appreciated. Particularly, since I am still new to the Nix ecosystem.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See
    Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using
    nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all
    changes have to be committed, also see
    nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in
    ./result/bin/)
  • 24.11 Release Notes
    (or backporting
    23.11
    and
    24.05
    Release notes)
    • (Package updates) Added a release notes entry if the change is major or
      breaking
    • (Module updates) Added a release notes entry if the change is
      significant
    • (Module addition) Added a release notes entry if adding a new NixOS
      module
    • Added a line mentioning the function addition to lib.attrsets
  • Fits
    CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: documentation This PR adds or changes documentation 8.has: changelog 6.topic: lib The Nixpkgs function library labels Nov 9, 2024
@nix-owners nix-owners bot requested a review from infinisil November 9, 2024 23:13
@NixOSInfra NixOSInfra added the 12. first-time contribution This PR is the author's first one; please be gentle! label Nov 9, 2024
@sebaszv sebaszv force-pushed the lib-add-flattenattrs branch 2 times, most recently from 8ca214d to f91c485 Compare November 10, 2024 00:05
lib/attrsets.nix Outdated Show resolved Hide resolved

:::
*/
flattenAttrs = attrs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should choose a different name.

When reading flattenAttrs i would think of something like:

    flattenAttrs { a.b.c.d = "e"; }
    => { "a.b.c.d" = "e"; }

It only flattens.

But your function removes information. Going in the other way with something like expandAttrs as the inverse of your function is impossible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a.b.c.d"

This still removes information, because concatStringsSep, the implied behavior, is not injective. Counterexample:

Both of the attribute paths a.b.c.d and "a.b".c.d map to the attribute name "a.b.c.d".

Injectivity is often not a requirement, but I think you're right that a flatten operation should generally preserve all information except the nesting structure.

How to do that here is not obvious, but that can be solved by adding a parameter to make that explicit.

I would suggest to make the type (List String -> String) -> Attrs -> Attrs, where the first parameter decides what to do with the attribute path.

Then at least the call sites have a hint what happens, and it makes the function more versatile.
Examples:

  • flattenAttrs (concatStringsSep ".") foo
  • flattenAttrs last foo

A rename to something like flattenAttrPaths would make it possible to guess the behavior without reading the docs, although the flattening happens to the attrsets, and not the paths, so that would be joinAttrPaths or joinAttrsRecursive to match another pattern that we have in lib.attrsets. (Maybe not joinAttrs, which I've proposed here)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roberth this is a duplicate implementation/feature-set of #221608

I think we should merge collect' which is the smallest common denominator here.

expr = flattenAttrs { };
expected = { };
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some test for conflicting bottom keys ? What would be the desired behavior?

Suggested change
flattenAttrs {
a.foo = "A";
b.foo = "B";
};

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good mention. This was mentioned in Points of Consideration. Seeing as the function itself needs some reimplementation, and even potentially some branching into some other function variations, I'll rework the test cases once that is done first. I will revisit this.

let
v = attrs.${n};
x =
if (lib.isAttrs v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to make the predicate configurable

Suggested change
if (lib.isAttrs v)
if (pred v)

Copy link
Author

@sebaszv sebaszv Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you recommend making this change or writing this? Have be a parameter? I have seen it around in some other functions, but am not too familiar. Sounds good to me.

Reading back, I believe this ties into the other comment left below with having an injectable function. Seeing as this is likely mostly going to be rewritten, I'll keep this in mind for that.


:::
*/
flattenAttrs = attrs:
Copy link
Contributor

@hsjobeki hsjobeki Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would want that function but with paths having the option of beeing preserved or an injectable function that receives the full path and maps that to the new name in the output.

Copy link
Author

@sebaszv sebaszv Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First question would be: Should I convert this into a Draft, since this is incomplete and discussion is ongoing?

I have some idea of what you are describing. Could you provide some pseudo-example of what that would like look like, such as <function> <what-args> -> <expected-result>? Some brief example of what you "envision" would be helpful to know how to orient the reimplementation.

I suppose what I am looking for here is some sort of direction from someone more experienced here in the ecosystem. I get that you understand the use case that I am trying to cover, having a useful function that covers this. This being to "scrap" the leading attribute names, and just extracted the very bottom layer name-value pairs. As discussed, it isn't that simple, as information is being discarded and the behaviour around that, as well as the expectations, must be configurable in some manner, whether through parameters or as various function offerings.

As far as incorporating something to fit that in lib, what suggestions do you have? Any at all. I have some ideas, but am eager to hear what some rough concept that fits in with existing trends. Particularly, since this is my first time contributing here and there is some complexity here requiring some decisions to be made. I am fine with scrapping it entirely and starting over. I am not attached to this. This could potentially be broken into more than one function, such as having a prime version to have slightly different behaviour, like with mapAttrs and mapAttrs'. That or having configurable parameters, such as accepting some sort of function to dictate the behavior or do filtering.

Thanks. I'll work with some variations in meantime.

Copy link
Contributor

@hsjobeki hsjobeki Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function reminds me of collect.

It iterates recursively over attribute values. If the predicate is true it adds the mapped value to accumulator (a list)

Its a mixture of mapAttrsRecursiveCond and collect

We have this open PR #221608 that adds collect' which implements what you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the predicate is true it adds the mapped value to accumulator (a list)

Seems that it could be optimized a bit then. Lists are quite inefficient because they don't have an efficient append.

concatMapAttrs + a recursive call is sufficient for a whole bunch of attribute set transformations. If you need context from the parent, you can pass it down as a parameter in the recursive call. No need for a list that way.

> `lib.attrsets.flattenAttrs`:
> Flatten an attribute set to the bottom name and value pairs as
> one singular final attribute set, stripping everything else
> above.
>
> In a few words, this does what `lib.lists.flatten` does, but for
> attribute sets, rather than lists.

Attribute sets are useful for structuring code, with attribute
names serving as labels, rather than relying on comments.

One example would be when working defining `environment.shellAliases.`

```nix
environment.shellAliases =
  let
    aliases = {
      fd.f = "fd --hidden --no-ignore";

      lsd = {
        ls = {
          lc = "lsd --almost-all";

          lci = "lsd --almost-all --inode";

          lac = "lsd --all";
          lca = "lsd --all";

        };
        tree = {
          tc = "lsd --almost-all --ignore-glob .git --tree";

          tcd = "lsd --almost-all --ignore-glob .git --tree --depth";

          tci = "lsd --all --ignore-glob .git --inode --tree";
          tic = "lsd --all --ignore-glob .git --inode --tree";

          tcid = "lsd --all --ignore-glob .git --inode --tree --depth";
          ticd = "lsd --all --ignore-glob .git --inode --tree --depth";
        };
      };

      ripgrep = {
        r = "rg --glob='!.git'";
        rh = "rg --glob='!.git' --hidden";
        rhn = "rg --glob='!.git' --hidden --no-ignore";
        rhnn = "rg --hidden --no-ignore";
      };
    };
  in
    flattenAttrs aliases;
```

Here, the aliases are categorized by the binary that is being aliased.
The `lsd` set also has another subsection layer for `ls` and `tree`.
Comments could be used, but an in-code approach is better, especially
when the attribute set is large and not just a brief example like
what is shown here. This allows for inspection using
Nix directly along with any other transformations that are desired.

`environment.shellAliases` only takes one attribute set with direct
string name-value pairs. `flattenAttrs` will strip the "labels" and
return only the bottom level across the passed attribute set, being
equivalent to:

```nix
environment.shellAliases = {
  # fd
  f = "fd --hidden --no-ignore";

  # lsd - ls
  lc = "lsd --almost-all";

  lci = "lsd --almost-all --inode";

  lac = "lsd --all";
  lca = "lsd --all";

  # lsd - tree
  tc = "lsd --almost-all --ignore-glob .git --tree";

  tcd = "lsd --almost-all --ignore-glob .git --tree --depth";

  tci = "lsd --all --ignore-glob .git --inode --tree";
  tic = "lsd --all --ignore-glob .git --inode --tree";

  tcid = "lsd --all --ignore-glob .git --inode --tree --depth";
  ticd = "lsd --all --ignore-glob .git --inode --tree --depth";

  # ripgrep
  r = "rg --glob='!.git'";
  rh = "rg --glob='!.git' --hidden";
  rhn = "rg --glob='!.git' --hidden --no-ignore";
  rhnn = "rg --hidden --no-ignore";
};
```

This brings the usefulness of `lib.lists.flatten` to attribute
sets. This doesn't conflict with nor modify any existing
implementations, this is an addition, so this change should be safe.
@sebaszv sebaszv force-pushed the lib-add-flattenattrs branch from f91c485 to 4000c16 Compare November 14, 2024 02:21
@hsjobeki
Copy link
Contributor

I am starting to realize this is now a duplicate of #221608

@sebaszv
Copy link
Author

sebaszv commented Nov 15, 2024

Is discussion for this functionality continuing in #237776? It seems sensible that some discussion is needed to come to some solution, considering that this has been "in the works" by others for quite some time.

Is the aim to merge #221608? Or would these be two separate concerns?

@roberth
Copy link
Member

roberth commented Nov 15, 2024

I don't have a lot of time to work on these things, but I've put my analysis and thoughts in #237776 (comment)

I agree that these ideas have simmered for plenty of time, and it would be good to make some decisions. I hope what I described works for that. I'm trying to balance it such that we don't bloat the library more than needed, because there's a real human cost to that as well. That's not very fun, but I think we can make progress on this now. Just see the linked comment, I guess.

@sebaszv sebaszv closed this Nov 15, 2024
@sebaszv sebaszv deleted the lib-add-flattenattrs branch November 15, 2024 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: lib The Nixpkgs function library 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: changelog 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 10.rebuild-linux: 1-10 12. first-time contribution This PR is the author's first one; please be gentle!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants