Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] File set combinators #222981

Closed
wants to merge 27 commits into from
Closed

[WIP] File set combinators #222981

wants to merge 27 commits into from

Conversation

infinisil
Copy link
Member

@infinisil infinisil commented Mar 25, 2023

This PR rethinks @roberth's original source combinators into a more composable datatype: Sets of files. Nothing is decided yet, but I'm really amazed by how well this is working.

The abstraction here allows efficient and lazy operations on a set of files. Typical set operations are supported, including union, intersection, difference and filtering.

This work is sponsored by Antithesis

Update: This has now been mostly incrementally merged! See #266356 for more info.

Plan

  1. Get the code fully working, but without fully commented code
  2. Write clean introduction and reference documentation
    1. Reference
    2. Introduction
  3. Make a Discourse post, showing off the concept and how people can try it out, to get wider feedback: https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117
  4. Iterate the above until everybody is happy with the design
    Despite reaching out to the community, not much feedback was given. This does mean that there's no complaints about it either though. Because of this, I'll move towards implementing it, see [WIP] File set combinators #222981 (comment)
  5. Split up this PR into smaller more incremental PR's, see File set library tracking issue and feature requests #266356

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Mar 25, 2023
@github-actions github-actions bot added the 8.has: documentation This PR adds or changes documentation label Apr 12, 2023
Copy link
Member Author

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already got some quick preliminary (generally positive) feedback by @roberth in PMs and meetings.

To move this forward I decided to start writing documentation, even if it's not fully implemented, to get a sense of what the interface should look like. I'm creating comments to highlight some key parts of that interface, but it would be great to get some rough feedback on the other parts of the interface documentation too.

doc/functions/fileset.section.md Show resolved Hide resolved
lib/fileset/default.nix Outdated Show resolved Hide resolved
doc/functions/fileset.section.md Outdated Show resolved Hide resolved
doc/functions/fileset.section.md Outdated Show resolved Hide resolved
lib/fileset/default.nix Outdated Show resolved Hide resolved
lib/fileset/default.nix Outdated Show resolved Hide resolved
doc/functions/fileset.section.md Outdated Show resolved Hide resolved
lib/fileset/default.nix Outdated Show resolved Hide resolved
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-47/27387/1

@infinisil infinisil added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Apr 19, 2023
@infinisil
Copy link
Member Author

infinisil commented Apr 21, 2023

I've now done a major rewrite of this and am super hype now, this is turning into something really cool. Weekend time now though, I'll continue next week. See the PR description for how I plan to proceed.

}
```

To see everything you can do with file sets, check out the [reference documentation](#sec-functions-library-fileset).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To see everything you can do with file sets, check out the [reference documentation](#sec-functions-library-fileset).
To see everything you can do with file sets, see the [library documentation](#sec-functions-library-fileset).

We're already in reference documentation; just an introductory section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this in a tutorial-style, with the expectation that it would be moved to (or just rendered at) nix.dev at some point (that's where the doc team is moving tutorial docs towards).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While all text can be learned from, this is not a "learning experience", certainly not according to diataxis. And I think that's fine. File sets aren't just functions; the type needs reference documentation too, and it's a good place for a basic introduction to the concept.

In general, tutorials are the weakest part of documentation, the most misunderstood and the most difficult to do well. Most software projects have poor - or non-existent - tutorials.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well then I clearly don't know how to write a tutorial 😅, please take another look at the document now though, I wouldn't really count that as part of the reference.

lib/fileset/default.nix Outdated Show resolved Hide resolved
lib/fileset/default.nix Outdated Show resolved Hide resolved
lib/fileset/default.nix Outdated Show resolved Hide resolved
lib/fileset/default.nix Outdated Show resolved Hide resolved
lib/fileset/default.nix Outdated Show resolved Hide resolved
lib/fileset.nix Outdated Show resolved Hide resolved
error: lib.fileset.trace: Expected second argument "/home/user/my/project/non-existent" to be a path that exists, but it doesn't.
```

File sets can be composed using the functions [`union`](#function-library-lib.fileset.union) (and the list-based equivalent [`unions`](#function-library-lib.fileset.unions)), [`intersect`](#function-library-lib.fileset.intersect) (and the list-based equivalent [`intersects`](#function-library-lib.fileset.intersects)) and [`difference`](#function-library-lib.fileset.difference), the most useful of which are [`unions`](#function-library-lib.fileset.unions) and [`difference`](#function-library-lib.fileset.difference):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's surprising to me that there are two different versions of each function, and the list versions end in "s". It made me wonder where the multiple unions were coming from, because I'd think about "the union of three directories", rather than that being two different unions in a fold, which is more of an implementation detail.

Copy link
Member Author

@infinisil infinisil May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adapted this from Haskell's Data.Set functions, which also features union and unions.

We could also read unions [ a b c ] as union a (union b c), which then does have multiple unions, though that's a bit of a stretch.

Do you perhaps have any suggestions for better names? unions is probably one of the most useful functions, so I like how it's currently fairly short.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unions is probably one of the most useful functions

Agreed on this, but is union (no s) useful? Maybe we just need the list versions, and then we could call them union, intersect, etc? In Nix we're a lot less likely to use them as arguments to higher-order functions like fold than might be the case in Haskell, where functions that operate on exactly two arguments might be more useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagined that two-argument functions would still be preferred for use cases like "I have a file set, intersect that with only files in ./lib", which would be e.g.

intersect ./lib (fileFilter (file: file.ext == "nix") ./.)

But I'm just realizing that the list-based one isn't much worse:

intersects [ ./lib (fileFilter (file: file.ext == "nix") ./.) ]

Though it does require adding a ] at the end, which isn't the case for the two-argument function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to have both ways for convenience, and the s for lists is not the worst. If we can come up with a more intuitive naming for non-Haskellers that would be great.

@roberth
Copy link
Member

roberth commented Jun 4, 2023

Nix / lib sucks at trees.

It's just a thought; maybe this goes nowhere, but if we could do better at trees, perhaps we could have better file sets, or at least a better file set implementation internally.

@ofborg ofborg bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Jun 8, 2023
Comment on lines +121 to +122
_base = base;
_tree = tree;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a call with @roberth where we concluded that this would be even better to make sure people know these are internal:

Suggested change
_base = base;
_tree = tree;
_internalBase = base;
_internalTree = tree;

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/zurich-23-05-zhf-hackathon-and-workshop-report/29093/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117/1

@infinisil
Copy link
Member Author

These 4 separate path-library related PR's are needed before a basic version of file set combinators can be merged, they are all up-to-date with all feedback addressed, please review:

@infinisil
Copy link
Member Author

I opened a first draft PR for just lib.fileset.toSource and the file set combinators base: #245623


# Coerce and normalise the bases of multiple file set values passed to user-facing functions
# Type: String -> [ { context :: String, value :: Any } ] -> { commonBase :: Path, trees :: [ <tree> ] }
_normaliseBase = function: list:
Copy link
Member Author

@infinisil infinisil Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own future reference for once I get to implementing the combinators (in particular intersect):

This function here should only be used for union, not for difference or intersect. In particular:

  • For union, the resulting base path should be the common prefix of all the inputs base path, e.g. union /foo /foo/bar should give /foo as the base, this is what this function does
  • For intersect, the resulting base path should be the path with the most components of the input bases, e.g. intersect /foo /foo/bar should give /foo/bar as the base. If the paths don't overlap, an empty set is returned.
    • TODO: What base should an empty result have? E.g. intersect /foo /bar, should that give an error, pick /foo or /bar arbitrarily, or use /? Maybe the notion of a special empty file set value should be introduced.
  • For difference, the resulting base is the same as the base from the first argument, e.g. difference /foo / has base /foo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This not being done leads to this error when it should work just fine:

nix-repl> fileset.toSource { root = ./lib; fileset = fileset.intersect ./lib ./.; }
error: lib.fileset.toSource: Expected attribute `fileset` to not be influenceable by any paths outside `root`, but `lib.fileset.getInfluenceBase fileset` "/home/tweagysil/antithesis/nixpkgs" is outside `root`.

@adrian-gierakowski
Copy link
Contributor

Is there a lightweight way to start using these right now? Or would I need to pull an entire copy of nixpkgs from this branch?

@mohe2015
Copy link
Contributor

mohe2015 commented Aug 3, 2023

Is there a lightweight way to start using these right now? Or would I need to pull an entire copy of nixpkgs from this branch?

you could do a shallow copy (--depth 1) from this branch. That shouldn't be too big

@adrian-gierakowski
Copy link
Contributor

Is there a lightweight way to start using these right now? Or would I need to pull an entire copy of nixpkgs from this branch?

you could do a shallow copy (--depth 1) from this branch. That shouldn't be too big

thanks! So the changes from the other PR are included here?

@infinisil
Copy link
Member Author

infinisil commented Aug 3, 2023

@adrian-gierakowski @mohe2015 I think a shallow clone would be about the same as a GitHub tarball download, I don't think there's a better way for now. In stable Nix it would look like this:

fetchTarball "https://github.com/tweag/nixpkgs/tarball/file-sets"
# Or replace `file-sets` with a commit to pin it 

With niv it would be

niv add tweag/nixpkgs --branch=file-sets

And with experimental Flakes:

{
  inputs.nixpgksFilesets.url = "github:tweag/nixpkgs/file-sets";
}

And if you give it a try I'd be happy to hear any feedback you have, good or bad :D

@roberth
Copy link
Member

roberth commented Aug 3, 2023

You could fetch just lib using the tree hash:

[nixpkgs]$ gh pr checkout 222981

[nixpkgs]$ git rev-parse HEAD:lib
1bdcd7fc8a6a40b2e805bad759b36e64e911036b

[nixpkgs]$ curl -L https://github.com/NixOS/nixpkgs/archive/1bdcd7fc8a6a40b2e805bad759b36e64e911036b.tar.gz | tar -tz | grep fileset
nixpkgs-1bdcd7fc8a6a40b2e805bad759b36e64e911036b/fileset.nix
let
  inherit (import (builtins.fetchTarball "https://github.com/NixOS/nixpkgs/archive/1bdcd7fc8a6a40b2e805bad759b36e64e911036b.tar.gz"))
    fileset;
in
  ...

@adrian-gierakowski
Copy link
Contributor

You could fetch just lib using the tree hash:

[nixpkgs]$ gh pr checkout 222981

[nixpkgs]$ git rev-parse HEAD:lib
1bdcd7fc8a6a40b2e805bad759b36e64e911036b

[nixpkgs]$ curl -L https://github.com/NixOS/nixpkgs/archive/1bdcd7fc8a6a40b2e805bad759b36e64e911036b.tar.gz | tar -tz | grep fileset
nixpkgs-1bdcd7fc8a6a40b2e805bad759b36e64e911036b/fileset.nix
let
  inherit (import (builtins.fetchTarball "https://github.com/NixOS/nixpkgs/archive/1bdcd7fc8a6a40b2e805bad759b36e64e911036b.tar.gz"))
    fileset;
in
  ...

Awesome, thanks @roberth!


The predicate is called with an attribute set containing these attributes:

- `name`: The filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% clear. Is this like baseNameOf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yup it is!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw note that this is the draft PR. The one ready for review is #245623

@roberth
Copy link
Member

roberth commented Sep 6, 2023

a more composable datatype

Which doesn't compose with the existing source library except through explicit conversions, and the conversion isn't isomorphic.

It's not really the composability that improves (comparatively). Rather the meaning of the operations is more set-like, to a large degree. Instead of two somewhat unintuitive attributes, we now have just one - and I'll note that in both designs we do get good behavior almost all the time.

@infinisil
Copy link
Member Author

I'll close this, because this is now mostly implemented! See #266356 for further updates and details :)

@infinisil infinisil closed this Nov 9, 2023
@infinisil infinisil deleted the file-sets branch November 9, 2023 00:06
@infinisil infinisil restored the file-sets branch November 9, 2023 00:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: lib The Nixpkgs function library 6.topic: policy discussion 6.topic: stdenv Standard environment 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants