Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filterFiles to lib and lib.sources #188301

Closed
wants to merge 1 commit into from
Closed

Conversation

ursi
Copy link
Contributor

@ursi ursi commented Aug 25, 2022

Get all files (recursively) that pass a filter function. Unlike other source
filtering functions, this will not include empty directories in the final
derivation.

Type: sourceLike -> (String -> Bool) -> Source

Example:
  filterFiles ./. (lib.hasSuffix ".html")

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Aug 25, 2022
@bobby285271 bobby285271 added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Aug 26, 2022
lib/sources.nix Outdated Show resolved Hide resolved
Get all files (recursively) that pass a filter function. Unlike other source
filtering functions, this will not include empty directories in the final
derivation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This function always scans the entire directory tree. In large code bases, you can improve performance by filtering the second argument.

(assuming you're flipping the parameters as suggested below, and we fix sourceLike support)

includedPaths =
builtins.filter
filter
(lib.filesystem.listFilesRecursive src);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not actually support all sourceLike values, such as the result of cleanSource. Types are a bit vague in Nix documentation, so I really can't blame you for not picking up on this subtlety.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried

lib.filesystem.listFilesRecursive (lib.cleanSource ./.)

in nixpkgs and it worked.

lib/sources.nix Outdated Show resolved Hide resolved
lib/sources.nix Outdated
Comment on lines 262 to 263
(path': lib.hasPrefix path (toString path'))
includedPaths
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the source O(#files²) . Maybe something the memoizeSource function could be useful.

lib/sources.nix Outdated
(lib.filesystem.listFilesRecursive src);

name =
# Fix the problem of store paths names causing a rebuild every time they change.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem is not unique to this function, so it should be moved somewhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I good to pull this out into its own function and submit a PR with both, or should it be pulled out and put into a separate PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is no good, because it reintroduces the parent path impurity.
Unfortunately it's up to us to make sure that the evaluation of a directory does not depend on the location of that directory. That location includes the basename: someone may use a different checkout, or a flake source may resolve to a different location depending on the entire project through the store path hash, thereby largely defeating the purpose of source filtering.
name should be "source" unless the caller specifies something else. In this case you can let the caller set the name afterwards with cleanSourceWith. @infinisil is working on #112083, which will add a more convenient function for setting the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you elaborate on what exactly "largely defeats the purpose of source filtering"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies; I might have responded too quickly here. Nonetheless I am not sure if this new solution should be applied to this new function, and whether this new function fits the larger design of the source combinator library. I'll defer to @infinisil who's currently more up to speed about all of that, including lazy trees.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ursi I'm not entirely sure I understand this name handling here. What exactly does it do and what use cases are affected by this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It addresses the problem caused by using builtins.filterSource on a generated source. Such a source will have a name of the form /nix/store/c59y63zj2pgjx87184kv1kyxqqs9b5sl-source, which will be updated any time that source changes (A particularly common example of this is the source ./. inside a flake). Because filterSource uses the base name of the file for the name of the new derivation, this means that whenever the generated source changes, the source created from filterSource changes, even if the files are exactly the same - it has a different name.

This function checks to see if you're filtering a source from a derivation, and if you are, it just uses the generic name "source" instead of getting the name from the base name like filterSource would.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so on one hand this shouldn't be a problem anymore with lazy trees, which makes it so that the path to the local flake is not directly a nix store path anymore. But also, that Nix branch would break the use case this intends to fix, because baseNameOf for lazy paths (default for flakes) returns something like "virtual0000000000000000000000004-source". While I reported an issue with exactly this in that PR, all the proposed solutions would also break the code here (and if you consider paths to always be relative to a filesystem root, it also makes sense). In general I'd say that we shouldn't split "/nix/store/-" paths and try to infer something from that, it's not very sound in general, not just for the lazy trees PR.

On the other hand, @roberth's proposed solution of just always using "source" as the name and deferring changing of that name to other modifier functions sounds much better, I think we should go for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have gone ahead and just used the name "source" for everything.

lib/sources.nix Outdated Show resolved Hide resolved
lib/sources.nix Outdated
Comment on lines 256 to 257
cleanSourceWith {
inherit name src;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines can be simplified to filter after the other changes.

@Quelklef
Copy link

Would it make sense to relativize the paths with respect to project root? This way someone can include/exclude directories "for free" with this api, for instance by using startsWith "./assets/" to include all of assets/.

(If paths are not relativized, then you could test for contains "/assets/", but this assumes we will never see some other directory called assets)

@ursi
Copy link
Contributor Author

ursi commented Oct 20, 2022

Good point. It is unclear to me why filterSource passes absolute paths to the filter function instead of paths relative to the source you're filtering.

@roberth
Copy link
Member

roberth commented Oct 24, 2022

It is unclear to me why filterSource passes absolute paths to the filter function instead of paths relative to the source you're filtering.

Path values in Nix are always absolute. I don't think diverging from that is a great idea, because it creates a patch interface, compared to the other source functions.

@ursi
Copy link
Contributor Author

ursi commented Oct 24, 2022

well for one, filterSource doesn't actually use path values.

filterSource (path: _: trace (typeOf path) false) ./.

outputs trace: string ....
I understand the argument for keeping things the same as all the other interfaces, but I also can't see any reason you would ever want your string to be an absolute path as opposed to a path relative to your source. Now, either api can be transformed into the other. You can tack the source directory path onto the relative path, and you can subtract it from the absolute path. That being said, I think tacking it on is easier than removing it, and I also think that no good nix code is ever going to rely on the parts of the path that are outside of the current project. If you want to keep it consistent, I understand, but I think relative paths is definitely the superior API.

@Quelklef
Copy link

(Relative paths also improves purity, since strictly speaking otherwise one can write a filter over e.g. startsWith "/home/quelklef/")

@MatthewCroughan
Copy link
Contributor

@roberth If it's easy enough to demonstrate, can you give an example of the patch interference in code?

@ursi
Copy link
Contributor Author

ursi commented Oct 24, 2022

Okay so I realized flakes actually fix the impurity issue of using absolute paths, since everything is moved to the nix store for evaluation, however, I still do think the relative path API has superior ergonomics.

@bjornfor
Copy link
Contributor

Okay so I realized flakes actually fix the impurity issue of using absolute paths, since everything is moved to the nix store for evaluation, however, I still do think the relative path API has superior ergonomics.

AFAIU, NixOS/nix#6530 makes it so flakes can be evaluated without copying them to the nix store first.

@ursi
Copy link
Contributor Author

ursi commented Oct 24, 2022

Well if that's the case, the potentially the impurity argument still stands.

@infinisil infinisil mentioned this pull request Jan 18, 2023
9 tasks
@ursi
Copy link
Contributor Author

ursi commented Jan 20, 2023

I have pushed a new version with a lot of the changes made that had been pointed out. It is also much faster than before.

@infinisil
Copy link
Member

infinisil commented Jun 19, 2023

I recently developed a safer and easier-to-use abstraction for source filtering in a draft PR to Nixpkgs, please take a look, try it out, and give feedback! https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117

In particular the use case of this issue is covered with the generic lib.fileset.fileFilter function:

let
  fs = lib.fileset;
in
fs.fileFilter (file: file.ext == "html") ./.

Though we can also easily add a convenience function like fs.fileExtFilter [ "html" ] ./. to make this shorter :)

@ursi
Copy link
Contributor Author

ursi commented Jun 19, 2023

@infinisil in your example, does that just get all the html files directly in ./. or does it traverse the entire ./. file tree and copy over an html-files-only version of it?

@infinisil
Copy link
Member

@ursi It traverses the entire ./. recursively :)

@drupol drupol added the 2.status: merge conflict This PR has merge conflicts with the target branch label Jul 26, 2023
@drupol drupol marked this pull request as draft July 26, 2023 13:17
@infinisil
Copy link
Member

@infinisil in your example, does that just get all the html files directly in ./. or does it traverse the entire ./. file tree and copy over an html-files-only version of it?

Oh I'm just realizing that I answered something different than I think you asked. So the answer is that it only copies the html files into the Nix store. The other files are traversed recursively but not copied.

@ursi
Copy link
Contributor Author

ursi commented Oct 31, 2023

I finally looked more into the fileset library. It does have one restriction that this does not - it doesn't work on any path, only local path values. So you cannot filter the result of another derivation.

@infinisil
Copy link
Member

infinisil commented Oct 31, 2023

@ursi Indeed. What's your use case for filtering a store path? I'm wondering because maybe we could also have library functions for that, see also #264541. Note that if you need to filter a lib.sources value, that will be supported soon with #261732.

It does make a lot of sense for lib.fileset to not support that though, see #264537.

@ursi
Copy link
Contributor Author

ursi commented Oct 31, 2023

Well, the toy use case that made me realize this limitation was me trying to use your library to filter over all of nixpkgs to compare its performance with this function. I just tried pulling in nixpkgs from my project's flake input in the repl.

@infinisil
Copy link
Member

@ursi I see, we obviously can't justify this as a use case for that feature, but if you have a practical use case let me know in #264541. I'd be also very happy to get feedback on the library in general, probably best to open issues and ping me if you encounter problems :)

Regarding performance, I've done a benchmark before, and indeed it's less performant, however it's not possible to improve it without updating the builtins, see NixOS/nix#8820. In fact it's the same problem you've had to work around in this very PR here (doing a recursive readDir). If performance of this is important to you, leave a 👍 or a comment there.

In any case, I've done my best to optimise the file set library as much as possible without any such builtin, there's even a benchmark to check for regressions.

@infinisil
Copy link
Member

I'll close this PR, since the file set library is fairly usable now. See #266356 for status, updates and feature requests!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 12.approvals: 1 This PR was reviewed and approved by one reputable person
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants