Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make parent dir available in fileset.fileFilter #306371

Closed

Conversation

andrewhamon
Copy link
Contributor

@andrewhamon andrewhamon commented Apr 23, 2024

Description of changes

Currently, fileFilter only allows filtering based based on a files base name + file type.

This is a bit limiting if you want to include files based on the name of the directory they reside in.

I bumped in to this when using fileFilter to make a minimal set of files to make Cargo happy in a Rust workspace - specifically, I needed to pull in any .rs file that lived under a bin/, examples/, or benches/ folder before Cargo would stop complaining. (Note that I am trying to avoid pulling in all rust files - I'm working in a large Rust workspace but trying to package a single small member of the workspace).

This change allows me to write filters like such as this:

file.hasExt "rs" && (pkgs.lib.hasSuffix  "bin" (toString file.path))

Open questions

  • should dir be a Path (as implemented it is) or should it be converted to stringified path relative to the filter root?

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

Benchmark results

There is a preexisting benchmark script. I ran ./benchmark.sh master and got the following results:

Benchmarking expression toSource { root = ./.; fileset = ./.; }
Mean CPU time 0.179607 (σ = 0.00527474) for 10 runs is 100.03230315958318% (σ = 3.669259422572901%) of the old value 0.179549 (σ = 0.00394593)
Statistic .gc.totalBytes (19776544) is 100.0001% (+32) of the old value 19776512
1 stats differ between the current tree and master

Benchmarking expression toSource { root = ./.; fileset = unions (mapAttrsToList (name: value: ./. + "/${name}") (builtins.readDir ./.)); }
Mean CPU time 0.105958 (σ = 0.00257414) for 10 runs is 101.95621842675006% (σ = 2.7418005135301344%) of the old value 0.103925 (σ = 0.00119843)
0 stats differ between the current tree and master

This doesn't seem to have a large impact on performance if not using dir

Currently, fileFilter only allows filtering based based on a files base
name + file type.

This is a bit limiting if you want to include files based on the name of
the directory they reside in.

I bumped in to this when using fileFilter to make a minimal set of files
to make Cargo happy in a Rust workspace - specifically, I needed to pull
in any `.rs` file that lived under a `bin/`, `examples/`, or `benches/`
folder before Cargo would stop complaining. (Note that I am trying to
avoid pulling in all rust files - I'm working in a large Rust workspace
but trying to package a single small member of the workspace).

```
@andrewhamon andrewhamon requested a review from infinisil as a code owner April 23, 2024 20:11
@github-actions github-actions bot added the 6.topic: lib The Nixpkgs function library label Apr 23, 2024
Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Unfortunately I already explored this option in #269504, where we figured that this shouldn't be done like that because it leads to poorer performance than expected. In your example, it would have to recurse throughout all bin directories, even though they won't be included in the result.

A more general tracking issue for this use case is #271307, for which I think this is the best way forward: #271307 (comment)

@andrewhamon
Copy link
Contributor Author

andrewhamon commented Apr 23, 2024

Re: performance, it seems like passing the path unaltered (as I do in this PR) is close to zero cost for any use cases that don't do anything with dir. The benchmarks seem to confirm this.

In your example, it would have to recurse throughout all bin directories, even though they won't be included in the result.

I'm a little confused... doesn't fileFilter already recurse through all directories? Since it passes every file to the predicate?

@andrewhamon
Copy link
Contributor Author

andrewhamon commented Apr 23, 2024

After reading #269504 a little more closely you gave this example:

lib.fileset.fileFilter (file:
  head file.components == ".git"
) ./.

I see how that is very wasteful, but that is a very different situation that my example - I am already in a situation where it makes sense to use fileFilter (thus paying the performance penalty).

This is the full example of how I'm using this:

  fileSetForCrate = fileset.toSource {
    root = ../..;
    fileset = fileset.fileFilter (file:
      file.name == "Cargo.toml"
        || file.name == "Cargo.lock"
        || file.name == "main.rs"
        || file.name == "lib.rs"
        || file.name == "build.rs"
        # Include bin/*.rs, examples/*.rs, and benches/*.rs to make Cargo happy
        || (file.hasExt "rs" && (pkgs.lib.hasSuffix  "bin" (toString file.dir)))
        || (file.hasExt "rs" && (pkgs.lib.hasSuffix  "examples" (toString file.dir)))
        || (file.hasExt "rs" && (pkgs.lib.hasSuffix  "benches" (toString file.dir)))
      ) ../..;
  };

@andrewhamon
Copy link
Contributor Author

andrewhamon commented Apr 23, 2024

I think I agree that your preferred solution (#271307 (comment)) is a better fit. To be clear, for my needs it will still have the same performance since I am already in a situation where I need to recurse through all files, but that API seems harder to accidentally misuse.

@infinisil
Copy link
Member

Performance is a bit more tricky because the fileset combinators are lazy. So e.g. while fileFilter by itself always has to recurse throughout the entire directory, if you combine it with e.g. intersection, it won't have to. Here's a fun example that creates a file set which contains literally every Nix file on your system, but then limits it to just a single directory:

nix-repl> :l <nixpkgs/lib>
Added 428 variables.

nix-repl> :a fileset
Added 13 variables.

nix-repl> nixFiles = fileFilter (file: file.hasExt "nix") /.

nix-repl> trace (intersection nixFiles ~/src/nixpkgs/main/lib/fileset)
trace: /home/tweagysil/src/nixpkgs/main/lib/fileset
trace: - default.nix (regular)
trace: - internal.nix (regular)
trace: - mock-splitRoot.nix (regular)
«lambda @ /nix/store/fmb2gkj0jyd4nzi7xzkp5k1c9pz45v5s-source/lib/fileset/default.nix:225:8»

The same will happen with the proposed withoutDirectories function.

@infinisil
Copy link
Member

Re: performance, it seems like passing the path unaltered (as I do in this PR) is close to zero cost for any use cases that don't do anything with dir. The benchmarks seem to confirm this.

This would be problematic because it would allow filesets to depend on the absolute location of the local directory. It's a very intentional design choice to make sure this cannot happen to avoid problems relating to that. This would be good to add to the design goals of the library.

So we cannot pass absolute paths because of that, but relative paths would be fine (even if they're a bit slower). Efficiency by itself is notably not a design goal of the library :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: lib The Nixpkgs function library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants