Showing 13 changed files with 1,122 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
<!-- TODO: Render this document in front of function documentation in case https://github.com/nix-community/nixdoc/issues/19 is ever supported --> | ||
|
||
# File sets {#sec-fileset} | ||
|
||
The [`lib.fileset`](#sec-functions-library-fileset) library allows you to work with _file sets_. | ||
A file set is a mathematical set of local files that can be added to the Nix store for use in Nix derivations. | ||
File sets are easy and safe to use, providing obvious and composable semantics with good error messages to prevent mistakes. | ||
|
||
These sections apply to the entire library. | ||
See the [function reference](#sec-functions-library-fileset) for function-specific documentation. | ||
|
||
The file set library is currently very limited but is being expanded to include more functions over time. | ||
|
||
## Implicit coercion from paths to file sets {#sec-fileset-path-coercion} | ||
|
||
All functions accepting file sets as arguments can also accept [paths](https://nixos.org/manual/nix/stable/language/values.html#type-path) as arguments. | ||
Such path arguments are implicitly coerced to file sets containing all files under that path: | ||
- A path to a file turns into a file set containing that single file. | ||
- A path to a directory turns into a file set containing all files _recursively_ in that directory. | ||
|
||
If the path points to a non-existent location, an error is thrown. | ||
|
||
::: {.note} | ||
Just like in Git, file sets cannot represent empty directories. | ||
Because of this, a path to a directory that contains no files (recursively) will turn into a file set containing no files. | ||
::: | ||
|
||
:::{.note} | ||
File set coercion does _not_ add any of the files under the coerced paths to the store. | ||
Only the [`toSource`](#function-library-lib.fileset.toSource) function adds files to the Nix store, and only those files contained in the `fileset` argument. | ||
This is in contrast to using [paths in string interpolation](https://nixos.org/manual/nix/stable/language/values.html#type-path), which does add the entire referenced path to the store. | ||
::: | ||
|
||
### Example {#sec-fileset-path-coercion-example} | ||
|
||
Assume we are in a local directory with a file hierarchy like this: | ||
``` | ||
├─ a/ | ||
│  ├─ x (file) | ||
│  └─ b/ | ||
│     └─ y (file) | ||
└─ c/ | ||
   └─ d/ | ||
``` | ||
|
||
Here's a listing of which files get included when different path expressions get coerced to file sets: | ||
- `./.` as a file set contains both `a/x` and `a/b/y` (`c/` does not contain any files and is therefore omitted). | ||
- `./a` as a file set contains both `a/x` and `a/b/y`. | ||
- `./a/x` as a file set contains only `a/x`. | ||
- `./a/b` as a file set contains only `a/b/y`. | ||
- `./c` as a file set is empty, since neither `c` nor `c/d` contain any files. |
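The listing above can be reproduced outside of Nix with a small shell sketch. It does not call `lib.fileset`; it only emulates the coercion rule (every non-directory entry under a path, recursively) with `find`. The helper name `coerced_files` and the temporary directory are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the example hierarchy in a throwaway directory.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
mkdir -p a/b c/d
touch a/x a/b/y

# Hypothetical helper (not part of lib.fileset): print every non-directory
# entry under the given path, one per line, relative to the demo directory.
# Directories never show up on their own, so empty ones simply vanish.
coerced_files() {
    find "$1" ! -type d | sed 's|^\./||' | LC_ALL=C sort
}

# Print the emulated file set for each path from the listing.
for path in . a a/x a/b c; do
    printf '%s:' "$path"
    coerced_files "$path" | sed 's/^/ /' | tr -d '\n'
    echo
done
```

The line for `c` lists no files, mirroring how `./c` coerces to an empty file set. A non-existent path makes `find` fail and the script abort, which loosely corresponds to the error thrown for non-existent locations.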
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,183 @@ | ||
# File set library | ||
|
||
The main goal of the file set library is to be able to select local files that should be added to the Nix store. | ||
It should have the following properties: | ||
- Easy: | ||
The functions should have obvious semantics, be low in number and be composable. | ||
- Safe: | ||
Throw early and helpful errors when mistakes are detected. | ||
- Lazy: | ||
Only compute values when necessary. | ||
|
||
Non-goals are: | ||
- Efficient: | ||
If the abstraction proves itself worthwhile but too slow, it can still be optimized further. | ||
|
||
## Tests | ||
|
||
Tests are declared in [`tests.sh`](./tests.sh) and can be run using | ||
``` | ||
./tests.sh | ||
``` | ||
|
||
## Benchmark | ||
|
||
A simple benchmark against the HEAD commit can be run using | ||
``` | ||
./benchmark.sh HEAD | ||
``` | ||
|
||
This is intended to be run manually and is not checked by CI. | ||
|
||
## Internal representation | ||
|
||
The internal representation is versioned to allow file sets from different Nixpkgs versions to be composed with each other. See [`internal.nix`](./internal.nix) for the versions and the conversions between them. | ||
This section describes only the current representation, but past versions will have to be supported by the code. | ||
|
||
### `fileset` | ||
|
||
An attribute set with these values: | ||
|
||
- `_type` (constant string `"fileset"`): | ||
Tag to indicate this value is a file set. | ||
|
||
- `_internalVersion` (constant string equal to the current version): | ||
Version of the representation. | ||
|
||
- `_internalBase` (path): | ||
Any files outside of this path cannot influence the set of files. | ||
This is always a directory. | ||
|
||
- `_internalTree` ([filesetTree](#filesettree)): | ||
A tree representation of all included files under `_internalBase`. | ||
|
||
- `__noEval` (error): | ||
An error indicating that directly evaluating file sets is not supported. | ||
|
||
### `filesetTree` | ||
|
||
One of the following: | ||
|
||
- `{ <name> = filesetTree; }`: | ||
A directory with a nested `filesetTree` value for every directory entry. | ||
Even entries that aren't included are present as `null` because it improves laziness and allows using this as a sort of `builtins.readDir` cache. | ||
|
||
- `"directory"`: | ||
A directory with all its files included recursively, allowing early cutoff for some operations. | ||
This specific string is chosen to be compatible with `builtins.readDir` for a simpler implementation. | ||
|
||
- `"regular"`, `"symlink"`, `"unknown"` or any other non-`"directory"` string: | ||
A nested file with its file type. | ||
These specific strings are chosen to be compatible with `builtins.readDir` for a simpler implementation. | ||
Distinguishing between different file types is not strictly necessary for the functionality of this library, | ||
but it does allow nicer printing of file sets. | ||
|
||
- `null`: | ||
A file or directory that is excluded from the tree. | ||
It may still exist on the file system. | ||
|
||
## API design decisions | ||
|
||
This section justifies API design decisions. | ||
|
||
### Internal structure | ||
|
||
The representation of the file set data type is internal and can be changed over time. | ||
|
||
Arguments: | ||
- (+) The point of this library is to provide high-level functions; users don't need to be concerned with how it's implemented. | ||
- (+) It allows adjustments to the representation, which is especially useful in the early days of the library. | ||
- (+) It still allows the representation to be stabilized later if necessary and if it has proven itself. | ||
|
||
### Influence tracking | ||
|
||
File set operations internally track the top-most directory that could influence the exact contents of a file set. | ||
Specifically, `toSource` requires that the given `fileset` is completely determined by files within the directory specified by the `root` argument. | ||
For example, even with `dir/file.txt` being the only file in `./.`, `toSource { root = ./dir; fileset = ./.; }` gives an error. | ||
This is because `fileset` may as well be the result of filtering `./.` in a way that excludes `dir`. | ||
|
||
Arguments: | ||
- (+) This gives us the guarantee that adding new files to a project never breaks a file set expression. | ||
This is also true in a lesser form for removed files: | ||
only removing files explicitly referenced by paths can break a file set expression. | ||
- (+) This can be removed later, if we discover it's too restrictive | ||
- (-) It leads to errors when a sensible result could sometimes be returned, such as in the above example. | ||
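
The containment rule behind this decision can be sketched in shell. This is not how `lib.fileset` implements it; it is a minimal illustration under the assumption that the check boils down to "the file set's base directory must be `root` or lie below it". The helper names `base_within_root` and `check` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of lib.fileset: succeeds only if the
# directory $2 (standing in for a file set's base) is the directory $1
# (standing in for `root`) or lies somewhere below it.
base_within_root() {
    local root base
    root=$(readlink -f "$1")
    base=$(readlink -f "$2")
    # Compare with trailing slashes so /x/dir is not taken as a prefix of /x/dir2.
    [[ "$base/" == "${root%/}/"* ]]
}

# Hypothetical helper: report the outcome of the check as text.
check() {
    if base_within_root "$1" "$2"; then
        echo "accepted"
    else
        echo "rejected"
    fi
}

# Recreate the situation from the example: dir/file.txt is the only file.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/dir"
touch "$demo/dir/file.txt"

check "$demo" "$demo/dir"   # like root = ./.;  fileset = ./dir
check "$demo/dir" "$demo"   # like root = ./dir; fileset = ./.
```

The second call is rejected even though every existing file sits inside `dir`, because the check looks only at where the file set could draw files from, not at which files currently exist.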
|
||
### Empty directories | ||
|
||
File sets can only represent a _set_ of local files; directories on their own are not representable. | ||
|
||
Arguments: | ||
- (+) There does not seem to be a sensible set of combinators when directories can be represented on their own. | ||
  Here are some possibilities: | ||
  - `./.` represents the files in `./.` _and_ the directory itself including its subdirectories, meaning that even if there are no files, the entire structure of `./.` is preserved. | ||
|
||
    In that case, what should `fileFilter (file: false) ./.` return? | ||
    It could return the entire directory structure unchanged, but with all files removed, which would not be what one would expect. | ||
|
||
    Trying to have a filter function that also supports directories leads to the question: | ||
    What should the behavior be if `./foo` itself is excluded but all of its contents are included? | ||
    It leads to having to define when directories are recursed into, but then we're effectively back at how the `builtins.path`-based filters work. | ||
|
||
  - `./.` represents all files in `./.` _and_ the directory itself, but not its subdirectories, meaning that at least `./.` will be preserved even if it's empty. | ||
|
||
    In that case, `intersect ./. ./foo` should only include files and no directories themselves, since `./.` includes only `./.` as a directory, and the same goes for `./foo`, so there's no overlap in directories. | ||
    But intuitively this operation should result in the same as `./foo`; everything else is just confusing. | ||
- (+) This matches how Git only supports files, so developers should already be used to it. | ||
- (-) Empty directories (even if they contain nested directories) are neither representable nor preserved when coercing from paths. | ||
  - (+) It is very rare that empty directories are necessary. | ||
  - (+) We can implement a workaround, allowing `toSource` to take an extra argument for ensuring certain extra directories exist in the result. | ||
- (-) It slows down store imports, since the evaluator needs to traverse the entire tree to remove any empty directories. | ||
  - (+) This can still be optimized by introducing more Nix builtins if necessary. | ||
|
||
### String paths | ||
|
||
File sets do not support Nix store paths in strings such as `"/nix/store/...-source"`. | ||
|
||
Arguments: | ||
- (+) Such paths are usually produced by derivations, which means `toSource` would either: | ||
  - Require IFD if `builtins.path` is used as the underlying primitive | ||
  - Require importing the entire `root` into the store such that derivations can be used to do the filtering | ||
- (+) The convenient path coercion like `union ./foo ./bar` wouldn't work for absolute paths, requiring more verbose alternate interfaces: | ||
  - `let root = "/nix/store/...-source"; in union "${root}/foo" "${root}/bar"` | ||
|
||
    Verbose and dangerous, because if `root` were a path, the entire path would get imported into the store. | ||
|
||
  - `toSource { root = "/nix/store/...-source"; fileset = union "./foo" "./bar"; }` | ||
|
||
    Does not allow debug printing intermediate file set contents, since we don't know the paths' contents before having a `root`. | ||
|
||
  - `let fs = lib.fileset.withRoot "/nix/store/...-source"; in fs.union "./foo" "./bar"` | ||
|
||
    Makes library functions impure, since they depend on the contextual root path, and makes composability questionable. | ||
|
||
- (+) The point of the file set abstraction is to specify which files should get imported into the store. | ||
|
||
  This use case makes little sense for files that are already in the store. | ||
  This should be a separate abstraction, e.g. `pkgs.drvLayout`, which could have a similar interface but be specific to derivations. | ||
  Additional capabilities could be supported that can't be done at evaluation time, such as renaming files, creating new directories, setting executable bits, etc. | ||
|
||
### Single files | ||
|
||
File sets cannot add single files to the store; they can only import files under directories. | ||
|
||
Arguments: | ||
- (+) There's no point in using this library for a single file, since you can't do anything other than add it to the store or not. | ||
And it would be unclear how the library should behave if the one file wouldn't be added to the store: | ||
`toSource { root = ./file.nix; fileset = <empty>; }` has no reasonable result because returning an empty store path wouldn't match the file type, and there's no way to have an empty file store path, whatever that would mean. | ||
|
||
## To update in the future | ||
|
||
Here's a list of places in the library that need to be updated in the future: | ||
- > The file set library is currently very limited but is being expanded to include more functions over time. | ||
in [the manual](../../doc/functions/fileset.section.md) | ||
- > Currently the only way to construct file sets is using implicit coercion from paths. | ||
in [the `toSource` reference](./default.nix) | ||
- > For now filesets are always paths | ||
in [the `toSource` implementation](./default.nix), also update the variable name there | ||
- Once a tracing function exists, `__noEval` in [internal.nix](./internal.nix) should mention it | ||
- If/Once a function to convert `lib.sources` values into file sets exists, the `_coerce` and `toSource` functions should be updated to mention that function in the error when such a value is passed | ||
- If/Once a function exists that can optionally include a path depending on whether it exists, the error message for the path not existing in `_coerce` should mention the new function |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Benchmarks lib.fileset | ||
# Run: | ||
# [nixpkgs]$ lib/fileset/benchmark.sh HEAD | ||
|
||
set -euo pipefail | ||
shopt -s inherit_errexit dotglob | ||
|
||
if (( $# == 0 )); then | ||
    echo "Usage: $0 HEAD" | ||
    echo "Benchmarks the current tree against the HEAD commit. Any git ref will work." | ||
    exit 1 | ||
fi | ||
compareTo=$1 | ||
|
||
SCRIPT_FILE=$(readlink -f "${BASH_SOURCE[0]}") | ||
SCRIPT_DIR=$(dirname "$SCRIPT_FILE") | ||
|
||
nixpkgs=$(cd "$SCRIPT_DIR/../.."; pwd) | ||
|
||
tmp="$(mktemp -d)" | ||
clean_up() { | ||
    rm -rf "$tmp" | ||
} | ||
trap clean_up EXIT SIGINT SIGTERM | ||
work="$tmp/work" | ||
mkdir "$work" | ||
cd "$work" | ||
|
||
# Create a fairly populated tree | ||
touch f{0..5} | ||
mkdir d{0..5} | ||
mkdir e{0..5} | ||
touch d{0..5}/f{0..5} | ||
mkdir -p d{0..5}/d{0..5} | ||
mkdir -p e{0..5}/e{0..5} | ||
touch d{0..5}/d{0..5}/f{0..5} | ||
mkdir -p d{0..5}/d{0..5}/d{0..5} | ||
mkdir -p e{0..5}/e{0..5}/e{0..5} | ||
touch d{0..5}/d{0..5}/d{0..5}/f{0..5} | ||
mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5} | ||
mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5} | ||
touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5} | ||
|
||
bench() { | ||
    NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \ | ||
        nix-instantiate --eval --strict --show-trace >/dev/null \ | ||
        --expr '(import <nixpkgs/lib>).fileset.toSource { root = ./.; fileset = ./.; }' | ||
    cat "$tmp/stats.json" | ||
} | ||
|
||
echo "Running benchmark on the current tree" >&2 | ||
bench "$nixpkgs" > "$tmp/new.json" | ||
( | ||
    echo "Checking out $compareTo" >&2 | ||
    git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo" | ||
    trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT | ||
    echo "Running benchmark on $compareTo" >&2 | ||
    bench "$tmp/worktree" > "$tmp/old.json" | ||
) | ||
|
||
declare -a stats=( | ||
    ".envs.elements" | ||
    ".envs.number" | ||
    ".gc.totalBytes" | ||
    ".list.concats" | ||
    ".list.elements" | ||
    ".nrFunctionCalls" | ||
    ".nrLookups" | ||
    ".nrOpUpdates" | ||
    ".nrPrimOpCalls" | ||
    ".nrThunks" | ||
    ".sets.elements" | ||
    ".sets.number" | ||
    ".symbols.number" | ||
    ".values.number" | ||
) | ||
|
||
different=0 | ||
for stat in "${stats[@]}"; do | ||
    oldValue=$(jq "$stat" "$tmp/old.json") | ||
    newValue=$(jq "$stat" "$tmp/new.json") | ||
    if (( oldValue != newValue )); then | ||
        percent=$(bc <<< "scale=100; result = 100/$oldValue*$newValue; scale=4; result / 1") | ||
        if (( oldValue < newValue )); then | ||
            echo -e "Statistic $stat ($newValue) is \e[0;31m$percent% (+$(( newValue - oldValue )))\e[0m of the old value $oldValue" >&2 | ||
        else | ||
            echo -e "Statistic $stat ($newValue) is \e[0;32m$percent% (-$(( oldValue - newValue )))\e[0m of the old value $oldValue" >&2 | ||
        fi | ||
        (( different++ )) || true | ||
    fi | ||
done | ||
echo "$different stats differ between the current tree and $compareTo" |