Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib.fileset.union, lib.fileset.unions: init #255025

Merged
merged 17 commits into from
Sep 21, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/functions/fileset.section.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ File sets are easy and safe to use, providing obvious and composable semantics w
These sections apply to the entire library.
See the [function reference](#sec-functions-library-fileset) for function-specific documentation.

The file set library is currently very limited but is being expanded to include more functions over time.
The file set library is currently somewhat limited but is being expanded to include more functions over time.

## Implicit coercion from paths to file sets {#sec-fileset-path-coercion}

Expand Down
24 changes: 13 additions & 11 deletions lib/fileset/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,13 +41,21 @@ An attribute set with these values:
- `_type` (constant string `"fileset"`):
Tag to indicate this value is a file set.

- `_internalVersion` (constant string equal to the current version):
Version of the representation
- `_internalVersion` (constant `2`, the current version):
Version of the representation.

- `_internalBase` (path):
Any files outside of this path cannot influence the set of files.
This is always a directory.

- `_internalBaseRoot` (path):
The filesystem root of `_internalBase`, same as `(lib.path.splitRoot _internalBase).root`.
This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.

- `_internalBaseComponents` (list of strings):
The path components of `_internalBase`, same as `lib.path.subpath.components (lib.path.splitRoot _internalBase).subpath`.
This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.

- `_internalTree` ([filesetTree](#filesettree)):
A tree representation of all included files under `_internalBase`.

Expand All @@ -59,8 +67,8 @@ An attribute set with these values:
One of the following:

- `{ <name> = filesetTree; }`:
A directory with a nested `filesetTree` value for every directory entry.
Even entries that aren't included are present as `null` because it improves laziness and allows using this as a sort of `builtins.readDir` cache.
A directory with a nested `filesetTree` value for directory entries.
Entries not included may either be omitted or set to `null`, as necessary to improve efficiency or laziness.

- `"directory"`:
A directory with all its files included recursively, allowing early cutoff for some operations.
Expand Down Expand Up @@ -169,15 +177,9 @@ Arguments:
## To update in the future

Here's a list of places in the library that need to be updated in the future:
- > The file set library is currently very limited but is being expanded to include more functions over time.
- > The file set library is currently somewhat limited but is being expanded to include more functions over time.

in [the manual](../../doc/functions/fileset.section.md)
- > Currently the only way to construct file sets is using implicit coercion from paths.

in [the `toSource` reference](./default.nix)
- > For now filesets are always paths

in [the `toSource` implementation](./default.nix), also update the variable name there
- Once a tracing function exists, `__noEval` in [internal.nix](./internal.nix) should mention it
- If/Once a function to convert `lib.sources` values into file sets exists, the `_coerce` and `toSource` functions should be updated to mention that function in the error when such a value is passed
- If/Once a function exists that can optionally include a path depending on whether it exists, the error message for the path not existing in `_coerce` should mention the new function
140 changes: 93 additions & 47 deletions lib/fileset/benchmark.sh
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
#!/usr/bin/env bash
#!/usr/bin/env nix-shell
#!nix-shell -i bash -p sta jq bc nix -I nixpkgs=../..
# shellcheck disable=SC2016

# Benchmarks lib.fileset
# Run:
Expand Down Expand Up @@ -28,38 +30,6 @@ work="$tmp/work"
mkdir "$work"
cd "$work"

# Create a fairly populated tree
touch f{0..5}
mkdir d{0..5}
mkdir e{0..5}
touch d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5}

bench() {
NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \
nix-instantiate --eval --strict --show-trace >/dev/null \
--expr '(import <nixpkgs/lib>).fileset.toSource { root = ./.; fileset = ./.; }'
cat "$tmp/stats.json"
}

echo "Running benchmark on index" >&2
bench "$nixpkgs" > "$tmp/new.json"
(
echo "Checking out $compareTo" >&2
git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo"
trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT
echo "Running benchmark on $compareTo" >&2
bench "$tmp/worktree" > "$tmp/old.json"
)

declare -a stats=(
".envs.elements"
".envs.number"
Expand All @@ -77,18 +47,94 @@ declare -a stats=(
".values.number"
)

different=0
for stat in "${stats[@]}"; do
oldValue=$(jq "$stat" "$tmp/old.json")
newValue=$(jq "$stat" "$tmp/new.json")
if (( oldValue != newValue )); then
percent=$(bc <<< "scale=100; result = 100/$oldValue*$newValue; scale=4; result / 1")
if (( oldValue < newValue )); then
echo -e "Statistic $stat ($newValue) is \e[0;31m$percent% (+$(( newValue - oldValue )))\e[0m of the old value $oldValue" >&2
else
echo -e "Statistic $stat ($newValue) is \e[0;32m$percent% (-$(( oldValue - newValue )))\e[0m of the old value $oldValue" >&2
runs=10

run() {
# Empty the file
: > cpuTimes

for i in $(seq 0 "$runs"); do
NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \
nix-instantiate --eval --strict --show-trace >/dev/null \
--expr 'with import <nixpkgs/lib>; with fileset; '"$2"

# Only measure the time after the first run, one is warmup
if (( i > 0 )); then
jq '.cpuTime' "$tmp/stats.json" >> cpuTimes
fi
(( different++ )) || true
fi
done
echo "$different stats differ between the current tree and $compareTo"
done

# Compute mean and standard deviation
read -r mean sd < <(sta --mean --sd --brief <cpuTimes)

jq --argjson mean "$mean" --argjson sd "$sd" \
'.cpuTimeMean = $mean | .cpuTimeSd = $sd' \
"$tmp/stats.json"
}

bench() {
echo "Benchmarking expression $1" >&2
#echo "Running benchmark on index" >&2
run "$nixpkgs" "$1" > "$tmp/new.json"
(
#echo "Checking out $compareTo" >&2
git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo"
trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT
#echo "Running benchmark on $compareTo" >&2
run "$tmp/worktree" "$1" > "$tmp/old.json"
)

read -r oldMean oldSd newMean newSd percentageMean percentageSd < \
<(jq -rn --slurpfile old "$tmp/old.json" --slurpfile new "$tmp/new.json" \
' $old[0].cpuTimeMean as $om
| $old[0].cpuTimeSd as $os
| $new[0].cpuTimeMean as $nm
| $new[0].cpuTimeSd as $ns
| (100 / $om * $nm) as $pm
# Copied from https://github.com/sharkdp/hyperfine/blob/b38d550b89b1dab85139eada01c91a60798db9cc/src/benchmark/relative_speed.rs#L46-L53
| ($pm * pow(pow($ns / $nm; 2) + pow($os / $om; 2); 0.5)) as $ps
| [ $om, $os, $nm, $ns, $pm, $ps ]
| @sh')

echo -e "Mean CPU time $newMean (σ = $newSd) for $runs runs is \e[0;33m$percentageMean% (σ = $percentageSd%)\e[0m of the old value $oldMean (σ = $oldSd)" >&2

different=0
for stat in "${stats[@]}"; do
oldValue=$(jq "$stat" "$tmp/old.json")
newValue=$(jq "$stat" "$tmp/new.json")
if (( oldValue != newValue )); then
percent=$(bc <<< "scale=100; result = 100/$oldValue*$newValue; scale=4; result / 1")
if (( oldValue < newValue )); then
echo -e "Statistic $stat ($newValue) is \e[0;31m$percent% (+$(( newValue - oldValue )))\e[0m of the old value $oldValue" >&2
else
echo -e "Statistic $stat ($newValue) is \e[0;32m$percent% (-$(( oldValue - newValue )))\e[0m of the old value $oldValue" >&2
fi
(( different++ )) || true
fi
done
echo "$different stats differ between the current tree and $compareTo"
echo ""
}

# Create a fairly populated tree
touch f{0..5}
mkdir d{0..5}
mkdir e{0..5}
touch d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5}

bench 'toSource { root = ./.; fileset = ./.; }'

rm -rf -- *

touch {0..1000}
bench 'toSource { root = ./.; fileset = unions (mapAttrsToList (name: value: ./. + "/${name}") (builtins.readDir ./.)); }'
rm -rf -- *
Loading