Skip to content

Commit

Permalink
Merge pull request #255025 from tweag/fileset.union
Browse files Browse the repository at this point in the history
`lib.fileset.union`, `lib.fileset.unions`: init
  • Loading branch information
roberth authored Sep 21, 2023
2 parents f35534c + 94e103e commit 5c97f01
Show file tree
Hide file tree
Showing 6 changed files with 611 additions and 176 deletions.
2 changes: 1 addition & 1 deletion doc/functions/fileset.section.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ File sets are easy and safe to use, providing obvious and composable semantics w
These sections apply to the entire library.
See the [function reference](#sec-functions-library-fileset) for function-specific documentation.

The file set library is currently very limited but is being expanded to include more functions over time.
The file set library is currently somewhat limited but is being expanded to include more functions over time.

## Implicit coercion from paths to file sets {#sec-fileset-path-coercion}

Expand Down
24 changes: 13 additions & 11 deletions lib/fileset/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,13 +41,21 @@ An attribute set with these values:
- `_type` (constant string `"fileset"`):
Tag to indicate this value is a file set.

- `_internalVersion` (constant string equal to the current version):
Version of the representation
- `_internalVersion` (constant `2`, the current version):
Version of the representation.

- `_internalBase` (path):
Any files outside of this path cannot influence the set of files.
This is always a directory.

- `_internalBaseRoot` (path):
The filesystem root of `_internalBase`, same as `(lib.path.splitRoot _internalBase).root`.
This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.

- `_internalBaseComponents` (list of strings):
The path components of `_internalBase`, same as `lib.path.subpath.components (lib.path.splitRoot _internalBase).subpath`.
This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.

- `_internalTree` ([filesetTree](#filesettree)):
A tree representation of all included files under `_internalBase`.

Expand All @@ -59,8 +67,8 @@ An attribute set with these values:
One of the following:

- `{ <name> = filesetTree; }`:
A directory with a nested `filesetTree` value for every directory entry.
Even entries that aren't included are present as `null` because it improves laziness and allows using this as a sort of `builtins.readDir` cache.
A directory with a nested `filesetTree` value for directory entries.
Entries not included may either be omitted or set to `null`, as necessary to improve efficiency or laziness.

- `"directory"`:
A directory with all its files included recursively, allowing early cutoff for some operations.
Expand Down Expand Up @@ -169,15 +177,9 @@ Arguments:
## To update in the future

Here's a list of places in the library that need to be updated in the future:
- > The file set library is currently very limited but is being expanded to include more functions over time.
- > The file set library is currently somewhat limited but is being expanded to include more functions over time.
in [the manual](../../doc/functions/fileset.section.md)
- > Currently the only way to construct file sets is using implicit coercion from paths.
in [the `toSource` reference](./default.nix)
- > For now filesets are always paths
in [the `toSource` implementation](./default.nix), also update the variable name there
- Once a tracing function exists, `__noEval` in [internal.nix](./internal.nix) should mention it
- If/Once a function to convert `lib.sources` values into file sets exists, the `_coerce` and `toSource` functions should be updated to mention that function in the error when such a value is passed
- If/Once a function exists that can optionally include a path depending on whether it exists, the error message for the path not existing in `_coerce` should mention the new function
140 changes: 93 additions & 47 deletions lib/fileset/benchmark.sh
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
#!/usr/bin/env bash
#!/usr/bin/env nix-shell
#!nix-shell -i bash -p sta jq bc nix -I nixpkgs=../..
# shellcheck disable=SC2016

# Benchmarks lib.fileset
# Run:
Expand Down Expand Up @@ -28,38 +30,6 @@ work="$tmp/work"
mkdir "$work"
cd "$work"

# Create a fairly populated tree
touch f{0..5}
mkdir d{0..5}
mkdir e{0..5}
touch d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5}

bench() {
NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \
nix-instantiate --eval --strict --show-trace >/dev/null \
--expr '(import <nixpkgs/lib>).fileset.toSource { root = ./.; fileset = ./.; }'
cat "$tmp/stats.json"
}

echo "Running benchmark on index" >&2
bench "$nixpkgs" > "$tmp/new.json"
(
echo "Checking out $compareTo" >&2
git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo"
trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT
echo "Running benchmark on $compareTo" >&2
bench "$tmp/worktree" > "$tmp/old.json"
)

declare -a stats=(
".envs.elements"
".envs.number"
Expand All @@ -77,18 +47,94 @@ declare -a stats=(
".values.number"
)

different=0
for stat in "${stats[@]}"; do
oldValue=$(jq "$stat" "$tmp/old.json")
newValue=$(jq "$stat" "$tmp/new.json")
if (( oldValue != newValue )); then
percent=$(bc <<< "scale=100; result = 100/$oldValue*$newValue; scale=4; result / 1")
if (( oldValue < newValue )); then
echo -e "Statistic $stat ($newValue) is \e[0;31m$percent% (+$(( newValue - oldValue )))\e[0m of the old value $oldValue" >&2
else
echo -e "Statistic $stat ($newValue) is \e[0;32m$percent% (-$(( oldValue - newValue )))\e[0m of the old value $oldValue" >&2
runs=10

run() {
# Empty the file
: > cpuTimes

for i in $(seq 0 "$runs"); do
NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \
nix-instantiate --eval --strict --show-trace >/dev/null \
--expr 'with import <nixpkgs/lib>; with fileset; '"$2"

# Only measure the time after the first run, one is warmup
if (( i > 0 )); then
jq '.cpuTime' "$tmp/stats.json" >> cpuTimes
fi
(( different++ )) || true
fi
done
echo "$different stats differ between the current tree and $compareTo"
done

# Compute mean and standard deviation
read -r mean sd < <(sta --mean --sd --brief <cpuTimes)

jq --argjson mean "$mean" --argjson sd "$sd" \
'.cpuTimeMean = $mean | .cpuTimeSd = $sd' \
"$tmp/stats.json"
}

bench() {
echo "Benchmarking expression $1" >&2
#echo "Running benchmark on index" >&2
run "$nixpkgs" "$1" > "$tmp/new.json"
(
#echo "Checking out $compareTo" >&2
git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo"
trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT
#echo "Running benchmark on $compareTo" >&2
run "$tmp/worktree" "$1" > "$tmp/old.json"
)

read -r oldMean oldSd newMean newSd percentageMean percentageSd < \
<(jq -rn --slurpfile old "$tmp/old.json" --slurpfile new "$tmp/new.json" \
' $old[0].cpuTimeMean as $om
| $old[0].cpuTimeSd as $os
| $new[0].cpuTimeMean as $nm
| $new[0].cpuTimeSd as $ns
| (100 / $om * $nm) as $pm
# Copied from https://github.com/sharkdp/hyperfine/blob/b38d550b89b1dab85139eada01c91a60798db9cc/src/benchmark/relative_speed.rs#L46-L53
| ($pm * pow(pow($ns / $nm; 2) + pow($os / $om; 2); 0.5)) as $ps
| [ $om, $os, $nm, $ns, $pm, $ps ]
| @sh')

echo -e "Mean CPU time $newMean (σ = $newSd) for $runs runs is \e[0;33m$percentageMean% (σ = $percentageSd%)\e[0m of the old value $oldMean (σ = $oldSd)" >&2

different=0
for stat in "${stats[@]}"; do
oldValue=$(jq "$stat" "$tmp/old.json")
newValue=$(jq "$stat" "$tmp/new.json")
if (( oldValue != newValue )); then
percent=$(bc <<< "scale=100; result = 100/$oldValue*$newValue; scale=4; result / 1")
if (( oldValue < newValue )); then
echo -e "Statistic $stat ($newValue) is \e[0;31m$percent% (+$(( newValue - oldValue )))\e[0m of the old value $oldValue" >&2
else
echo -e "Statistic $stat ($newValue) is \e[0;32m$percent% (-$(( oldValue - newValue )))\e[0m of the old value $oldValue" >&2
fi
(( different++ )) || true
fi
done
echo "$different stats differ between the current tree and $compareTo"
echo ""
}

# Create a fairly populated tree
touch f{0..5}
mkdir d{0..5}
mkdir e{0..5}
touch d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5}

bench 'toSource { root = ./.; fileset = ./.; }'

rm -rf -- *

touch {0..1000}
bench 'toSource { root = ./.; fileset = unions (mapAttrsToList (name: value: ./. + "/${name}") (builtins.readDir ./.)); }'
rm -rf -- *
Loading

0 comments on commit 5c97f01

Please sign in to comment.