feat: track compressed size & compare to parquet(zstd)? & canonical (#882)

We now track these six values:

1. Compression time (s).
2. Compression throughput (bytes/s).
3. Compressed size (bytes).
4. Compressed size as fraction of a Vortex Canonical array.
5. Compressed Layout size as fraction of Parquet without block
compression.
6. Compressed Layout size as fraction of Parquet with Zstd.
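Items 2, 4, 5 and 6 are plain ratios of raw measurements. The sketch below shows that arithmetic. It uses a hypothetical `SizeMeasurement` struct, and its field and method names are illustrative rather than taken from bench-vortex. It also assumes that throughput is measured against the uncompressed input size.

```rust
use std::time::Duration;

/// Hypothetical container for the raw numbers measured for one dataset.
/// Field names are illustrative, not the ones used in bench-vortex.
struct SizeMeasurement {
    /// Input size in bytes (assumed to be the basis for throughput).
    uncompressed_bytes: u64,
    /// Item 1: wall-clock compression time.
    compress_time: Duration,
    /// Item 3: size of the compressed Vortex array.
    compressed_bytes: u64,
    /// Size of the compressed data written out as a Vortex Layout.
    compressed_layout_bytes: u64,
    /// Size of the same data as a Vortex Canonical array.
    canonical_bytes: u64,
    /// Parquet file size without block compression.
    parquet_uncompressed_bytes: u64,
    /// Parquet file size with Zstd.
    parquet_zstd_bytes: u64,
}

impl SizeMeasurement {
    /// Item 2: bytes of input compressed per second.
    fn throughput_bytes_per_sec(&self) -> f64 {
        self.uncompressed_bytes as f64 / self.compress_time.as_secs_f64()
    }

    /// Item 4: compressed size as a fraction of the Canonical array.
    fn fraction_of_canonical(&self) -> f64 {
        self.compressed_bytes as f64 / self.canonical_bytes as f64
    }

    /// Item 5: Layout size as a fraction of Parquet without block compression.
    fn fraction_of_parquet_uncompressed(&self) -> f64 {
        self.compressed_layout_bytes as f64 / self.parquet_uncompressed_bytes as f64
    }

    /// Item 6: Layout size as a fraction of Parquet with Zstd.
    fn fraction_of_parquet_zstd(&self) -> f64 {
        self.compressed_layout_bytes as f64 / self.parquet_zstd_bytes as f64
    }
}

fn main() {
    // Made-up numbers, purely to exercise the arithmetic.
    let m = SizeMeasurement {
        uncompressed_bytes: 1_000_000,
        compress_time: Duration::from_millis(500),
        compressed_bytes: 250_000,
        compressed_layout_bytes: 260_000,
        canonical_bytes: 1_000_000,
        parquet_uncompressed_bytes: 800_000,
        parquet_zstd_bytes: 520_000,
    };
    println!("throughput: {} B/s", m.throughput_bytes_per_sec());
    println!("vs canonical: {}", m.fraction_of_canonical());
    println!("vs parquet: {}", m.fraction_of_parquet_uncompressed());
    println!("vs parquet+zstd: {}", m.fraction_of_parquet_zstd());
}
```

For items 4 to 6, a value below 1.0 means the Vortex output is smaller than the baseline it is compared against.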

It's a bit janky: I just unconditionally compute these values for
several datasets. I couldn't figure out how to ask Criterion which
benchmark regex is currently in use, so, for example, `cargo bench taxi`
will still run all the size benchmarks for every other dataset.

I also had to do some janky jq parsing to convert from Criterion's JSON
output to the style expected by the benchmark-action GitHub action that
we use.
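The jq filter in the workflow diff does three things. It turns a Criterion mean estimate into an entry whose `range` is half the width of the confidence interval. It turns a throughput figure into an extra `"<id> throughput"` entry with `range: 0`. It passes through lines that already carry `{name, value, unit, range}`, such as the size metrics. The Rust sketch below mirrors that mapping for readability only. The `Message`, `Estimate`, `Throughput` and `Entry` types are hypothetical simplifications of the parsed JSON, not Criterion or benchmark-action types. The sketch emits one entry per throughput element, which matches jq only in the usual single-throughput case.

```rust
/// One entry in the `{name, value, unit, range}` shape the jq filter produces.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    name: String,
    unit: String,
    value: f64,
    range: f64,
}

/// Hypothetical view of Criterion's mean estimate with its confidence bounds.
struct Estimate {
    estimate: f64,
    lower_bound: f64,
    upper_bound: f64,
}

/// Hypothetical view of one element of Criterion's `throughput` array.
struct Throughput {
    per_iteration: f64,
    unit: String,
}

/// Hypothetical, simplified view of one parsed line of JSON output.
struct Message {
    id: String,
    unit: String,
    mean: Option<Estimate>,
    throughput: Vec<Throughput>,
    /// Set for lines that already have name/value/unit/range (e.g. size metrics).
    passthrough: Option<Entry>,
}

fn to_entries(msg: &Message) -> Vec<Entry> {
    let mut out = Vec::new();
    if let Some(mean) = &msg.mean {
        out.push(Entry {
            name: msg.id.clone(),
            unit: msg.unit.clone(),
            value: mean.estimate,
            // Half-width of the confidence interval, as in the jq filter.
            range: (mean.upper_bound - mean.lower_bound) / 2.0,
        });
    }
    for t in &msg.throughput {
        out.push(Entry {
            name: format!("{} throughput", msg.id),
            unit: t.unit.clone(),
            value: t.per_iteration,
            range: 0.0,
        });
    }
    // jq drops entries whose value is null; here `value` is never missing.
    if let Some(e) = &msg.passthrough {
        out.push(e.clone());
    }
    out
}

/// Minimal escaping of quotes and backslashes only (no control characters).
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Serialize entries as a compact JSON array, like `jq --slurp --compact-output`.
fn to_json(entries: &[Entry]) -> String {
    let items: Vec<String> = entries
        .iter()
        .map(|e| {
            format!(
                "{{\"name\":\"{}\",\"value\":{},\"unit\":\"{}\",\"range\":{}}}",
                escape(&e.name),
                e.value,
                escape(&e.unit),
                e.range
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}

fn main() {
    // Hypothetical benchmark id and numbers, for illustration only.
    let msg = Message {
        id: "compress/taxi".to_string(),
        unit: "ns".to_string(),
        mean: Some(Estimate { estimate: 100.0, lower_bound: 90.0, upper_bound: 110.0 }),
        throughput: vec![Throughput { per_iteration: 4096.0, unit: "bytes".to_string() }],
        passthrough: None,
    };
    println!("{}", to_json(&to_entries(&msg)));
}
```

Because the workflows use `tool: 'customSmallerIsBetter'`, the action treats every entry as smaller-is-better. That includes the throughput entries, for which a larger number is actually an improvement.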

Nevertheless, for each commit to `develop` we should now get all six
numbers for the Taxi, Airline Sentiment, Arade, Bimbo, CMSprovider,
Euro2016, Food, HashTags, and TPC-H l_comment datasets. They'll be
displayed under [Vortex Compression](https://spiraldb.github.io/vortex/dev/bench/#Vortex_Compression)
on the benchmarks site.

I might need to delete some old data from the gh-pages-bench branch
since I changed some benchmark names, but after a few commits, those
plots should become useful measures of our compression performance in
space and time.
danking authored Sep 20, 2024
1 parent 3194009 commit a87c720
Showing 11 changed files with 364 additions and 109 deletions.
25 changes: 22 additions & 3 deletions .github/workflows/bench-pr.yml
@@ -49,17 +49,36 @@ jobs:

- name: Run benchmark
shell: bash
run: cargo bench --bench ${{ matrix.benchmark.id }} -- --output-format bencher | tee ${{ matrix.benchmark.id }}.txt
run: |
cargo install cargo-criterion
cargo criterion --bench ${{ matrix.benchmark.id }} --message-format=json 2>&1 | tee out.json
cat out.json
sudo apt-get update && sudo apt-get install -y jq
jq --raw-input --compact-output '
fromjson?
| [ (if .mean != null then {name: .id, value: .mean.estimate, unit: .unit, range: ((.mean.upper_bound - .mean.lower_bound) / 2) } else {} end),
(if .throughput != null then {name: (.id + " throughput"), value: .throughput[].per_iteration, unit: .throughput[].unit, range: 0} else {} end),
{name, value, unit, range} ]
| .[]
| select(.value != null)
' \
out.json \
| jq --slurp --compact-output '.' >${{ matrix.benchmark.id }}.json
cat ${{ matrix.benchmark.id }}.json
- name: Store benchmark result
if: '!cancelled()'
uses: benchmark-action/github-action-benchmark@v1
with:
name: ${{ matrix.benchmark.name }}
tool: 'cargo'
tool: 'customSmallerIsBetter'
gh-pages-branch: gh-pages-bench
github-token: ${{ secrets.GITHUB_TOKEN }}
output-file-path: ${{ matrix.benchmark.id }}.txt
output-file-path: ${{ matrix.benchmark.id }}.json
summary-always: true
comment-always: true
auto-push: false
27 changes: 23 additions & 4 deletions .github/workflows/bench.yml
@@ -41,17 +41,36 @@ jobs:

- name: Run benchmark
shell: bash
run: cargo bench --bench ${{ matrix.version.id }} -- --output-format bencher | tee ${{ matrix.version.id }}.txt
run: |
cargo install cargo-criterion
cargo criterion --bench ${{ matrix.benchmark.id }} --message-format=json 2>&1 | tee out.json
cat out.json
sudo apt-get update && sudo apt-get install -y jq
jq --raw-input --compact-output '
fromjson?
| [ (if .mean != null then {name: .id, value: .mean.estimate, unit: .unit, range: ((.mean.upper_bound - .mean.lower_bound) / 2) } else {} end),
(if .throughput != null then {name: (.id + " throughput"), value: .throughput[].per_iteration, unit: .throughput[].unit, range: 0} else {} end),
{name, value, unit, range} ]
| .[]
| select(.value != null)
' \
out.json \
| jq --slurp --compact-output '.' >${{ matrix.benchmark.id }}.json
cat ${{ matrix.benchmark.id }}.json
- name: Store benchmark result
if: '!cancelled()'
uses: benchmark-action/github-action-benchmark@v1
with:
name: ${{ matrix.version.name }}
tool: 'cargo'
name: ${{ matrix.benchmark.name }}
tool: 'customSmallerIsBetter'
gh-pages-branch: gh-pages-bench
github-token: ${{ secrets.GITHUB_TOKEN }}
output-file-path: ${{ matrix.version.id }}.txt
output-file-path: ${{ matrix.benchmark.id }}.json
summary-always: true
auto-push: true
fail-on-alert: false
1 change: 1 addition & 0 deletions Cargo.lock

2 changes: 1 addition & 1 deletion bench-vortex/.gitignore
@@ -1 +1 @@
data
data
1 change: 1 addition & 0 deletions bench-vortex/Cargo.toml
@@ -47,6 +47,7 @@ rand = { workspace = true }
rayon = { workspace = true }
reqwest = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
simplelog = { workspace = true }
tar = { workspace = true }
tokio = { workspace = true, features = ["full"] }