feat: track compressed size & compare to parquet(zstd)? & canonical #882

danking · 2024-09-19T21:34:31Z

We now track these six values:

1. Compression time (s).
2. Compression throughput (bytes/s).
3. Compressed size (bytes).
4. Compressed size as a fraction of a Vortex Canonical array.
5. Compressed Layout size as a fraction of Parquet without block compression.
6. Compressed Layout size as a fraction of Parquet with Zstd.

It's a bit janky: I just unconditionally compute these values for several datasets. I couldn't figure out how to ask Criterion which benchmark regex is currently in use, so, for example, `cargo bench taxi` will still run all the size benchmarks for every other dataset.

I also had to do some janky jq parsing to convert from Criterion's JSON output to the style expected by the benchmark-action GitHub action that we use.

Nevertheless, now, for each commit to `develop`, we should get all six numbers for the Taxi, Airline Sentiment, Arade, Bimbo, CMSprovider, Euro2016, Food, HashTags, and TPC-H l_comment datasets. They'll be displayed under [Vortex Compression](https://spiraldb.github.io/vortex/dev/bench/#Vortex_Compression) at the benchmarks site.

I might need to delete some old data from the gh-pages-bench branch since I changed some benchmark names, but after a few commits, those plots should become useful measures of our compression performance in space and time.
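As a rough illustration of how the six values relate to one another, here is a minimal sketch. The struct, the function, and all parameter names are hypothetical and are not taken from the PR. It assumes throughput is the uncompressed (canonical) byte count divided by the compression time, and that each ratio is a plain size quotient.

```rust
/// Hypothetical container for the six tracked values (names are illustrative).
#[derive(Debug)]
struct CompressionMetrics {
    seconds: f64,
    throughput_bytes_per_sec: f64,
    compressed_bytes: u64,
    ratio_vs_canonical: f64,
    ratio_vs_parquet_uncompressed: f64,
    ratio_vs_parquet_zstd: f64,
}

/// Derive the six values from one timing and five measured sizes.
/// Assumption: throughput is measured against the canonical (uncompressed) size.
fn compression_metrics(
    seconds: f64,
    canonical_bytes: u64,
    compressed_bytes: u64,
    layout_bytes: u64,
    parquet_uncompressed_bytes: u64,
    parquet_zstd_bytes: u64,
) -> CompressionMetrics {
    CompressionMetrics {
        seconds,
        throughput_bytes_per_sec: canonical_bytes as f64 / seconds,
        compressed_bytes,
        ratio_vs_canonical: compressed_bytes as f64 / canonical_bytes as f64,
        ratio_vs_parquet_uncompressed: layout_bytes as f64 / parquet_uncompressed_bytes as f64,
        ratio_vs_parquet_zstd: layout_bytes as f64 / parquet_zstd_bytes as f64,
    }
}

fn main() {
    // Made-up sizes, purely for illustration.
    let m = compression_metrics(2.0, 1_000, 100, 120, 200, 80);
    println!(
        "{} s, {} B/s, {} B, {} vs canonical, {} vs Parquet, {} vs Parquet+Zstd",
        m.seconds,
        m.throughput_bytes_per_sec,
        m.compressed_bytes,
        m.ratio_vs_canonical,
        m.ratio_vs_parquet_uncompressed,
        m.ratio_vs_parquet_zstd
    );
}
```

A ratio below 1 means the Vortex output is smaller than the baseline it is compared against; a ratio above 1 means it is larger.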

github-actions

Vortex bytes_at

Benchmark suite	Current: `615466f`	Previous: `a96ff2c`	Ratio
`bytes_at/array_data`	`609.7744546778674` ns (`0.13420378286656387`)	`613` ns/iter (`± 8`)	`0.99`
`bytes_at/array_data #2`	`1039.372483865207` ns (`0.5307530144946213`)	`1043` ns/iter (`± 4`)	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

github-actions

Vortex random_access

Benchmark suite	Current: `615466f`	Previous: `a96ff2c`	Ratio
`vortex/tokio local disk`	`1245366.8620128417` ns (`4668.183183866553`)	`1308917` ns/iter (`± 29650`)	`0.95`
`vortex/localfs`	`1403735.5290472142` ns (`4471.246426050318`)	`1457592` ns/iter (`± 32225`)	`0.96`
`parquet/tokio local disk`	`194141199.46666664` ns (`2304201.5900000036`)	`178158170` ns/iter (`± 2466099`)	`1.09`

github-actions

Vortex DataFusion

Benchmark suite	Current: `615466f`	Previous: `a96ff2c`	Ratio
`arrow/planning`	`816320.3626479161` ns (`2024.9349698948208`)	`813880` ns/iter (`± 4517`)	`1.00`
`arrow/exec`	`1771804.0215798502` ns (`10301.68340335507`)	`1774262` ns/iter (`± 18680`)	`1.00`
`vortex-pushdown-compressed/planning`	`515887.4864556427` ns (`1043.680268734548`)	`516095` ns/iter (`± 1831`)	`1.00`
`vortex-pushdown-compressed/exec`	`3078940.8429411757` ns (`2820.7001838223077`)	`3209669` ns/iter (`± 141970`)	`0.96`
`vortex-pushdown-uncompressed/planning`	`521224.8686081361` ns (`5009.368536001508`)	`514579` ns/iter (`± 1971`)	`1.01`
`vortex-pushdown-uncompressed/exec`	`2937165.6294444446` ns (`2031.9434375003912`)	`3336867` ns/iter (`± 9294`)	`0.88`
`vortex-nopushdown-compressed/planning`	`716020.948887611` ns (`384.9729955360526`)	`710291` ns/iter (`± 5322`)	`1.01`
`vortex-nopushdown-compressed/exec`	`8489314.503333332` ns (`73138.21887499839`)	`14988542` ns/iter (`± 251952`)	`0.57`
`vortex-nopushdown-uncompressed/planning`	`715991.6986297732` ns (`370.5863519538543`)	`715546` ns/iter (`± 3807`)	`1.00`
`vortex-nopushdown-uncompressed/exec`	`2007280.3079999995` ns (`1379.8270149999298`)	`2001661` ns/iter (`± 82038`)	`1.00`

robert3005 · 2024-09-19T21:50:48Z

.github/workflows/bench-pr.yml

+          cargo criterion --bench ${{ matrix.benchmark.id }} --message-format=json 2>&1 | tee out.json
+
+          cat out.json
+
+          sudo apt-get update && sudo apt-get install -y jq
+
+          jq --raw-input --compact-output '
+                 fromjson?
+                 | [ (if .mean != null then {name: .id, value: .mean.estimate, unit: .unit, range: ((.mean.upper_bound - .mean.lower_bound) / 2) } else {} end),
+                     (if .throughput != null then {name: (.id + " throughput"), value: .throughput[].per_iteration, unit: .throughput[].unit, range: 0} else {} end),
+                     {name, value, unit, range} ]
+                 | .[]
+                 | select(.value != null)
+              ' \
+              out.json \
+              | jq --slurp --compact-output '.' >${{ matrix.benchmark.id }}.json
+
+          cat ${{ matrix.benchmark.id }}.json


This is a bit excessive. I wonder if this is simpler if we write our own github action

Agreed, I think my preferred solution is either a CSV or a JSON Line file that we just append to.
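The "JSON Lines file that we just append to" idea could look roughly like the sketch below. It is only an illustration: `Measurement`, `escape_json`, and `append_measurement` are hypothetical names, the fields mirror the `{name, value, unit, range}` objects produced by the jq filter above, and the hand-rolled escaping assumes names contain no control characters and values are finite.

```rust
use std::io::{self, Write};

/// One benchmark measurement (hypothetical type; fields mirror the jq output).
struct Measurement<'a> {
    name: &'a str,
    value: f64,
    unit: &'a str,
    range: f64,
}

/// Escape backslashes and double quotes so the string stays a valid JSON literal.
/// Assumption: no control characters appear in benchmark names or units.
fn escape_json(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Append one measurement as a single JSON line to any writer.
fn append_measurement<W: Write>(out: &mut W, m: &Measurement<'_>) -> io::Result<()> {
    writeln!(
        out,
        "{{\"name\":\"{}\",\"value\":{},\"unit\":\"{}\",\"range\":{}}}",
        escape_json(m.name),
        m.value,
        escape_json(m.unit),
        m.range
    )
}

fn main() -> io::Result<()> {
    // In CI the writer would more likely be a file opened in append mode;
    // stdout keeps this sketch free of side effects.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    let m = Measurement {
        name: "Public BI Compression Size/AirlineSentiment",
        value: 1254.0,
        unit: "bytes",
        range: 0.0,
    };
    append_measurement(&mut out, &m)
}
```

Because each record is one line, later runs can append without re-reading or re-serializing earlier results.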

github-actions

Vortex Compression

Benchmark suite	Current: `615466f`	Previous: `a96ff2c`	Ratio
`Yellow Taxi Trip Data Compression Time/taxi compression`	`2513565430.2` ns (`10122599.299999952`)
`Yellow Taxi Trip Data Compression Time/taxi compression throughput`	`470808924` bytes
`Yellow Taxi Trip Data Vortex-to-ParquetZstd Ratio/taxi`	`0.9560604643330857` ratio
`Yellow Taxi Trip Data Vortex-to-ParquetUncompressed Ratio/taxi`	`0.6137144059032362` ratio
`Yellow Taxi Trip Data Compression Ratio/taxi`	`0.10783895846460209` ratio
`Yellow Taxi Trip Data Compression Size/taxi`	`50771544` bytes
`Public BI Compression Time/AirlineSentiment compression`	`415039.5464639052` ns (`491.88687187436153`)
`Public BI Compression Time/AirlineSentiment compression throughput`	`2020` bytes
`Public BI Vortex-to-ParquetZstd Ratio/AirlineSentiment`	`6.400830737279335` ratio
`Public BI Vortex-to-ParquetUncompressed Ratio/AirlineSentiment`	`4.353107344632768` ratio
`Public BI Compression Ratio/AirlineSentiment`	`0.6207920792079208` ratio
`Public BI Compression Size/AirlineSentiment`	`1254` bytes
`Public BI Compression Time/Arade compression`	`3131902697.3` ns (`6480990.841249704`)
`Public BI Compression Time/Arade compression throughput`	`787023760` bytes
`Public BI Vortex-to-ParquetZstd Ratio/Arade`	`0.4927803394425952` ratio
`Public BI Vortex-to-ParquetUncompressed Ratio/Arade`	`0.4398463104814441` ratio
`Public BI Compression Ratio/Arade`	`0.1862664667201407` ratio
`Public BI Compression Size/Arade`	`146596135` bytes
`Public BI Compression Time/Bimbo compression`	`21855721191.1` ns (`20350694.538749695`)
`Public BI Compression Time/Bimbo compression throughput`	`7121333608` bytes
`Public BI Vortex-to-ParquetZstd Ratio/Bimbo`	`1.293293825007246` ratio
`Public BI Vortex-to-ParquetUncompressed Ratio/Bimbo`	`0.8768962136437118` ratio
`Public BI Compression Ratio/Bimbo`	`0.06423232573827764` ratio
`Public BI Compression Size/Bimbo`	`457419820` bytes
`Public BI Compression Time/CMSprovider compression`	`12917920336.7` ns (`26398202.20625019`)
`Public BI Compression Time/CMSprovider compression throughput`	`5149123964` bytes
`Public BI Vortex-to-ParquetZstd Ratio/CMSprovider`	`1.2021505846266516` ratio
`Public BI Vortex-to-ParquetUncompressed Ratio/CMSprovider`	`0.7762200888869946` ratio
`Public BI Compression Ratio/CMSprovider`	`0.17574964310958274` ratio
`Public BI Compression Size/CMSprovider`	`904956699` bytes
`Public BI Compression Time/Euro2016 compression`	`2219852099` ns (`15588005.231250286`)
`Public BI Compression Time/Euro2016 compression throughput`	`393253221` bytes
`Public BI Vortex-to-ParquetZstd Ratio/Euro2016`	`1.4705138909171633` ratio
`Public BI Vortex-to-ParquetUncompressed Ratio/Euro2016`	`0.6239071488283204` ratio
`Public BI Compression Ratio/Euro2016`	`0.43458292742120985` ratio
`Public BI Compression Size/Euro2016`	`170901136` bytes
`Public BI Compression Time/Food compression`	`1095478080.3` ns (`3527534.875`)
`Public BI Compression Time/Food compression throughput`	`332718229` bytes
`Public BI Vortex-to-ParquetZstd Ratio/Food`	`1.2297872376838528` ratio
`Public BI Vortex-to-ParquetUncompressed Ratio/Food`	`0.6953516685794864` ratio
`Public BI Compression Ratio/Food`	`0.13031750959458252` ratio
`Public BI Compression Size/Food`	`43359011` bytes
`Public BI Compression Time/HashTags compression`	`2930012702.6` ns (`17763756.576250076`)
`Public BI Compression Time/HashTags compression throughput`	`804495592` bytes
`Public BI Vortex-to-ParquetZstd Ratio/HashTags`	`1.6464093663569246` ratio
`Public BI Vortex-to-ParquetUncompressed Ratio/HashTags`	`0.4680774335616459` ratio
`Public BI Compression Ratio/HashTags`	`0.2652765038394393` ratio
`Public BI Compression Size/HashTags`	`213413778` bytes
`TPC-H l_comment Compression Time/chunked-without-fsst compression`	`187786756.78414685` ns (`925951.3523437679`)
`TPC-H l_comment Compression Time/chunked-without-fsst compression throughput`	`183010921` bytes
`TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-without-fsst`	`3.2154759555157804` ratio
`TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-without-fsst`	`0.9983658315767541` ratio
`TPC-H l_comment Compression Ratio/chunked-without-fsst`	`0.999965750677797` ratio
`TPC-H l_comment Compression Size/chunked-without-fsst`	`183004653` bytes
`TPC-H l_comment Compression Time/chunked-with-fsst compression`	`1134202541.95` ns (`2623869.8625000715`)
`TPC-H l_comment Compression Time/chunked-with-fsst compression throughput`	`183010921` bytes
`TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-with-fsst`	`1.504212244020189` ratio
`TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-with-fsst`	`0.4670394456823924` ratio
`TPC-H l_comment Compression Ratio/chunked-with-fsst`	`0.442999322428414` ratio
`TPC-H l_comment Compression Size/chunked-with-fsst`	`81073714` bytes
`TPC-H l_comment Compression Time/canonical-with-fsst compression`	`1131178437.95` ns (`761932.415624857`)
`TPC-H l_comment Compression Time/canonical-with-fsst compression throughput`	`183010937` bytes
`TPC-H l_comment Vortex-to-ParquetZstd Ratio/canonical-with-fsst`	`1.5059821792895995` ratio
`TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/canonical-with-fsst`	`0.46759301141944365` ratio
`TPC-H l_comment Compression Ratio/canonical-with-fsst`	`0.44354151358724536` ratio
`TPC-H l_comment Compression Size/canonical-with-fsst`	`81172948` bytes

robert3005 · 2024-09-19T22:02:35Z

Also can we not run benchmarks on every pr? I think label would be enough and then on every develop commit? It seems like a lot to run for every commit

lwwmanning · 2024-09-19T22:09:09Z

Also can we not run benchmarks on every pr? I think label would be enough and then on every develop commit? It seems like a lot to run for every commit

Am I missing something? This PR doesn't make it run on every PR...?

robert3005 · 2024-09-19T22:11:54Z

This PR doesn't, but they currently do run. My point was mostly that, since we are making benchmark changes, we should change that.

github-actions

Vortex benchmarks

Benchmark suite	Current: `615466f`	Previous: `a96ff2c`	Ratio
`tpch_q1/vortex-in-memory-no-pushdown`	`464843920.35` ns (`2703997.192499995`)	`456752113` ns/iter (`± 3867547`)	`1.02`
`tpch_q1/vortex-in-memory-pushdown`	`532535430.5` ns (`1441835.3449999988`)	`532735558` ns/iter (`± 1607247`)	`1.00`
`tpch_q1/arrow`	`448641396.35` ns (`983773.7437499762`)	`443097274` ns/iter (`± 626790`)	`1.01`
`tpch_q1/parquet`	`651934200.8` ns (`1167224.8499999642`)	`653884935` ns/iter (`± 2663691`)	`1.00`
`tpch_q1/vortex-file-compressed`	`631224996.7` ns (`1159530.3299999833`)	`625869224` ns/iter (`± 2721859`)	`1.01`
`tpch_q1/vortex-file-uncompressed`	`636241881.9` ns (`2468183.79125005`)	`631514808` ns/iter (`± 9245882`)	`1.01`
`tpch_q2/vortex-in-memory-no-pushdown`	`146135948.23503968` ns (`596119.4463889003`)	`146858416` ns/iter (`± 2531945`)	`1.00`
`tpch_q2/vortex-in-memory-pushdown`	`144400001.83011904` ns (`438025.6820833236`)	`143317300` ns/iter (`± 2886213`)	`1.01`
`tpch_q2/arrow`	`122211692.76773807` ns (`168044.8480624929`)	`122568135` ns/iter (`± 397231`)	`1.00`
`tpch_q2/parquet`	`161668685.44722223` ns (`492714.924999997`)	`159787707` ns/iter (`± 5185870`)	`1.01`
`tpch_q2/vortex-file-compressed`	`156831594.42023808` ns (`853652.4739612937`)	`156466031` ns/iter (`± 1277797`)	`1.00`
`tpch_q2/vortex-file-uncompressed`	`166610068.98972222` ns (`501684.8970833421`)	`162139847` ns/iter (`± 3160753`)	`1.03`
`tpch_q3/vortex-in-memory-no-pushdown`	`152064205.7482143` ns (`344621.85196428`)	`153229410` ns/iter (`± 1562047`)	`0.99`
`tpch_q3/vortex-in-memory-pushdown`	`186610418.2` ns (`955602.7458333224`)	`186029987` ns/iter (`± 1669213`)	`1.00`
`tpch_q3/arrow`	`148540561.55555555` ns (`353743.82150000334`)	`147328590` ns/iter (`± 1417356`)	`1.01`
`tpch_q3/parquet`	`343069328.65` ns (`1382273.2912499905`)	`336975915` ns/iter (`± 2025628`)	`1.02`
`tpch_q3/vortex-file-compressed`	`311862426.95` ns (`998395.625`)	`309258408` ns/iter (`± 2757350`)	`1.01`
`tpch_q3/vortex-file-uncompressed`	`380565568.2` ns (`1064147.7043749988`)	`375568417` ns/iter (`± 4807910`)	`1.01`
`tpch_q4/vortex-in-memory-no-pushdown`	`109463963.10416666` ns (`575272.8522916734`)	`106376311` ns/iter (`± 304925`)	`1.03`
`tpch_q4/vortex-in-memory-pushdown`	`144525012.9798016` ns (`331599.61648860574`)	`141994138` ns/iter (`± 1248613`)	`1.02`
`tpch_q4/arrow`	`102683885.50023809` ns (`381721.1955833286`)	`101065390` ns/iter (`± 370932`)	`1.02`
`tpch_q4/parquet`	`219624434.13333336` ns (`916865.9166666567`)	`214248841` ns/iter (`± 2228222`)	`1.03`
`tpch_q4/vortex-file-compressed`	`275844832.4` ns (`779808.6381250024`)	`262684851` ns/iter (`± 1417453`)	`1.05`
`tpch_q4/vortex-file-uncompressed`	`322938093.15` ns (`2089683.4512500167`)	`322371836` ns/iter (`± 4220226`)	`1.00`
`tpch_q5/vortex-in-memory-no-pushdown`	`296875979.6` ns (`1442079.2993750274`)	`296691840` ns/iter (`± 6168057`)	`1.00`
`tpch_q5/vortex-in-memory-pushdown`	`311055426.75` ns (`2625022.0974999964`)	`321813589` ns/iter (`± 5635245`)	`0.97`
`tpch_q5/arrow`	`301113685` ns (`1192929.6487500072`)	`289107585` ns/iter (`± 2924141`)	`1.04`
`tpch_q5/parquet`	`463548366.3` ns (`1005448.7349999845`)	`449018047` ns/iter (`± 2489275`)	`1.03`
`tpch_q5/vortex-file-compressed`	`342553821.15` ns (`1811957.981249988`)	`341880037` ns/iter (`± 8522037`)	`1.00`
`tpch_q5/vortex-file-uncompressed`	`361094967.65` ns (`1257259.4193750024`)	`356647316` ns/iter (`± 5958948`)	`1.01`
`tpch_q6/vortex-in-memory-no-pushdown`	`38618631.72686508` ns (`166164.57365079597`)	`40138218` ns/iter (`± 630500`)	`0.96`
`tpch_q6/vortex-in-memory-pushdown`	`92286594.83333334` ns (`140081.3341666609`)	`92149267` ns/iter (`± 303889`)	`1.00`
`tpch_q6/arrow`	`36310182.61406084` ns (`165533.02780538797`)	`36334469` ns/iter (`± 211591`)	`1.00`
`tpch_q6/parquet`	`154528287.31761903` ns (`505493.8115416467`)	`151921473` ns/iter (`± 1264234`)	`1.02`
`tpch_q6/vortex-file-compressed`	`80680396.58406746` ns (`245124.53606721014`)	`78859071` ns/iter (`± 1115685`)	`1.02`
`tpch_q6/vortex-file-uncompressed`	`167328617.76924604` ns (`1367708.262455359`)	`167141882` ns/iter (`± 1751525`)	`1.00`
`tpch_q7/vortex-in-memory-no-pushdown`	`568360396.5` ns (`1364090.5849999785`)	`562119306` ns/iter (`± 3476977`)	`1.01`
`tpch_q7/vortex-in-memory-pushdown`	`632136235.7` ns (`1477559.2749999762`)	`611059188` ns/iter (`± 6446587`)	`1.03`
`tpch_q7/arrow`	`573491795.9` ns (`1734030.0337500572`)	`553024994` ns/iter (`± 2909226`)	`1.04`
`tpch_q7/parquet`	`733469206.1` ns (`3208002.5250000358`)	`710209548` ns/iter (`± 5017550`)	`1.03`
`tpch_q7/vortex-file-compressed`	`682726773.6` ns (`2675624.6862499714`)	`672453257` ns/iter (`± 5566775`)	`1.02`
`tpch_q7/vortex-file-uncompressed`	`759621166.1` ns (`3345556.2650000453`)	`744071550` ns/iter (`± 5659596`)	`1.02`
`tpch_q8/vortex-in-memory-no-pushdown`	`217474477.0333333` ns (`868808.4662500024`)	`216237880` ns/iter (`± 504152`)	`1.01`
`tpch_q8/vortex-in-memory-pushdown`	`234419944.0333333` ns (`589083.5312500149`)	`230296027` ns/iter (`± 963193`)	`1.02`
`tpch_q8/arrow`	`220631285.26666665` ns (`352138.8029166907`)	`215487494` ns/iter (`± 822806`)	`1.02`
`tpch_q8/parquet`	`494087190.7` ns (`937395.3787499964`)	`482558982` ns/iter (`± 1927968`)	`1.02`
`tpch_q8/vortex-file-compressed`	`264829153.8` ns (`518212.8099999875`)	`272225347` ns/iter (`± 3218905`)	`0.97`
`tpch_q8/vortex-file-uncompressed`	`297438551.9` ns (`3614358.862499982`)	`307092746` ns/iter (`± 4647118`)	`0.97`
`tpch_q9/vortex-in-memory-no-pushdown`	`412446957.85` ns (`953962.8081250191`)	`405778945` ns/iter (`± 3408198`)	`1.02`
`tpch_q9/vortex-in-memory-pushdown`	`414989723.45` ns (`1026939.4056250155`)	`409784837` ns/iter (`± 8637477`)	`1.01`
`tpch_q9/arrow`	`403587610.3` ns (`1367421.574999988`)	`400998246` ns/iter (`± 7870465`)	`1.01`
`tpch_q9/parquet`	`716911614.5` ns (`2406505.4037500024`)	`687723525` ns/iter (`± 2769586`)	`1.04`
`tpch_q9/vortex-file-compressed`	`464833005.2` ns (`957100.875`)	`449724976` ns/iter (`± 6436082`)	`1.03`
`tpch_q9/vortex-file-uncompressed`	`490149814.3` ns (`1457817.625`)	`482884495` ns/iter (`± 6141352`)	`1.02`
`tpch_q10/vortex-in-memory-no-pushdown`	`228002276.6` ns (`483031.4987499863`)	`224740155` ns/iter (`± 1207852`)	`1.01`
`tpch_q10/vortex-in-memory-pushdown`	`266292655.8` ns (`608651.8568750024`)	`265222009` ns/iter (`± 4544285`)	`1.00`
`tpch_q10/arrow`	`225509220.8666667` ns (`501292.3099999577`)	`219076345` ns/iter (`± 7024828`)	`1.03`
`tpch_q10/parquet`	`478092539.45` ns (`1642956.1399999857`)	`481426698` ns/iter (`± 4462077`)	`0.99`
`tpch_q10/vortex-file-compressed`	`473475771.1` ns (`706985.8381249905`)	`474019593` ns/iter (`± 4032038`)	`1.00`
`tpch_q10/vortex-file-uncompressed`	`407417004.75` ns (`988742.6606250107`)	`408859777` ns/iter (`± 3938984`)	`1.00`
`tpch_q11/vortex-in-memory-no-pushdown`	`224700780.53333336` ns (`467540.4262499958`)	`219129162` ns/iter (`± 1832776`)	`1.03`
`tpch_q11/vortex-in-memory-pushdown`	`225753908.73333335` ns (`1244039.6466666758`)	`220793553` ns/iter (`± 918518`)	`1.02`
`tpch_q11/arrow`	`177459193.11484125` ns (`327176.177959308`)	`175455464` ns/iter (`± 1125682`)	`1.01`
`tpch_q11/parquet`	`191560226.3` ns (`932539.5670833439`)	`185576140` ns/iter (`± 2442270`)	`1.03`
`tpch_q11/vortex-file-compressed`	`230677296.2333333` ns (`574339.2933333367`)	`229509379` ns/iter (`± 1644525`)	`1.01`
`tpch_q11/vortex-file-uncompressed`	`239172281.3666667` ns (`1564077.60041669`)	`232738732` ns/iter (`± 1873655`)	`1.03`
`tpch_q12/vortex-in-memory-no-pushdown`	`181748026.5152381` ns (`119161.99357143044`)	`179897756` ns/iter (`± 1962967`)	`1.01`
`tpch_q12/vortex-in-memory-pushdown`	`269784045.6` ns (`167057.17499998212`)	`268815014` ns/iter (`± 1804665`)	`1.00`
`tpch_q12/arrow`	`171955794.4640476` ns (`166796.68347024918`)	`170395809` ns/iter (`± 844024`)	`1.01`
`tpch_q12/parquet`	`365822822` ns (`725210.1762500107`)	`365760882` ns/iter (`± 5113024`)	`1.00`
`tpch_q12/vortex-file-compressed`	`613578776.5` ns (`2318134.6500000358`)	`611089999` ns/iter (`± 3516355`)	`1.00`
`tpch_q12/vortex-file-uncompressed`	`366636327.7` ns (`477821.15125000477`)	`363970552` ns/iter (`± 2594091`)	`1.01`
`tpch_q13/vortex-in-memory-no-pushdown`	`190910588.36666667` ns (`1207546.7833333164`)	`171007772` ns/iter (`± 4051193`)	`1.12`
`tpch_q13/vortex-in-memory-pushdown`	`186951125.5` ns (`1458854.6433333158`)	`169154998` ns/iter (`± 6178477`)	`1.11`
`tpch_q13/arrow`	`181922969.06813493` ns (`2688358.8900689334`)	`179394695` ns/iter (`± 11628817`)	`1.01`
`tpch_q13/parquet`	`335171461.75` ns (`1085733.2525000274`)	`343672528` ns/iter (`± 12687514`)	`0.98`
`tpch_q13/vortex-file-compressed`	`219688230.26666665` ns (`485762.50458332896`)	`221913395` ns/iter (`± 3831446`)	`0.99`
`tpch_q13/vortex-file-uncompressed`	`223551243.83333334` ns (`1028771.7762500048`)	`212642135` ns/iter (`± 1772424`)	`1.05`
`tpch_q14/vortex-in-memory-no-pushdown`	`39528689.122539684` ns (`129273.55500794202`)	`37691250` ns/iter (`± 577567`)	`1.05`
`tpch_q14/vortex-in-memory-pushdown`	`88687423.32335317` ns (`166015.09192808717`)	`90997812` ns/iter (`± 1433438`)	`0.97`
`tpch_q14/arrow`	`41340823.23357143` ns (`99626.081876982`)	`39535216` ns/iter (`± 535028`)	`1.05`
`tpch_q14/parquet`	`226617225.69999996` ns (`592733.4029166698`)	`227708765` ns/iter (`± 1731670`)	`1.00`
`tpch_q14/vortex-file-compressed`	`91236196.1070238` ns (`300096.2808660716`)	`90305017` ns/iter (`± 631935`)	`1.01`
`tpch_q14/vortex-file-uncompressed`	`146631346.60845238` ns (`509835.0686994046`)	`144470755` ns/iter (`± 708935`)	`1.01`
`tpch_q15/vortex-in-memory-no-pushdown`	`70106102.60710318` ns (`455189.1161160767`)	`71237883` ns/iter (`± 1431115`)	`0.98`
`tpch_q15/vortex-in-memory-pushdown`	`122245527.3222619` ns (`656851.0220178589`)	`124403185` ns/iter (`± 854376`)	`0.98`
`tpch_q15/arrow`	`68284718.06132935` ns (`596973.3149846196`)	`66195092` ns/iter (`± 1472663`)	`1.03`
`tpch_q15/parquet`	`305362649.85` ns (`1483126.974999994`)	`295437640` ns/iter (`± 1150294`)	`1.03`
`tpch_q15/vortex-file-compressed`	`166438923.76698413` ns (`1290993.2371706367`)	`157382540` ns/iter (`± 411484`)	`1.06`
`tpch_q15/vortex-file-uncompressed`	`281698067.55` ns (`1131462.449999988`)	`275891348` ns/iter (`± 6001720`)	`1.02`
`tpch_q16/vortex-in-memory-no-pushdown`	`123305430.06269841` ns (`154485.40356349945`)	`118867963` ns/iter (`± 629106`)	`1.04`
`tpch_q16/vortex-in-memory-pushdown`	`128782849.80015874` ns (`231569.55085118115`)	`124703683` ns/iter (`± 1081895`)	`1.03`
`tpch_q16/arrow`	`108126961.2945238` ns (`299681.0858184621`)	`107392480` ns/iter (`± 705642`)	`1.01`
`tpch_q16/parquet`	`126170624.60535714` ns (`165293.12652678043`)	`123485091` ns/iter (`± 3669376`)	`1.02`
`tpch_q16/vortex-file-compressed`	`140641349.5671429` ns (`388245.0111131072`)	`138265217` ns/iter (`± 832715`)	`1.02`
`tpch_q16/vortex-file-uncompressed`	`140168289.80829364` ns (`179870.77833086252`)	`137767991` ns/iter (`± 578019`)	`1.02`
`tpch_q17/vortex-in-memory-no-pushdown`	`721306336.4` ns (`5332481.033749998`)	`649086157` ns/iter (`± 16184725`)	`1.11`
`tpch_q17/vortex-in-memory-pushdown`	`725077462.5` ns (`5307768.612499952`)	`654515157` ns/iter (`± 14489515`)	`1.11`
`tpch_q17/arrow`	`653677774.7` ns (`5063423.765000045`)	`567239351` ns/iter (`± 11560937`)	`1.15`
`tpch_q17/parquet`	`606152570.1` ns (`3143144.972500026`)	`595915976` ns/iter (`± 6246602`)	`1.02`
`tpch_q17/vortex-file-compressed`	`649481190` ns (`2199092.0737499595`)	`612595783` ns/iter (`± 2547793`)	`1.06`
`tpch_q17/vortex-file-uncompressed`	`709473433.2` ns (`6775567.780000031`)	`667861291` ns/iter (`± 8580372`)	`1.06`
`tpch_q18/vortex-in-memory-no-pushdown`	`1116718640.7` ns (`6570566.079999924`)	`1034223912` ns/iter (`± 23067942`)	`1.08`
`tpch_q18/vortex-in-memory-pushdown`	`1119784218.6` ns (`9246197.169999957`)	`994376340` ns/iter (`± 5989420`)	`1.13`
`tpch_q18/arrow`	`1106066697.9` ns (`3555563.296250105`)	`1004004887` ns/iter (`± 4588695`)	`1.10`
`tpch_q18/parquet`	`1294545462` ns (`8926828.143750072`)	`1186490542` ns/iter (`± 18651939`)	`1.09`
`tpch_q18/vortex-file-compressed`	`1128273292.8` ns (`4267479.612499952`)	`1065012633` ns/iter (`± 14258649`)	`1.06`
`tpch_q18/vortex-file-uncompressed`	`1167309511.9` ns (`7950887.348750114`)	`1135720332` ns/iter (`± 32401940`)	`1.03`
`tpch_q19/vortex-in-memory-no-pushdown`	`166112949.20166668` ns (`325413.97500000894`)	`165874289` ns/iter (`± 732945`)	`1.00`
`tpch_q19/vortex-in-memory-pushdown`	`259734381.3` ns (`386084.44750000536`)	`260523501` ns/iter (`± 1589056`)	`1.00`
`tpch_q19/arrow`	`153134599.72357142` ns (`315480.78404167295`)	`153540134` ns/iter (`± 536959`)	`1.00`
`tpch_q19/parquet`	`479759370.1` ns (`673078.1499999762`)	`477195361` ns/iter (`± 3004902`)	`1.01`
`tpch_q19/vortex-file-compressed`	`788425593.3` ns (`1567368.550000012`)	`757301083` ns/iter (`± 6091638`)	`1.04`
`tpch_q19/vortex-file-uncompressed`	`369395385.7` ns (`860898.25`)	`374268649` ns/iter (`± 1709655`)	`0.99`
`tpch_q20/vortex-in-memory-no-pushdown`	`277640939.25` ns (`854839.7300000191`)	`267235874` ns/iter (`± 6059222`)	`1.04`
`tpch_q20/vortex-in-memory-pushdown`	`300427448.5` ns (`1794995.0856249928`)	`299117478` ns/iter (`± 6408411`)	`1.00`
`tpch_q20/arrow`	`255485112.5` ns (`1111991.8399999887`)	`256806267` ns/iter (`± 8213627`)	`0.99`
`tpch_q20/parquet`	`370123782.65` ns (`1381059.2018750012`)	`377456624` ns/iter (`± 5175006`)	`0.98`
`tpch_q20/vortex-file-compressed`	`335169433.7` ns (`2389999.5`)	`327554400` ns/iter (`± 7045439`)	`1.02`
`tpch_q20/vortex-file-uncompressed`	`422524469.55` ns (`986920.4343750179`)	`416259584` ns/iter (`± 5453969`)	`1.02`
`tpch_q21/vortex-in-memory-no-pushdown`	`867259857.2` ns (`1559337.7662500143`)	`839899381` ns/iter (`± 9947739`)	`1.03`
`tpch_q21/vortex-in-memory-pushdown`	`927417861.3` ns (`3553385.9662500024`)	`904150556` ns/iter (`± 17125645`)	`1.03`
`tpch_q21/arrow`	`862319475.5` ns (`2946472.407499969`)	`834172632` ns/iter (`± 6626356`)	`1.03`
`tpch_q21/parquet`	`1022675321.3` ns (`3296504.0487499833`)	`987756274` ns/iter (`± 14734132`)	`1.04`
`tpch_q21/vortex-file-compressed`	`1248097842.4` ns (`1881255.3212499619`)	`1173609856` ns/iter (`± 5160528`)	`1.06`
`tpch_q21/vortex-file-uncompressed`	`1373740637.7` ns (`4932148.789999843`)	`1328981880` ns/iter (`± 9120789`)	`1.03`
`tpch_q22/vortex-in-memory-no-pushdown`	`97738027.65321428` ns (`409296.28778125346`)	`97935522` ns/iter (`± 453887`)	`1.00`
`tpch_q22/vortex-in-memory-pushdown`	`98659317.80075397` ns (`1492063.2876845077`)	`97710962` ns/iter (`± 829935`)	`1.01`
`tpch_q22/arrow`	`67136981.28025793` ns (`241315.54801810905`)	`69526644` ns/iter (`± 272226`)	`0.97`
`tpch_q22/parquet`	`98115234.07230158` ns (`519330.05611111224`)	`96659104` ns/iter (`± 1019394`)	`1.02`
`tpch_q22/vortex-file-compressed`	`103439443.46646826` ns (`360589.396852687`)	`103294285` ns/iter (`± 932885`)	`1.00`
`tpch_q22/vortex-file-uncompressed`	`110684712.36460316` ns (`458732.90418849885`)	`111618839` ns/iter (`± 1236391`)	`0.99`

a10y · 2024-09-20T00:15:41Z

bench-vortex/src/lib.rs

-        // .with_limit(100_000)
-        .build()
-        .unwrap();
+    let reader = builder.with_batch_size(BATCH_SIZE).build().unwrap();


nit: i believe this is already the DEFAULT_BATCH_SIZE for the reader

The Parquet crate claims otherwise:

/// Set the size of [`RecordBatch`] to produce. Defaults to 1024
/// If the batch_size more than the file row count, use the file row count.
pub fn with_batch_size(self, batch_size: usize) -> Self {

We define BATCH_SIZE as:

pub const BATCH_SIZE: usize = 65_536;

You're right, I confused this with our LayoutReaderBuilder

a10y · 2024-09-20T00:27:43Z

bench-vortex/benches/compress_benchmark.rs

+}
+
+fn parquet_written_size(array: &Array, filepath: &str, compression: Compression) -> usize {
+    let mut file = std::fs::File::create(Path::new(filepath)).unwrap();


Just a thought, but the ArrowWriter just needs something that impls Write, so instead of writing to file you could just give it a Vec<u8> and not worry about pushing random files

Man, does everyone come to hate deriving impl's in Rust? /s

Yeah, you're totally right: this is silly, unnecessary pollution of the filesystem, and it causes the tests to blow out their disk. I switched to a `Cursor<Vec<u8>>`, which tracks how many bytes have been written.
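The in-memory approach can be sketched with the standard library alone. In the sketch below, `write_payload` is a hypothetical stand-in for any serializer that only needs `impl Write` (such as the ArrowWriter mentioned above), and `written_size` is an illustrative name rather than the function from the PR.

```rust
use std::io::{self, Cursor, Write};

/// Hypothetical stand-in for a real serializer: it only requires `Write`.
fn write_payload<W: Write>(mut sink: W, chunks: &[&[u8]]) -> io::Result<W> {
    for chunk in chunks {
        sink.write_all(chunk)?;
    }
    sink.flush()?;
    Ok(sink)
}

/// Serialize into memory and report how many bytes were produced.
fn written_size(chunks: &[&[u8]]) -> io::Result<u64> {
    let cursor = write_payload(Cursor::new(Vec::<u8>::new()), chunks)?;
    // After sequential writes, the cursor position equals the bytes written.
    Ok(cursor.position())
}

fn main() -> io::Result<()> {
    let chunks: [&[u8]; 3] = [b"hello", b" ", b"world"];
    let n = written_size(&chunks)?;
    println!("{n} bytes written");
    Ok(())
}
```

Nothing touches the filesystem, so the measurement leaves no stray files behind and uses no disk space.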

a10y · 2024-09-20T00:28:31Z

bench-vortex/benches/compress_benchmark.rs

+    n_bytes
+}
+
+fn vortex_written_size(array: &Array, filepath: &str) -> u64 {


same thing here: you can use a Vec instead of file if all you wanna do is measure the compressed size

though i could understand if you wanna dump them to disk to poke at manually, persisting would make that easier

Yeah, I did poke at them while I was iterating but at least for testing and benchmarking it seems best to be fast and not use disk.

robert3005 · 2024-09-20T10:42:17Z

Apologies I was confused. Just didn't realize we had benchmark label added in quite a lot of prs

Ratio benchmarks are not supported by Criterion. Instead, back in #882, I added some code to generate ratios and print them in the format expected by our GitHub Action. Unfortunately, this code currently runs unconditionally, which is annoying when you are filtering benchmarks.

Now you can do this:

```
BENCH_VORTEX_RATIOS=AirlineSentiment cargo bench --bench compress_noci -- AirlineSentiment
```

And you'll receive both ratios and compression time benchmarks for AirlineSentiment and no output for other datasets. But when you do this:

```
cargo bench --bench compress_noci -- AirlineSentiment
```

You only get compression time benchmarks for AirlineSentiment.
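One way such an environment-variable gate might work is sketched below. This is not the code from #970: `ratios_enabled` is a hypothetical helper, and it assumes the variable is treated as a plain substring filter, whereas the real implementation may well use a regex.

```rust
use std::env;

/// Decide whether ratio output should be produced for `dataset`.
/// Assumption: the filter is a plain substring match; an unset variable disables ratios.
fn ratios_enabled(filter: Option<&str>, dataset: &str) -> bool {
    match filter {
        Some(pattern) => dataset.contains(pattern),
        None => false,
    }
}

fn main() {
    // Read the filter once; `ok()` maps "unset or not valid Unicode" to None.
    let filter = env::var("BENCH_VORTEX_RATIOS").ok();
    for dataset in ["taxi", "AirlineSentiment", "Arade"] {
        if ratios_enabled(filter.as_deref(), dataset) {
            println!("would emit ratios for {dataset}");
        }
    }
}
```

Keeping the decision in a pure function that takes an `Option<&str>` makes it testable without mutating the process environment.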

danking added the benchmark Run benchmarks on this branch label Sep 19, 2024

github-actions bot removed the benchmark Run benchmarks on this branch label Sep 19, 2024

github-actions bot reviewed Sep 19, 2024

lwwmanning approved these changes Sep 19, 2024

lwwmanning enabled auto-merge (squash) September 19, 2024 21:48

robert3005 reviewed Sep 19, 2024

github-actions bot reviewed Sep 19, 2024

lwwmanning disabled auto-merge September 19, 2024 22:03

github-actions bot reviewed Sep 19, 2024

a10y reviewed Sep 20, 2024

write to RAM

fafb1ad

danking enabled auto-merge (squash) September 20, 2024 14:38

danking merged commit a87c720 into develop Sep 20, 2024
5 checks passed

danking deleted the dk/bench-compression branch September 20, 2024 14:54

danking mentioned this pull request Oct 3, 2024

feat: add BENCH_VORTEX_RATIOS variable to filter ratio benchmarks #970

Merged
