feat: track compressed size & compare to parquet(zstd)? & canonical #882
We now track these six values:

1. Compression time (s).
2. Compression throughput (bytes/s).
3. Compressed size (bytes).
4. Compressed size as a fraction of a Vortex Canonical array.
5. Compressed Layout size as a fraction of Parquet without block compression.
6. Compressed Layout size as a fraction of Parquet with Zstd.

It's a bit janky: I just unconditionally compute these values for several datasets. I couldn't figure out how to ask criterion which benchmark regex is currently in use so, for example, `cargo bench taxi` will still run all the size benchmarks for every other dataset.

I also had to do some janky jq parsing to convert from Criterion's JSON output to the style expected by the benchmark-action GitHub action that we use.

Nevertheless, now, for each commit to `develop`, we should get all six numbers for the Taxi, Airline Sentiment, Arade, Bimbo, CMSprovider, Euro2016, Food, HashTags, and TPC-H l_comment datasets. They'll be displayed under [Vortex Compression](https://spiraldb.github.io/vortex/dev/bench/#Vortex_Compression) at the benchmarks site.

I might need to delete some old data from the gh-pages-bench branch since I changed some benchmark names, but after a few commits, those plots should become useful measures of our compression performance in space and time.
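As a rough illustration of how the six values relate to one another, here is a minimal sketch. The struct, the function, and all parameter names are hypothetical and not taken from the PR, and it assumes throughput is measured as uncompressed (canonical) bytes per second, which the PR does not state explicitly.

```rust
use std::time::Duration;

/// Hypothetical container for the six tracked values (names are illustrative).
#[derive(Debug)]
struct CompressionMetrics {
    compression_time_s: f64,
    throughput_bytes_per_s: f64,
    compressed_size_bytes: u64,
    ratio_vs_canonical: f64,
    ratio_vs_parquet_uncompressed: f64,
    ratio_vs_parquet_zstd: f64,
}

/// Derive the six values from one timed compression run and the measured sizes.
/// Assumption: throughput = canonical bytes / elapsed seconds.
fn compression_metrics(
    elapsed: Duration,
    canonical_bytes: u64,
    compressed_bytes: u64,
    vortex_layout_bytes: u64,
    parquet_uncompressed_bytes: u64,
    parquet_zstd_bytes: u64,
) -> CompressionMetrics {
    let secs = elapsed.as_secs_f64();
    CompressionMetrics {
        compression_time_s: secs,
        throughput_bytes_per_s: canonical_bytes as f64 / secs,
        compressed_size_bytes: compressed_bytes,
        // Values below 1.0 mean the Vortex output is smaller than the baseline.
        ratio_vs_canonical: compressed_bytes as f64 / canonical_bytes as f64,
        ratio_vs_parquet_uncompressed: vortex_layout_bytes as f64
            / parquet_uncompressed_bytes as f64,
        ratio_vs_parquet_zstd: vortex_layout_bytes as f64 / parquet_zstd_bytes as f64,
    }
}
```

With the taxi figures reported in this thread (50771544 compressed bytes against 470808924 bytes per iteration), `ratio_vs_canonical` comes out at roughly 0.108, which matches the reported Compression Ratio.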
Vortex bytes_at
Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
---|---|---|---|
bytes_at/array_data | 609.7744546778674 ns (0.13420378286656387) | 613 ns/iter (± 8) | 0.99 |
bytes_at/array_data #2 | 1039.372483865207 ns (0.5307530144946213) | 1043 ns/iter (± 4) | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Vortex random_access
Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
---|---|---|---|
vortex/tokio local disk | 1245366.8620128417 ns (4668.183183866553) | 1308917 ns/iter (± 29650) | 0.95 |
vortex/localfs | 1403735.5290472142 ns (4471.246426050318) | 1457592 ns/iter (± 32225) | 0.96 |
parquet/tokio local disk | 194141199.46666664 ns (2304201.5900000036) | 178158170 ns/iter (± 2466099) | 1.09 |
This comment was automatically generated by workflow using github-action-benchmark.
Vortex DataFusion
Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
---|---|---|---|
arrow/planning | 816320.3626479161 ns (2024.9349698948208) | 813880 ns/iter (± 4517) | 1.00 |
arrow/exec | 1771804.0215798502 ns (10301.68340335507) | 1774262 ns/iter (± 18680) | 1.00 |
vortex-pushdown-compressed/planning | 515887.4864556427 ns (1043.680268734548) | 516095 ns/iter (± 1831) | 1.00 |
vortex-pushdown-compressed/exec | 3078940.8429411757 ns (2820.7001838223077) | 3209669 ns/iter (± 141970) | 0.96 |
vortex-pushdown-uncompressed/planning | 521224.8686081361 ns (5009.368536001508) | 514579 ns/iter (± 1971) | 1.01 |
vortex-pushdown-uncompressed/exec | 2937165.6294444446 ns (2031.9434375003912) | 3336867 ns/iter (± 9294) | 0.88 |
vortex-nopushdown-compressed/planning | 716020.948887611 ns (384.9729955360526) | 710291 ns/iter (± 5322) | 1.01 |
vortex-nopushdown-compressed/exec | 8489314.503333332 ns (73138.21887499839) | 14988542 ns/iter (± 251952) | 0.57 |
vortex-nopushdown-uncompressed/planning | 715991.6986297732 ns (370.5863519538543) | 715546 ns/iter (± 3807) | 1.00 |
vortex-nopushdown-uncompressed/exec | 2007280.3079999995 ns (1379.8270149999298) | 2001661 ns/iter (± 82038) | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
cargo criterion --bench ${{ matrix.benchmark.id }} --message-format=json 2>&1 | tee out.json

cat out.json

sudo apt-get update && sudo apt-get install -y jq

jq --raw-input --compact-output '
  fromjson?
  | [ (if .mean != null then {name: .id, value: .mean.estimate, unit: .unit, range: ((.mean.upper_bound - .mean.lower_bound) / 2) } else {} end),
      (if .throughput != null then {name: (.id + " throughput"), value: .throughput[].per_iteration, unit: .throughput[].unit, range: 0} else {} end),
      {name, value, unit, range} ]
  | .[]
  | select(.value != null)
' \
  out.json \
  | jq --slurp --compact-output '.' >${{ matrix.benchmark.id }}.json

cat ${{ matrix.benchmark.id }}.json
This is a bit excessive. I wonder if this is simpler if we write our own github action
Agreed, I think my preferred solution is either a CSV or a JSON Line file that we just append to.
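A minimal sketch of what the append-only JSON Lines idea could look like, using only the standard library. `BenchRecord`, `from_mean`, `escape_json`, and `append_record` are hypothetical names introduced for illustration; the record shape mirrors the `{name, value, unit, range}` objects the jq filter emits, with `range` taken as half the confidence-interval width as in that filter. It assumes values are finite, since NaN or infinity would not be valid JSON.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// One measurement in the {name, value, unit, range} shape (hypothetical type).
struct BenchRecord<'a> {
    name: &'a str,
    value: f64,
    unit: &'a str,
    range: f64,
}

/// Escape the characters that must not appear raw inside a JSON string.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl<'a> BenchRecord<'a> {
    /// Build a record from a mean estimate and its bounds;
    /// the range is half the width of the interval.
    fn from_mean(name: &'a str, unit: &'a str, estimate: f64, lower: f64, upper: f64) -> Self {
        BenchRecord { name, value: estimate, unit, range: (upper - lower) / 2.0 }
    }

    /// Serialize as a single JSON object on one line (assumes finite numbers).
    fn to_json_line(&self) -> String {
        format!(
            "{{\"name\":\"{}\",\"value\":{},\"unit\":\"{}\",\"range\":{}}}",
            escape_json(self.name),
            self.value,
            escape_json(self.unit),
            self.range
        )
    }
}

/// Append one record to the file at `path`, creating the file if needed.
fn append_record(path: &Path, record: &BenchRecord<'_>) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", record.to_json_line())
}
```

Because each line is an independent JSON object, later tooling can read the history with a line-by-line parser and no slurp step.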
Vortex Compression
Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
---|---|---|---|
Yellow Taxi Trip Data Compression Time/taxi compression | 2513565430.2 ns (10122599.299999952) | | |
Yellow Taxi Trip Data Compression Time/taxi compression throughput | 470808924 bytes | | |
Yellow Taxi Trip Data Vortex-to-ParquetZstd Ratio/taxi | 0.9560604643330857 ratio | | |
Yellow Taxi Trip Data Vortex-to-ParquetUncompressed Ratio/taxi | 0.6137144059032362 ratio | | |
Yellow Taxi Trip Data Compression Ratio/taxi | 0.10783895846460209 ratio | | |
Yellow Taxi Trip Data Compression Size/taxi | 50771544 bytes | | |
Public BI Compression Time/AirlineSentiment compression | 415039.5464639052 ns (491.88687187436153) | | |
Public BI Compression Time/AirlineSentiment compression throughput | 2020 bytes | | |
Public BI Vortex-to-ParquetZstd Ratio/AirlineSentiment | 6.400830737279335 ratio | | |
Public BI Vortex-to-ParquetUncompressed Ratio/AirlineSentiment | 4.353107344632768 ratio | | |
Public BI Compression Ratio/AirlineSentiment | 0.6207920792079208 ratio | | |
Public BI Compression Size/AirlineSentiment | 1254 bytes | | |
Public BI Compression Time/Arade compression | 3131902697.3 ns (6480990.841249704) | | |
Public BI Compression Time/Arade compression throughput | 787023760 bytes | | |
Public BI Vortex-to-ParquetZstd Ratio/Arade | 0.4927803394425952 ratio | | |
Public BI Vortex-to-ParquetUncompressed Ratio/Arade | 0.4398463104814441 ratio | | |
Public BI Compression Ratio/Arade | 0.1862664667201407 ratio | | |
Public BI Compression Size/Arade | 146596135 bytes | | |
Public BI Compression Time/Bimbo compression | 21855721191.1 ns (20350694.538749695) | | |
Public BI Compression Time/Bimbo compression throughput | 7121333608 bytes | | |
Public BI Vortex-to-ParquetZstd Ratio/Bimbo | 1.293293825007246 ratio | | |
Public BI Vortex-to-ParquetUncompressed Ratio/Bimbo | 0.8768962136437118 ratio | | |
Public BI Compression Ratio/Bimbo | 0.06423232573827764 ratio | | |
Public BI Compression Size/Bimbo | 457419820 bytes | | |
Public BI Compression Time/CMSprovider compression | 12917920336.7 ns (26398202.20625019) | | |
Public BI Compression Time/CMSprovider compression throughput | 5149123964 bytes | | |
Public BI Vortex-to-ParquetZstd Ratio/CMSprovider | 1.2021505846266516 ratio | | |
Public BI Vortex-to-ParquetUncompressed Ratio/CMSprovider | 0.7762200888869946 ratio | | |
Public BI Compression Ratio/CMSprovider | 0.17574964310958274 ratio | | |
Public BI Compression Size/CMSprovider | 904956699 bytes | | |
Public BI Compression Time/Euro2016 compression | 2219852099 ns (15588005.231250286) | | |
Public BI Compression Time/Euro2016 compression throughput | 393253221 bytes | | |
Public BI Vortex-to-ParquetZstd Ratio/Euro2016 | 1.4705138909171633 ratio | | |
Public BI Vortex-to-ParquetUncompressed Ratio/Euro2016 | 0.6239071488283204 ratio | | |
Public BI Compression Ratio/Euro2016 | 0.43458292742120985 ratio | | |
Public BI Compression Size/Euro2016 | 170901136 bytes | | |
Public BI Compression Time/Food compression | 1095478080.3 ns (3527534.875) | | |
Public BI Compression Time/Food compression throughput | 332718229 bytes | | |
Public BI Vortex-to-ParquetZstd Ratio/Food | 1.2297872376838528 ratio | | |
Public BI Vortex-to-ParquetUncompressed Ratio/Food | 0.6953516685794864 ratio | | |
Public BI Compression Ratio/Food | 0.13031750959458252 ratio | | |
Public BI Compression Size/Food | 43359011 bytes | | |
Public BI Compression Time/HashTags compression | 2930012702.6 ns (17763756.576250076) | | |
Public BI Compression Time/HashTags compression throughput | 804495592 bytes | | |
Public BI Vortex-to-ParquetZstd Ratio/HashTags | 1.6464093663569246 ratio | | |
Public BI Vortex-to-ParquetUncompressed Ratio/HashTags | 0.4680774335616459 ratio | | |
Public BI Compression Ratio/HashTags | 0.2652765038394393 ratio | | |
Public BI Compression Size/HashTags | 213413778 bytes | | |
TPC-H l_comment Compression Time/chunked-without-fsst compression | 187786756.78414685 ns (925951.3523437679) | | |
TPC-H l_comment Compression Time/chunked-without-fsst compression throughput | 183010921 bytes | | |
TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-without-fsst | 3.2154759555157804 ratio | | |
TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-without-fsst | 0.9983658315767541 ratio | | |
TPC-H l_comment Compression Ratio/chunked-without-fsst | 0.999965750677797 ratio | | |
TPC-H l_comment Compression Size/chunked-without-fsst | 183004653 bytes | | |
TPC-H l_comment Compression Time/chunked-with-fsst compression | 1134202541.95 ns (2623869.8625000715) | | |
TPC-H l_comment Compression Time/chunked-with-fsst compression throughput | 183010921 bytes | | |
TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-with-fsst | 1.504212244020189 ratio | | |
TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-with-fsst | 0.4670394456823924 ratio | | |
TPC-H l_comment Compression Ratio/chunked-with-fsst | 0.442999322428414 ratio | | |
TPC-H l_comment Compression Size/chunked-with-fsst | 81073714 bytes | | |
TPC-H l_comment Compression Time/canonical-with-fsst compression | 1131178437.95 ns (761932.415624857) | | |
TPC-H l_comment Compression Time/canonical-with-fsst compression throughput | 183010937 bytes | | |
TPC-H l_comment Vortex-to-ParquetZstd Ratio/canonical-with-fsst | 1.5059821792895995 ratio | | |
TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/canonical-with-fsst | 0.46759301141944365 ratio | | |
TPC-H l_comment Compression Ratio/canonical-with-fsst | 0.44354151358724536 ratio | | |
TPC-H l_comment Compression Size/canonical-with-fsst | 81172948 bytes | | |
This comment was automatically generated by workflow using github-action-benchmark.
Also, can we not run benchmarks on every PR? I think running them when a label is added, and then on every develop commit, would be enough. It seems like a lot to run for every commit.
Am I missing something? This PR doesn't make it run on every PR...?
This PR doesn't, but they currently do run. I raised it mostly because, since we are making benchmark changes anyway, we should change that too.
Vortex benchmarks
Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
---|---|---|---|
tpch_q1/vortex-in-memory-no-pushdown | 464843920.35 ns (2703997.192499995) | 456752113 ns/iter (± 3867547) | 1.02 |
tpch_q1/vortex-in-memory-pushdown | 532535430.5 ns (1441835.3449999988) | 532735558 ns/iter (± 1607247) | 1.00 |
tpch_q1/arrow | 448641396.35 ns (983773.7437499762) | 443097274 ns/iter (± 626790) | 1.01 |
tpch_q1/parquet | 651934200.8 ns (1167224.8499999642) | 653884935 ns/iter (± 2663691) | 1.00 |
tpch_q1/vortex-file-compressed | 631224996.7 ns (1159530.3299999833) | 625869224 ns/iter (± 2721859) | 1.01 |
tpch_q1/vortex-file-uncompressed | 636241881.9 ns (2468183.79125005) | 631514808 ns/iter (± 9245882) | 1.01 |
tpch_q2/vortex-in-memory-no-pushdown | 146135948.23503968 ns (596119.4463889003) | 146858416 ns/iter (± 2531945) | 1.00 |
tpch_q2/vortex-in-memory-pushdown | 144400001.83011904 ns (438025.6820833236) | 143317300 ns/iter (± 2886213) | 1.01 |
tpch_q2/arrow | 122211692.76773807 ns (168044.8480624929) | 122568135 ns/iter (± 397231) | 1.00 |
tpch_q2/parquet | 161668685.44722223 ns (492714.924999997) | 159787707 ns/iter (± 5185870) | 1.01 |
tpch_q2/vortex-file-compressed | 156831594.42023808 ns (853652.4739612937) | 156466031 ns/iter (± 1277797) | 1.00 |
tpch_q2/vortex-file-uncompressed | 166610068.98972222 ns (501684.8970833421) | 162139847 ns/iter (± 3160753) | 1.03 |
tpch_q3/vortex-in-memory-no-pushdown | 152064205.7482143 ns (344621.85196428) | 153229410 ns/iter (± 1562047) | 0.99 |
tpch_q3/vortex-in-memory-pushdown | 186610418.2 ns (955602.7458333224) | 186029987 ns/iter (± 1669213) | 1.00 |
tpch_q3/arrow | 148540561.55555555 ns (353743.82150000334) | 147328590 ns/iter (± 1417356) | 1.01 |
tpch_q3/parquet | 343069328.65 ns (1382273.2912499905) | 336975915 ns/iter (± 2025628) | 1.02 |
tpch_q3/vortex-file-compressed | 311862426.95 ns (998395.625) | 309258408 ns/iter (± 2757350) | 1.01 |
tpch_q3/vortex-file-uncompressed | 380565568.2 ns (1064147.7043749988) | 375568417 ns/iter (± 4807910) | 1.01 |
tpch_q4/vortex-in-memory-no-pushdown | 109463963.10416666 ns (575272.8522916734) | 106376311 ns/iter (± 304925) | 1.03 |
tpch_q4/vortex-in-memory-pushdown | 144525012.9798016 ns (331599.61648860574) | 141994138 ns/iter (± 1248613) | 1.02 |
tpch_q4/arrow | 102683885.50023809 ns (381721.1955833286) | 101065390 ns/iter (± 370932) | 1.02 |
tpch_q4/parquet | 219624434.13333336 ns (916865.9166666567) | 214248841 ns/iter (± 2228222) | 1.03 |
tpch_q4/vortex-file-compressed | 275844832.4 ns (779808.6381250024) | 262684851 ns/iter (± 1417453) | 1.05 |
tpch_q4/vortex-file-uncompressed | 322938093.15 ns (2089683.4512500167) | 322371836 ns/iter (± 4220226) | 1.00 |
tpch_q5/vortex-in-memory-no-pushdown | 296875979.6 ns (1442079.2993750274) | 296691840 ns/iter (± 6168057) | 1.00 |
tpch_q5/vortex-in-memory-pushdown | 311055426.75 ns (2625022.0974999964) | 321813589 ns/iter (± 5635245) | 0.97 |
tpch_q5/arrow | 301113685 ns (1192929.6487500072) | 289107585 ns/iter (± 2924141) | 1.04 |
tpch_q5/parquet | 463548366.3 ns (1005448.7349999845) | 449018047 ns/iter (± 2489275) | 1.03 |
tpch_q5/vortex-file-compressed | 342553821.15 ns (1811957.981249988) | 341880037 ns/iter (± 8522037) | 1.00 |
tpch_q5/vortex-file-uncompressed | 361094967.65 ns (1257259.4193750024) | 356647316 ns/iter (± 5958948) | 1.01 |
tpch_q6/vortex-in-memory-no-pushdown | 38618631.72686508 ns (166164.57365079597) | 40138218 ns/iter (± 630500) | 0.96 |
tpch_q6/vortex-in-memory-pushdown | 92286594.83333334 ns (140081.3341666609) | 92149267 ns/iter (± 303889) | 1.00 |
tpch_q6/arrow | 36310182.61406084 ns (165533.02780538797) | 36334469 ns/iter (± 211591) | 1.00 |
tpch_q6/parquet | 154528287.31761903 ns (505493.8115416467) | 151921473 ns/iter (± 1264234) | 1.02 |
tpch_q6/vortex-file-compressed | 80680396.58406746 ns (245124.53606721014) | 78859071 ns/iter (± 1115685) | 1.02 |
tpch_q6/vortex-file-uncompressed | 167328617.76924604 ns (1367708.262455359) | 167141882 ns/iter (± 1751525) | 1.00 |
tpch_q7/vortex-in-memory-no-pushdown | 568360396.5 ns (1364090.5849999785) | 562119306 ns/iter (± 3476977) | 1.01 |
tpch_q7/vortex-in-memory-pushdown | 632136235.7 ns (1477559.2749999762) | 611059188 ns/iter (± 6446587) | 1.03 |
tpch_q7/arrow | 573491795.9 ns (1734030.0337500572) | 553024994 ns/iter (± 2909226) | 1.04 |
tpch_q7/parquet | 733469206.1 ns (3208002.5250000358) | 710209548 ns/iter (± 5017550) | 1.03 |
tpch_q7/vortex-file-compressed | 682726773.6 ns (2675624.6862499714) | 672453257 ns/iter (± 5566775) | 1.02 |
tpch_q7/vortex-file-uncompressed | 759621166.1 ns (3345556.2650000453) | 744071550 ns/iter (± 5659596) | 1.02 |
tpch_q8/vortex-in-memory-no-pushdown | 217474477.0333333 ns (868808.4662500024) | 216237880 ns/iter (± 504152) | 1.01 |
tpch_q8/vortex-in-memory-pushdown | 234419944.0333333 ns (589083.5312500149) | 230296027 ns/iter (± 963193) | 1.02 |
tpch_q8/arrow | 220631285.26666665 ns (352138.8029166907) | 215487494 ns/iter (± 822806) | 1.02 |
tpch_q8/parquet | 494087190.7 ns (937395.3787499964) | 482558982 ns/iter (± 1927968) | 1.02 |
tpch_q8/vortex-file-compressed | 264829153.8 ns (518212.8099999875) | 272225347 ns/iter (± 3218905) | 0.97 |
tpch_q8/vortex-file-uncompressed | 297438551.9 ns (3614358.862499982) | 307092746 ns/iter (± 4647118) | 0.97 |
tpch_q9/vortex-in-memory-no-pushdown | 412446957.85 ns (953962.8081250191) | 405778945 ns/iter (± 3408198) | 1.02 |
tpch_q9/vortex-in-memory-pushdown | 414989723.45 ns (1026939.4056250155) | 409784837 ns/iter (± 8637477) | 1.01 |
tpch_q9/arrow | 403587610.3 ns (1367421.574999988) | 400998246 ns/iter (± 7870465) | 1.01 |
tpch_q9/parquet | 716911614.5 ns (2406505.4037500024) | 687723525 ns/iter (± 2769586) | 1.04 |
tpch_q9/vortex-file-compressed | 464833005.2 ns (957100.875) | 449724976 ns/iter (± 6436082) | 1.03 |
tpch_q9/vortex-file-uncompressed | 490149814.3 ns (1457817.625) | 482884495 ns/iter (± 6141352) | 1.02 |
tpch_q10/vortex-in-memory-no-pushdown | 228002276.6 ns (483031.4987499863) | 224740155 ns/iter (± 1207852) | 1.01 |
tpch_q10/vortex-in-memory-pushdown | 266292655.8 ns (608651.8568750024) | 265222009 ns/iter (± 4544285) | 1.00 |
tpch_q10/arrow | 225509220.8666667 ns (501292.3099999577) | 219076345 ns/iter (± 7024828) | 1.03 |
tpch_q10/parquet | 478092539.45 ns (1642956.1399999857) | 481426698 ns/iter (± 4462077) | 0.99 |
tpch_q10/vortex-file-compressed | 473475771.1 ns (706985.8381249905) | 474019593 ns/iter (± 4032038) | 1.00 |
tpch_q10/vortex-file-uncompressed | 407417004.75 ns (988742.6606250107) | 408859777 ns/iter (± 3938984) | 1.00 |
tpch_q11/vortex-in-memory-no-pushdown | 224700780.53333336 ns (467540.4262499958) | 219129162 ns/iter (± 1832776) | 1.03 |
tpch_q11/vortex-in-memory-pushdown | 225753908.73333335 ns (1244039.6466666758) | 220793553 ns/iter (± 918518) | 1.02 |
tpch_q11/arrow | 177459193.11484125 ns (327176.177959308) | 175455464 ns/iter (± 1125682) | 1.01 |
tpch_q11/parquet | 191560226.3 ns (932539.5670833439) | 185576140 ns/iter (± 2442270) | 1.03 |
tpch_q11/vortex-file-compressed | 230677296.2333333 ns (574339.2933333367) | 229509379 ns/iter (± 1644525) | 1.01 |
tpch_q11/vortex-file-uncompressed | 239172281.3666667 ns (1564077.60041669) | 232738732 ns/iter (± 1873655) | 1.03 |
tpch_q12/vortex-in-memory-no-pushdown | 181748026.5152381 ns (119161.99357143044) | 179897756 ns/iter (± 1962967) | 1.01 |
tpch_q12/vortex-in-memory-pushdown | 269784045.6 ns (167057.17499998212) | 268815014 ns/iter (± 1804665) | 1.00 |
tpch_q12/arrow | 171955794.4640476 ns (166796.68347024918) | 170395809 ns/iter (± 844024) | 1.01 |
tpch_q12/parquet | 365822822 ns (725210.1762500107) | 365760882 ns/iter (± 5113024) | 1.00 |
tpch_q12/vortex-file-compressed | 613578776.5 ns (2318134.6500000358) | 611089999 ns/iter (± 3516355) | 1.00 |
tpch_q12/vortex-file-uncompressed | 366636327.7 ns (477821.15125000477) | 363970552 ns/iter (± 2594091) | 1.01 |
tpch_q13/vortex-in-memory-no-pushdown | 190910588.36666667 ns (1207546.7833333164) | 171007772 ns/iter (± 4051193) | 1.12 |
tpch_q13/vortex-in-memory-pushdown | 186951125.5 ns (1458854.6433333158) | 169154998 ns/iter (± 6178477) | 1.11 |
tpch_q13/arrow | 181922969.06813493 ns (2688358.8900689334) | 179394695 ns/iter (± 11628817) | 1.01 |
tpch_q13/parquet | 335171461.75 ns (1085733.2525000274) | 343672528 ns/iter (± 12687514) | 0.98 |
tpch_q13/vortex-file-compressed | 219688230.26666665 ns (485762.50458332896) | 221913395 ns/iter (± 3831446) | 0.99 |
tpch_q13/vortex-file-uncompressed | 223551243.83333334 ns (1028771.7762500048) | 212642135 ns/iter (± 1772424) | 1.05 |
tpch_q14/vortex-in-memory-no-pushdown | 39528689.122539684 ns (129273.55500794202) | 37691250 ns/iter (± 577567) | 1.05 |
tpch_q14/vortex-in-memory-pushdown | 88687423.32335317 ns (166015.09192808717) | 90997812 ns/iter (± 1433438) | 0.97 |
tpch_q14/arrow | 41340823.23357143 ns (99626.081876982) | 39535216 ns/iter (± 535028) | 1.05 |
tpch_q14/parquet | 226617225.69999996 ns (592733.4029166698) | 227708765 ns/iter (± 1731670) | 1.00 |
tpch_q14/vortex-file-compressed | 91236196.1070238 ns (300096.2808660716) | 90305017 ns/iter (± 631935) | 1.01 |
tpch_q14/vortex-file-uncompressed | 146631346.60845238 ns (509835.0686994046) | 144470755 ns/iter (± 708935) | 1.01 |
tpch_q15/vortex-in-memory-no-pushdown | 70106102.60710318 ns (455189.1161160767) | 71237883 ns/iter (± 1431115) | 0.98 |
tpch_q15/vortex-in-memory-pushdown | 122245527.3222619 ns (656851.0220178589) | 124403185 ns/iter (± 854376) | 0.98 |
tpch_q15/arrow | 68284718.06132935 ns (596973.3149846196) | 66195092 ns/iter (± 1472663) | 1.03 |
tpch_q15/parquet | 305362649.85 ns (1483126.974999994) | 295437640 ns/iter (± 1150294) | 1.03 |
tpch_q15/vortex-file-compressed | 166438923.76698413 ns (1290993.2371706367) | 157382540 ns/iter (± 411484) | 1.06 |
tpch_q15/vortex-file-uncompressed | 281698067.55 ns (1131462.449999988) | 275891348 ns/iter (± 6001720) | 1.02 |
tpch_q16/vortex-in-memory-no-pushdown | 123305430.06269841 ns (154485.40356349945) | 118867963 ns/iter (± 629106) | 1.04 |
tpch_q16/vortex-in-memory-pushdown | 128782849.80015874 ns (231569.55085118115) | 124703683 ns/iter (± 1081895) | 1.03 |
tpch_q16/arrow | 108126961.2945238 ns (299681.0858184621) | 107392480 ns/iter (± 705642) | 1.01 |
tpch_q16/parquet | 126170624.60535714 ns (165293.12652678043) | 123485091 ns/iter (± 3669376) | 1.02 |
tpch_q16/vortex-file-compressed | 140641349.5671429 ns (388245.0111131072) | 138265217 ns/iter (± 832715) | 1.02 |
tpch_q16/vortex-file-uncompressed | 140168289.80829364 ns (179870.77833086252) | 137767991 ns/iter (± 578019) | 1.02 |
tpch_q17/vortex-in-memory-no-pushdown | 721306336.4 ns (5332481.033749998) | 649086157 ns/iter (± 16184725) | 1.11 |
tpch_q17/vortex-in-memory-pushdown | 725077462.5 ns (5307768.612499952) | 654515157 ns/iter (± 14489515) | 1.11 |
tpch_q17/arrow | 653677774.7 ns (5063423.765000045) | 567239351 ns/iter (± 11560937) | 1.15 |
tpch_q17/parquet | 606152570.1 ns (3143144.972500026) | 595915976 ns/iter (± 6246602) | 1.02 |
tpch_q17/vortex-file-compressed | 649481190 ns (2199092.0737499595) | 612595783 ns/iter (± 2547793) | 1.06 |
tpch_q17/vortex-file-uncompressed | 709473433.2 ns (6775567.780000031) | 667861291 ns/iter (± 8580372) | 1.06 |
tpch_q18/vortex-in-memory-no-pushdown | 1116718640.7 ns (6570566.079999924) | 1034223912 ns/iter (± 23067942) | 1.08 |
tpch_q18/vortex-in-memory-pushdown | 1119784218.6 ns (9246197.169999957) | 994376340 ns/iter (± 5989420) | 1.13 |
tpch_q18/arrow | 1106066697.9 ns (3555563.296250105) | 1004004887 ns/iter (± 4588695) | 1.10 |
tpch_q18/parquet | 1294545462 ns (8926828.143750072) | 1186490542 ns/iter (± 18651939) | 1.09 |
tpch_q18/vortex-file-compressed | 1128273292.8 ns (4267479.612499952) | 1065012633 ns/iter (± 14258649) | 1.06 |
tpch_q18/vortex-file-uncompressed | 1167309511.9 ns (7950887.348750114) | 1135720332 ns/iter (± 32401940) | 1.03 |
tpch_q19/vortex-in-memory-no-pushdown | 166112949.20166668 ns (325413.97500000894) | 165874289 ns/iter (± 732945) | 1.00 |
tpch_q19/vortex-in-memory-pushdown | 259734381.3 ns (386084.44750000536) | 260523501 ns/iter (± 1589056) | 1.00 |
tpch_q19/arrow | 153134599.72357142 ns (315480.78404167295) | 153540134 ns/iter (± 536959) | 1.00 |
tpch_q19/parquet | 479759370.1 ns (673078.1499999762) | 477195361 ns/iter (± 3004902) | 1.01 |
tpch_q19/vortex-file-compressed | 788425593.3 ns (1567368.550000012) | 757301083 ns/iter (± 6091638) | 1.04 |
tpch_q19/vortex-file-uncompressed | 369395385.7 ns (860898.25) | 374268649 ns/iter (± 1709655) | 0.99 |
tpch_q20/vortex-in-memory-no-pushdown | 277640939.25 ns (854839.7300000191) | 267235874 ns/iter (± 6059222) | 1.04 |
tpch_q20/vortex-in-memory-pushdown | 300427448.5 ns (1794995.0856249928) | 299117478 ns/iter (± 6408411) | 1.00 |
tpch_q20/arrow | 255485112.5 ns (1111991.8399999887) | 256806267 ns/iter (± 8213627) | 0.99 |
tpch_q20/parquet | 370123782.65 ns (1381059.2018750012) | 377456624 ns/iter (± 5175006) | 0.98 |
tpch_q20/vortex-file-compressed | 335169433.7 ns (2389999.5) | 327554400 ns/iter (± 7045439) | 1.02 |
tpch_q20/vortex-file-uncompressed | 422524469.55 ns (986920.4343750179) | 416259584 ns/iter (± 5453969) | 1.02 |
tpch_q21/vortex-in-memory-no-pushdown | 867259857.2 ns (1559337.7662500143) | 839899381 ns/iter (± 9947739) | 1.03 |
tpch_q21/vortex-in-memory-pushdown | 927417861.3 ns (3553385.9662500024) | 904150556 ns/iter (± 17125645) | 1.03 |
tpch_q21/arrow | 862319475.5 ns (2946472.407499969) | 834172632 ns/iter (± 6626356) | 1.03 |
tpch_q21/parquet | 1022675321.3 ns (3296504.0487499833) | 987756274 ns/iter (± 14734132) | 1.04 |
tpch_q21/vortex-file-compressed | 1248097842.4 ns (1881255.3212499619) | 1173609856 ns/iter (± 5160528) | 1.06 |
tpch_q21/vortex-file-uncompressed | 1373740637.7 ns (4932148.789999843) | 1328981880 ns/iter (± 9120789) | 1.03 |
tpch_q22/vortex-in-memory-no-pushdown | 97738027.65321428 ns (409296.28778125346) | 97935522 ns/iter (± 453887) | 1.00 |
tpch_q22/vortex-in-memory-pushdown | 98659317.80075397 ns (1492063.2876845077) | 97710962 ns/iter (± 829935) | 1.01 |
tpch_q22/arrow | 67136981.28025793 ns (241315.54801810905) | 69526644 ns/iter (± 272226) | 0.97 |
tpch_q22/parquet | 98115234.07230158 ns (519330.05611111224) | 96659104 ns/iter (± 1019394) | 1.02 |
tpch_q22/vortex-file-compressed | 103439443.46646826 ns (360589.396852687) | 103294285 ns/iter (± 932885) | 1.00 |
tpch_q22/vortex-file-uncompressed | 110684712.36460316 ns (458732.90418849885) | 111618839 ns/iter (± 1236391) | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
// .with_limit(100_000)
.build()
.unwrap();
let reader = builder.with_batch_size(BATCH_SIZE).build().unwrap();
nit: i believe this is already the DEFAULT_BATCH_SIZE for the reader
The Parquet crate claims otherwise:
/// Set the size of [`RecordBatch`] to produce. Defaults to 1024
/// If the batch_size more than the file row count, use the file row count.
pub fn with_batch_size(self, batch_size: usize) -> Self {
We define BATCH_SIZE as:
pub const BATCH_SIZE: usize = 65_536;
You're right, I confused this with our LayoutReaderBuilder
}

fn parquet_written_size(array: &Array, filepath: &str, compression: Compression) -> usize {
    let mut file = std::fs::File::create(Path::new(filepath)).unwrap();
Just a thought, but the ArrowWriter just needs something that impls `Write`, so instead of writing to file you could just give it a `Vec<u8>` and not worry about pushing random files
Man, does everyone come to hate deriving impl's in Rust? /s
Yeah, you're totally right: this is silly, unnecessary pollution of the filesystem, and it causes the tests to blow out their disk. I switched to a `Cursor<Vec<u8>>`, which tracks how many bytes have been written.
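For readers unfamiliar with the pattern, here is a minimal sketch of measuring written size through an in-memory cursor. `write_payload` is a hypothetical stand-in for any serializer that is generic over `std::io::Write` (such as the Parquet writer discussed above); it is not the code from this PR.

```rust
use std::io::{self, Cursor, Write};

/// Hypothetical stand-in for a serializer that only needs `Write`.
/// It hands the sink back so the caller can inspect it afterwards.
fn write_payload<W: Write>(mut sink: W, rows: &[&str]) -> io::Result<W> {
    for row in rows {
        sink.write_all(row.as_bytes())?;
        sink.write_all(b"\n")?;
    }
    sink.flush()?;
    Ok(sink)
}

/// Measure the serialized size without touching the filesystem.
/// After sequential writes, the cursor position equals the bytes written.
fn written_size(rows: &[&str]) -> io::Result<u64> {
    let cursor = write_payload(Cursor::new(Vec::<u8>::new()), rows)?;
    Ok(cursor.position())
}
```

The bytes remain available through `cursor.into_inner()` if they are needed for inspection, so nothing is lost compared with writing a file.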
    n_bytes
}

fn vortex_written_size(array: &Array, filepath: &str) -> u64 {
same thing here: you can use a Vec instead of file if all you wanna do is measure the compressed size
though i could understand if you wanna dump them to disk to poke at manually, persisting would make that easier
Yeah, I did poke at them while I was iterating but at least for testing and benchmarking it seems best to be fast and not use disk.
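If only the size matters and the bytes themselves are never inspected, a counting wrapper is one alternative to buffering everything in a `Vec`. This is a sketch of a different technique from the one used in the PR; `CountingWriter` is a hypothetical type, and it assumes the writer under test accepts any synchronous `std::io::Write`.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper that forwards writes and tallies the bytes accepted.
struct CountingWriter<W> {
    inner: W,
    bytes_written: u64,
}

impl<W: Write> CountingWriter<W> {
    fn new(inner: W) -> Self {
        CountingWriter { inner, bytes_written: 0 }
    }

    fn bytes_written(&self) -> u64 {
        self.bytes_written
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Count only what the inner writer actually accepted.
        let n = self.inner.write(buf)?;
        self.bytes_written += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

Wrapping `io::sink()` discards the data while still counting it, which keeps memory flat for large datasets. Wrapping a `File` instead gives both the count and a file to poke at manually.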
Apologies, I was confused. I just didn't realize we had the benchmark label added to quite a lot of PRs.
Ratio benchmarks are not supported by criterion. Instead, back in #882, I added some code to generate ratios and print them in the format expected by our GitHub Action. Unfortunately, this code currently runs unconditionally which is annoying when you are filtering benchmarks.

Now you can do this:

```
BENCH_VORTEX_RATIOS=AirlineSentiment cargo bench --bench compress_noci -- AirlineSentiment
```

And you'll receive both ratios and compression time benchmarks for AirlineSentiment and no output for other datasets. But when you do this:

```
cargo bench --bench compress_noci -- AirlineSentiment
```

You only get compression time benchmarks for AirlineSentiment.
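The gating described above might be sketched as follows. Only the `BENCH_VORTEX_RATIOS` variable name comes from the text; the function names are hypothetical, and the sketch assumes a plain substring match, whereas the real benchmark may well interpret the value as a regex.

```rust
use std::env;

/// Decide whether ratio output should be produced for `dataset`.
/// Assumption: the filter is a substring match; an unset or empty
/// filter disables ratio output entirely.
fn ratios_enabled(filter: Option<&str>, dataset: &str) -> bool {
    match filter {
        Some(pattern) if !pattern.is_empty() => dataset.contains(pattern),
        _ => false,
    }
}

/// Read the filter from the environment and apply it to one dataset name.
fn ratios_enabled_from_env(dataset: &str) -> bool {
    let filter = env::var("BENCH_VORTEX_RATIOS").ok();
    ratios_enabled(filter.as_deref(), dataset)
}
```

Keeping the decision in a pure function that takes the filter as an argument makes it testable without mutating process-wide environment state.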