feat: compress chunk n+1 like chunk n #969

Closed
wants to merge 10 commits into from

danking
Member

@danking danking commented Oct 3, 2024

EDIT: still needs some work, sorry for the ping

@danking danking added the benchmark Run benchmarks on this branch label Oct 3, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Oct 3, 2024
Contributor

@github-actions github-actions bot left a comment

Vortex bytes_at

Benchmark suite Current: 52800a2 Previous: e240a50 Ratio
bytes_at/array_data 594.2038829169993 ns (2.2438457860494623) 587.002243166888 ns (0.28828533272832146) 1.01
bytes_at/array_view 872.0782663165314 ns (0.38793713239715544) 875.550848166955 ns (0.3197943645942587) 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@github-actions github-actions bot left a comment

DataFusion

Benchmark suite Current: 52800a2 Previous: e240a50 Ratio
arrow/planning 811479.1314518038 ns (1916.2791496255086) 820769.6717707266 ns (2964.241430480324) 0.99
arrow/exec 1761101.1180039144 ns (6089.72203859943) 1766515.64480085 ns (1931.4973330415087) 1.00
vortex-pushdown-compressed/planning 526184.3363168914 ns (5257.426532392157) 519012.2720842995 ns (1946.171176115633) 1.01
vortex-pushdown-compressed/exec 2980510.804705882 ns (15521.958500000415) 3104198.92764706 ns (3106.5659338235855) 0.96
vortex-pushdown-uncompressed/planning 522600.28982722503 ns (2043.5975712207437) 513255.72211541404 ns (567.7411759887764) 1.02
vortex-pushdown-uncompressed/exec 3029319.886470588 ns (7030.478485294152) 2937476.8838888896 ns (926.5853541668039) 1.03
vortex-nopushdown-compressed/planning 826475.6869202903 ns (1350.9952727508498) 827576.6900502801 ns (1745.7633304443443) 1.00
vortex-nopushdown-compressed/exec 1797606.448784647 ns (6546.8790466270875) 13370473.59 ns (47343.5370625006) 0.13
vortex-nopushdown-uncompressed/planning 827095.6997231506 ns (2265.2339747321093) 822909.1194767295 ns (704.5339979942655) 1.01
vortex-nopushdown-uncompressed/exec 1806184.5937890417 ns (5051.113721008995) 1787052.7691237137 ns (1822.381101552397) 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@github-actions github-actions bot left a comment

Random Access

Benchmark suite Current: 52800a2 Previous: e240a50 Ratio
random-access/vortex-tokio-local-disk 2580835.1754999994 ns (8140.831499999622) 1172560.9752617518 ns (4420.402836336172) 2.20
random-access/vortex-local-fs 2507470.0279999995 ns (8626.615000000224) 1301281.5255627078 ns (4618.475637272699) 1.93
random-access/parquet-tokio-local-disk 196352294.86666667 ns (4529434.129583299) 187416521.75428572 ns (2927455.906154752) 1.05

This comment was automatically generated by workflow using github-action-benchmark.

@danking danking force-pushed the dk/carry-forward-chunk-information branch from fb31155 to 80c4e97 Compare October 3, 2024 17:16
@danking danking added the benchmark Run benchmarks on this branch label Oct 3, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Oct 3, 2024
Contributor

@github-actions github-actions bot left a comment

Vortex Compression

Benchmark suite Current: 52800a2 Previous: 1f21e30 Ratio
Yellow Taxi Trip Data Compression Time/taxi compression 2169415493 ns (3908408.0499999523) 3003771425 ns (11606232.212499857) 0.72
Yellow Taxi Trip Data Compression Time/taxi compression throughput 470808924 bytes 470808924 bytes 1
Yellow Taxi Trip Data Vortex-to-ParquetZstd Ratio/taxi 1.02055277911686 ratio 0.9513732335805112 ratio 1.07
Yellow Taxi Trip Data Vortex-to-ParquetUncompressed Ratio/taxi 0.655113317509165 ratio 0.6107055783824222 ratio 1.07
Yellow Taxi Trip Data Compression Ratio/taxi 0.11363061801266962 ratio 0.10570442160947655 ratio 1.07
Yellow Taxi Trip Data Compression Size/taxi 53498309 bytes 49766585 bytes 1.07
Public BI Compression Time/AirlineSentiment compression 311545.8603621711 ns (240.99165715373238) 338881.71842607285 ns (3632.6704182245885) 0.92
Public BI Compression Time/AirlineSentiment compression throughput 2020 bytes 2020 bytes 1
Public BI Vortex-to-ParquetZstd Ratio/AirlineSentiment 5.204569055036345 ratio 6.666666666666667 ratio 0.78
Public BI Vortex-to-ParquetUncompressed Ratio/AirlineSentiment 3.5395480225988702 ratio 4.533898305084746 ratio 0.78
Public BI Compression Ratio/AirlineSentiment 0.6316831683168317 ratio 0.6207920792079208 ratio 1.02
Public BI Compression Size/AirlineSentiment 1276 bytes 1254 bytes 1.02
Public BI Compression Time/Arade compression 3142126852.1 ns (3297848.25) 3883536382.9 ns (1302367.25) 0.81
Public BI Compression Time/Arade compression throughput 787023760 bytes 787023760 bytes 1
Public BI Vortex-to-ParquetZstd Ratio/Arade 0.4885065375757265 ratio 0.47635136137416784 ratio 1.03
Public BI Vortex-to-ParquetUncompressed Ratio/Arade 0.4360315966375492 ratio 0.42518211873111533 ratio 1.03
Public BI Compression Ratio/Arade 0.18251435763514942 ratio 0.17718731007561958 ratio 1.03
Public BI Compression Size/Arade 143643136 bytes 139450623 bytes 1.03
Public BI Compression Time/Bimbo compression 20827864534.7 ns (19337839.261247635) 26015462894.7 ns (18767004.09874916) 0.80
Public BI Compression Time/Bimbo compression throughput 7121333608 bytes 7121333608 bytes 1
Public BI Vortex-to-ParquetZstd Ratio/Bimbo 1.3051215227156838 ratio 1.1859965176655975 ratio 1.10
Public BI Vortex-to-ParquetUncompressed Ratio/Bimbo 0.8849157859451519 ratio 0.8041450717742066 ratio 1.10
Public BI Compression Ratio/Bimbo 0.06460223594681509 ratio 0.05758613338087559 ratio 1.12
Public BI Compression Size/Bimbo 460054074 bytes 410090067 bytes 1.12
Public BI Compression Time/CMSprovider compression 13355691675.3 ns (9926214.899999619) 17092990607.8 ns (8687542.926249504) 0.78
Public BI Compression Time/CMSprovider compression throughput 5149123964 bytes 5149123964 bytes 1
Public BI Vortex-to-ParquetZstd Ratio/CMSprovider 1.1933322180554227 ratio 1.109934639018743 ratio 1.08
Public BI Vortex-to-ParquetUncompressed Ratio/CMSprovider 0.7705261322635127 ratio 0.7166769081807277 ratio 1.08
Public BI Compression Ratio/CMSprovider 0.17346186579399275 ratio 0.1591471655623943 ratio 1.09
Public BI Compression Size/CMSprovider 893176650 bytes 819468484 bytes 1.09
Public BI Compression Time/Euro2016 compression 2084102915.3 ns (869509.4325000048) 2019809765.4 ns (2911170.205000043) 1.03
Public BI Compression Time/Euro2016 compression throughput 393253221 bytes 393253221 bytes 1
Public BI Vortex-to-ParquetZstd Ratio/Euro2016 1.4188934399430781 ratio 1.3744154006772082 ratio 1.03
Public BI Vortex-to-ParquetUncompressed Ratio/Euro2016 0.6020057111150144 ratio 0.5831346437723351 ratio 1.03
Public BI Compression Ratio/Euro2016 0.4239824293772282 ratio 0.4103797029039465 ratio 1.03
Public BI Compression Size/Euro2016 166732456 bytes 161383140 bytes 1.03
Public BI Compression Time/Food compression 1123384903.1 ns (736223.3999999762) 1323866729.6 ns (2736023.5237500668) 0.85
Public BI Compression Time/Food compression throughput 332718229 bytes 332718229 bytes 1
Public BI Vortex-to-ParquetZstd Ratio/Food 1.3697263186358202 ratio 1.2492959959805032 ratio 1.10
Public BI Vortex-to-ParquetUncompressed Ratio/Food 0.7744766346367821 ratio 0.7063823958612547 ratio 1.10
Public BI Compression Ratio/Food 0.14467887480850952 ratio 0.12975672276736 ratio 1.12
Public BI Compression Size/Food 48137299 bytes 43172427 bytes 1.12
Public BI Compression Time/HashTags compression 2841132659 ns (2151238.027499914) 3065764839.1 ns (3761069.962499857) 0.93
Public BI Compression Time/HashTags compression throughput 804495592 bytes 804495592 bytes 1
Public BI Vortex-to-ParquetZstd Ratio/HashTags 1.6658522516943424 ratio 1.507473738263557 ratio 1.11
Public BI Vortex-to-ParquetUncompressed Ratio/HashTags 0.47360508425152836 ratio 0.4285777601771831 ratio 1.11
Public BI Compression Ratio/HashTags 0.2741778204795931 ratio 0.24675172987150437 ratio 1.11
Public BI Compression Size/HashTags 220574848 bytes 198510679 bytes 1.11
TPC-H l_comment Compression Time/chunked-without-fsst compression 27467180.59518651 ns (11869.38886706531) 192707417.29091272 ns (320473.389638409) 0.14
TPC-H l_comment Compression Time/chunked-without-fsst compression throughput 183010921 bytes 183010921 bytes 1
TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-without-fsst 3.2243851483620243 ratio 3.2155362739484055 ratio 1.00
TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-without-fsst 1.0011107361237086 ratio 0.9983804351200692 ratio 1.00
TPC-H l_comment Compression Ratio/chunked-without-fsst 1.000008043235846 ratio 0.999965750677797 ratio 1.00
TPC-H l_comment Compression Size/chunked-without-fsst 183012393 bytes 183004653 bytes 1.00
TPC-H l_comment Compression Time/chunked-with-fsst compression 1039254540.95 ns (4330330.299999952) 1215772050.45 ns (1766667.4624999762) 0.85
TPC-H l_comment Compression Time/chunked-with-fsst compression throughput 183010921 bytes 183010921 bytes 1
TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-with-fsst 1.3546288646705673 ratio 1.16478593890354 ratio 1.16
TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-with-fsst 0.4205866971486598 ratio 0.3616502484906857 ratio 1.16
TPC-H l_comment Compression Ratio/chunked-with-fsst 0.4194435096034515 ratio 0.35994538271298027 ratio 1.17
TPC-H l_comment Compression Size/chunked-with-fsst 76762743 bytes 65873936 bytes 1.17
TPC-H l_comment Compression Time/canonical-with-fsst compression 799253318.85 ns (758743.2450000048) 1202553634.15 ns (558064.5750000477) 0.66
TPC-H l_comment Compression Time/canonical-with-fsst compression throughput 183010937 bytes 183010937 bytes 1
TPC-H l_comment Vortex-to-ParquetZstd Ratio/canonical-with-fsst 1.375582371189466 ratio 1.164785325149757 ratio 1.18
TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/canonical-with-fsst 0.42709272555256855 ratio 0.36165026229632463 ratio 1.18
TPC-H l_comment Compression Ratio/canonical-with-fsst 0.42595138999807425 ratio 0.3599373954355526 ratio 1.18
TPC-H l_comment Compression Size/canonical-with-fsst 77953763 bytes 65872480 bytes 1.18

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@github-actions github-actions bot left a comment

TPC-H

Benchmark suite Current: 52800a2 Previous: e240a50 Ratio
tpch_q1/vortex-in-memory-no-pushdown 467778861.6 ns (3443270.599999994) 470589477.6 ns (1053936.40625) 0.99
tpch_q1/vortex-in-memory-pushdown 515440551.8 ns (1205185.9037500024) 529707068.7 ns (1186701.9399999976) 0.97
tpch_q1/arrow 452756863.05 ns (2700391.900000006) 455404546.45 ns (1482139.2887499928) 0.99
tpch_q1/parquet 646624357.4 ns (1966457.3962500095) 666867661.2 ns (1295695.5924999714) 0.97
tpch_q1/vortex-file-compressed 642010605.2 ns (1511972.300000012) 660484421.3 ns (1886639.5037500262) 0.97
tpch_q1/vortex-file-uncompressed 541117144.3 ns (1719939.7999999523) 537482610.1 ns (1520821.639999956) 1.01
tpch_q2/vortex-in-memory-no-pushdown 122853558.22749999 ns (392860.9799999893) 123051365.74718253 ns (257698.20425346494) 1.00
tpch_q2/vortex-in-memory-pushdown 122616947.85329366 ns (244314.48357141763) 123451055.12817462 ns (153247.00467856228) 0.99
tpch_q2/arrow 120192575.0463889 ns (136338.17362498492) 121920930.44615078 ns (208327.72284524888) 0.99
tpch_q2/parquet 153907151.2229365 ns (381504.2743650824) 156972364.3824603 ns (496786.5161617249) 0.98
tpch_q2/vortex-file-compressed 153328343.55138886 ns (631260.252461791) 154905296.84920636 ns (412482.84915179014) 0.99
tpch_q2/vortex-file-uncompressed 154408102.83464286 ns (2581034.9200535566) 153655456.1931746 ns (361454.0633264035) 1.00
tpch_q3/vortex-in-memory-no-pushdown 157708369.85345238 ns (3511687.3689032793) 153629900.8154762 ns (446477.99316963553) 1.03
tpch_q3/vortex-in-memory-pushdown 181019672.58972222 ns (2575838.827333346) 183273474.5333333 ns (682627.8741666675) 0.99
tpch_q3/arrow 144515545.5045635 ns (545460.7631785721) 144453346.61809522 ns (235125.35807141662) 1.00
tpch_q3/parquet 330122870.25 ns (907480.2000000179) 340316547.6 ns (951217.2531250119) 0.97
tpch_q3/vortex-file-compressed 370295766.5 ns (2458930.8506250083) 430479264 ns (654377.4806250036) 0.86
tpch_q3/vortex-file-uncompressed 277781570 ns (1266492.5) 276793434.4 ns (1253060.6287499964) 1.00
tpch_q4/vortex-in-memory-no-pushdown 108746484.73638889 ns (678306.0879999846) 108712836.41805556 ns (422031.2028888911) 1.00
tpch_q4/vortex-in-memory-pushdown 135663909.93773812 ns (904861.9728021026) 136905778.7657143 ns (223471.66806547344) 0.99
tpch_q4/arrow 100698082.61654761 ns (1121515.8695699424) 101234618.6452381 ns (282505.8128660768) 0.99
tpch_q4/parquet 216478279.7333334 ns (1485007.7166666538) 217660170.1 ns (386642.8300000131) 0.99
tpch_q4/vortex-file-compressed 320493433.25 ns (2996651.646874994) 405200362.5 ns (965928.6987499893) 0.79
tpch_q4/vortex-file-uncompressed 227838221.86666664 ns (1657425.8437500298) 225585481.7 ns (701467.6166666597) 1.01
tpch_q5/vortex-in-memory-no-pushdown 292222192.3 ns (2890664.404374987) 297052560.65 ns (747243.7725000083) 0.98
tpch_q5/vortex-in-memory-pushdown 297347111.45 ns (654150.8656250238) 307960697.45 ns (673134.9212499857) 0.97
tpch_q5/arrow 279318154.5 ns (1353529.84375) 286638619.4 ns (658061.75) 0.97
tpch_q5/parquet 431414072.9 ns (2681259.576875001) 449777361.45 ns (1212335.900000006) 0.96
tpch_q5/vortex-file-compressed 349775200.85 ns (4429622.459374994) 342568488.65 ns (1429213.5743749738) 1.02
tpch_q5/vortex-file-uncompressed 342391128.05 ns (5002808.174999982) 344655537.75 ns (1650879.5699999928) 0.99
tpch_q6/vortex-in-memory-no-pushdown 41063242.17318784 ns (434110.7813425958) 39887318.29818783 ns (97987.97643981501) 1.03
tpch_q6/vortex-in-memory-pushdown 93757982.48436508 ns (848783.4215942472) 88216035.51170634 ns (136366.88904762268) 1.06
tpch_q6/arrow 35088866.09541006 ns (169050.04250000045) 34510059.367420636 ns (33083.91784325242) 1.02
tpch_q6/parquet 150175857.14087301 ns (257062.09383930266) 150712174.3799603 ns (164900.6799801588) 1.00
tpch_q6/vortex-file-compressed 80886727.92898811 ns (311094.1405892819) 66474245.63871032 ns (179480.4024055116) 1.22
tpch_q6/vortex-file-uncompressed 180760649.36666667 ns (1611358.1754166782) 175352802.7738492 ns (399374.2746230215) 1.03
tpch_q7/vortex-in-memory-no-pushdown 567581047.6 ns (17585258.995000064) 565118939.3 ns (1409487.8175000548) 1.00
tpch_q7/vortex-in-memory-pushdown 616056030.3 ns (10480650.402500033) 606852132.6 ns (1246202.5250000358) 1.02
tpch_q7/arrow 551002620.3 ns (4366114.924999952) 558281848.4 ns (1244840.655000031) 0.99
tpch_q7/parquet 694556615.3 ns (3963106.6562500596) 706553335.6 ns (1982560.5662499666) 0.98
tpch_q7/vortex-file-compressed 775355925.6 ns (5103679.199999988) 788311090.1 ns (2399381.128750026) 0.98
tpch_q7/vortex-file-uncompressed 684361902.7 ns (2515657.449999988) 709882968.5 ns (2196415.563749969) 0.96
tpch_q8/vortex-in-memory-no-pushdown 221547384.6333333 ns (1463400.8795833588) 216499540.4666667 ns (407087.606250003) 1.02
tpch_q8/vortex-in-memory-pushdown 235867022.96666664 ns (1676926.2275000066) 231304794.3 ns (731021.2262500077) 1.02
tpch_q8/arrow 211831987.89999998 ns (1305503.7104166597) 209799291.13333336 ns (228377.90958333015) 1.01
tpch_q8/parquet 481720088.95 ns (3538877.700000018) 471938402.95 ns (428184.2631250024) 1.02
tpch_q8/vortex-file-compressed 295168323.25 ns (2812695.411249995) 280050937.6 ns (802685.8275000155) 1.05
tpch_q8/vortex-file-uncompressed 269947969.05 ns (766552.9662500024) 271224586.3 ns (2748677.349999994) 1.00
tpch_q9/vortex-in-memory-no-pushdown 404025900.2 ns (1012465.1587499678) 398763884.55 ns (606670.7100000083) 1.01
tpch_q9/vortex-in-memory-pushdown 408073503.2 ns (1200565.7068749964) 402653711.5 ns (789469.1125000119) 1.01
tpch_q9/arrow 403262871.15 ns (1988262.300000012) 386242622.8 ns (467136.0474999845) 1.04
tpch_q9/parquet 695777931.1 ns (1588168.5200000405) 687930496.8 ns (2068533.167500019) 1.01
tpch_q9/vortex-file-compressed 450631564.4 ns (1347850.0193750262) 490490909.9 ns (4587771.274999976) 0.92
tpch_q9/vortex-file-uncompressed 452513905.5 ns (1175677.951249987) 427739878.4 ns (1360363.6443749964) 1.06
tpch_q10/vortex-in-memory-no-pushdown 228913729.73333335 ns (788363.8670833409) 227640478.8 ns (426119.2675000131) 1.01
tpch_q10/vortex-in-memory-pushdown 259905157 ns (1226705.575000003) 259158731.05 ns (542610.7806249857) 1.00
tpch_q10/arrow 217983725.46666664 ns (410287.75) 219644916.0333333 ns (402043.1166666746) 0.99
tpch_q10/parquet 478162448.8 ns (3113888.1581249833) 481744845.1 ns (1091041.915625006) 0.99
tpch_q10/vortex-file-compressed 461222996.15 ns (995848.6943750083) 498007282.5 ns (836865.8218749762) 0.93
tpch_q10/vortex-file-uncompressed 358639247.6 ns (1628960.966874987) 358158489.1 ns (1272176.5775000155) 1.00
tpch_q11/vortex-in-memory-no-pushdown 176677011.8841667 ns (649112.8595208079) 176201896.47880954 ns (218743.26687203348) 1.00
tpch_q11/vortex-in-memory-pushdown 177835797.6947222 ns (1421763.9273645729) 174405472.49857143 ns (328256.40597617626) 1.02
tpch_q11/arrow 174891843.7752381 ns (1506962.9685952514) 173686421.69896823 ns (118107.88274204731) 1.01
tpch_q11/parquet 184529459.66666666 ns (440586.2145833522) 182893056.1 ns (282001.2429166585) 1.01
tpch_q11/vortex-file-compressed 233007656.9333333 ns (2791903.5779166967) 229187324.4 ns (454563.3666666448) 1.02
tpch_q11/vortex-file-uncompressed 234983167.5 ns (2036627.8312499821) 226893534.6 ns (453315.15958334506) 1.04
tpch_q12/vortex-in-memory-no-pushdown 197849551.03333336 ns (256940.27208332717) 199192144.36666667 ns (274587.5150000155) 0.99
tpch_q12/vortex-in-memory-pushdown 238737704.46666664 ns (1539416.157083273) 241476097.26666665 ns (212400.60708335042) 0.99
tpch_q12/arrow 165595479.6986111 ns (85726.60472221673) 166501879.39468256 ns (316050.20115078986) 0.99
tpch_q12/parquet 353268889.4 ns (435232.1599999964) 358736708.1 ns (461691.74375000596) 0.98
tpch_q12/vortex-file-compressed 666080779.8 ns (2742186.0775000453) 638034161.6 ns (1608178.5224999785) 1.04
tpch_q12/vortex-file-uncompressed 349060660.75 ns (2778898.3787499964) 349798942.45 ns (798031.4474999905) 1.00
tpch_q13/vortex-in-memory-no-pushdown 170158455.8904365 ns (694741.3213888854) 160633321.87829366 ns (699312.4429965317) 1.06
tpch_q13/vortex-in-memory-pushdown 170451841.79718253 ns (2125443.6992857307) 161011706.54615077 ns (567468.7762465179) 1.06
tpch_q13/arrow 168016234.20936507 ns (3924952.428333342) 157790092.14884922 ns (545367.7031745911) 1.06
tpch_q13/parquet 343119488.35 ns (4912990.951249987) 298407405.15 ns (1468582.1349999905) 1.15
tpch_q13/vortex-file-compressed 204154072.63333333 ns (1380477.670416668) 198416178.20000002 ns (483981.2841666788) 1.03
tpch_q13/vortex-file-uncompressed 200539312.43333337 ns (2056568.4554166645) 187454955.4 ns (822220.3666666448) 1.07
tpch_q14/vortex-in-memory-no-pushdown 44076730.36670635 ns (309899.5197266862) 44143741.00785714 ns (271624.87781845033) 1.00
tpch_q14/vortex-in-memory-pushdown 83548994.6325 ns (254576.8795833215) 84703962.09311508 ns (306871.4231063947) 0.99
tpch_q14/arrow 37942758.99720899 ns (269859.56494709104) 37278512.90113756 ns (150446.57095006853) 1.02
tpch_q14/parquet 222684254.36666664 ns (821535.846666649) 221577115 ns (863962.119999975) 1.00
tpch_q14/vortex-file-compressed 151904207.58384922 ns (369751.5963561535) 119302874.86869049 ns (225605.79097618908) 1.27
tpch_q14/vortex-file-uncompressed 153544726.65194446 ns (532171.7216041684) 152716136.12607142 ns (583361.3146339357) 1.01
tpch_q15/vortex-in-memory-no-pushdown 71631453.842123 ns (601208.2583422586) 71102519.32591268 ns (341839.73190475255) 1.01
tpch_q15/vortex-in-memory-pushdown 115833211.27575397 ns (149956.23130208254) 116053504.27646828 ns (350781.06618800014) 1.00
tpch_q15/arrow 61826329.69549602 ns (181084.83049950376) 63463992.7733135 ns (113425.86143377796) 0.97
tpch_q15/parquet 293622008.2 ns (730171.775000006) 295865698.85 ns (1268373.9350000024) 0.99
tpch_q15/vortex-file-compressed 268207800.5 ns (3397565.3712499887) 218678251.63333336 ns (922927.5970833302) 1.23
tpch_q15/vortex-file-uncompressed 304148818.25 ns (2123822.898750007) 306757192.9 ns (674798.3962499797) 0.99
tpch_q16/vortex-in-memory-no-pushdown 104367426.80857143 ns (150554.94700000435) 107813866.56325397 ns (593385.0507013947) 0.97
tpch_q16/vortex-in-memory-pushdown 123158476.01456349 ns (180530.7770163566) 127293055.67365082 ns (372717.95785316825) 0.97
tpch_q16/arrow 103853046.4422619 ns (127796.55751489103) 106085699.78051588 ns (131293.1899523735) 0.98
tpch_q16/parquet 119509234.78968255 ns (173208.06278374046) 124453643.45111112 ns (464117.36039581895) 0.96
tpch_q16/vortex-file-compressed 135929751.38337302 ns (262443.3466130942) 139376366.58194444 ns (413071.6788749993) 0.98
tpch_q16/vortex-file-uncompressed 132010796.37912698 ns (505463.881244041) 133320098.22015874 ns (334279.73054364324) 0.99
tpch_q17/vortex-in-memory-no-pushdown 543347568.6 ns (3231269.9724999666) 571296785 ns (7044452.014999986) 0.95
tpch_q17/vortex-in-memory-pushdown 632224319.2 ns (10861890.508749962) 640229173.7 ns (8158950.285000026) 0.99
tpch_q17/arrow 540630527.3 ns (7139084.882499993) 555781734 ns (3261612.1212500334) 0.97
tpch_q17/parquet 579640242.4 ns (1502871.6924999952) 600748516.4 ns (1517490.2087500095) 0.96
tpch_q17/vortex-file-compressed 606613860.1 ns (1162961.6500000358) 717715811.7 ns (2182131.816250026) 0.85
tpch_q17/vortex-file-uncompressed 601304464.3 ns (2013597.5850000381) 620855146.6 ns (1961284.25) 0.97
tpch_q18/vortex-in-memory-no-pushdown 972246032.6 ns (2336560.887499988) 1031753735.4 ns (2083014.863749981) 0.94
tpch_q18/vortex-in-memory-pushdown 985346161.5 ns (2958554.75) 1017809564.4 ns (6143617.051249981) 0.97
tpch_q18/arrow 982714488.9 ns (4096792.699999988) 1018223029.1 ns (5481618.336250007) 0.97
tpch_q18/parquet 1159727620.3 ns (1734829.5637500286) 1218825439 ns (5293612.889999986) 0.95
tpch_q18/vortex-file-compressed 1059604276 ns (7378093.727500021) 1167733945.9 ns (6749840.63499999) 0.91
tpch_q18/vortex-file-uncompressed 1008767942 ns (3330076.79125005) 1069954936 ns (2063990.9387500286) 0.94
tpch_q19/vortex-in-memory-no-pushdown 160894213.30146825 ns (974848.60857144) 160840263.660873 ns (362592.2883551419) 1.00
tpch_q19/vortex-in-memory-pushdown 253890578.65 ns (2411939.4131250083) 241633281.3666667 ns (347149.8941666484) 1.05
tpch_q19/arrow 151596372.82119048 ns (2123696.1402380913) 150349279.01329365 ns (241819.16418899596) 1.01
tpch_q19/parquet 469678341.65 ns (428400.71125000715) 482316417.35 ns (836343.724999994) 0.97
tpch_q19/vortex-file-compressed 618043129.7 ns (1190738.2737500072) 894070468.7 ns (2971395.776250005) 0.69
tpch_q19/vortex-file-uncompressed 328293237.45 ns (1958089.276875019) 319514048.2 ns (1086060.6943750083) 1.03
tpch_q20/vortex-in-memory-no-pushdown 239554866.89999995 ns (537700.2887500376) 247658234.56666666 ns (772735.346666649) 0.97
tpch_q20/vortex-in-memory-pushdown 263343961.9 ns (1660078.5868750066) 269684231.7 ns (718561.8675000072) 0.98
tpch_q20/arrow 237079870.00000006 ns (318591.74958333373) 242394940.83333334 ns (1308852.740833342) 0.98
tpch_q20/parquet 355459417.5 ns (598984.9712499678) 362000923.15 ns (1338120.449999988) 0.98
tpch_q20/vortex-file-compressed 377955166.65 ns (770536.0749999881) 396112548.5 ns (1253650.9837500155) 0.95
tpch_q20/vortex-file-uncompressed 384284369.9 ns (2060684.426249981) 391926561.7 ns (1310734.025000006) 0.98
tpch_q21/vortex-in-memory-no-pushdown 840921765.6 ns (1319720.1299999356) 855512535.1 ns (2098939.0999999642) 0.98
tpch_q21/vortex-in-memory-pushdown 875349476.4 ns (4178137.2537499666) 902522305.2 ns (2145315.588750005) 0.97
tpch_q21/arrow 814481930.8 ns (3298057.907499969) 830618923.8 ns (2733076.9000000358) 0.98
tpch_q21/parquet 948760941.1 ns (1268088.7262500525) 989790126.1 ns (3453532.151250005) 0.96
tpch_q21/vortex-file-compressed 1264295115.4 ns (4158925.4500000477) 1479086340.4 ns (3183734.451249957) 0.85
tpch_q21/vortex-file-uncompressed 1102324812.8 ns (9339402.5) 1106202697.8 ns (3866813.973749876) 1.00
tpch_q22/vortex-in-memory-no-pushdown 67110778.33174601 ns (466987.9226984121) 67601454.64011905 ns (172759.92097470164) 0.99
tpch_q22/vortex-in-memory-pushdown 67188453.95007937 ns (426674.1819047667) 67973963.65410714 ns (281457.08708333224) 0.99
tpch_q22/arrow 66432923.779265866 ns (328829.5441567488) 66780137.09359126 ns (143333.9966768399) 0.99
tpch_q22/parquet 96089410.4411508 ns (606749.7875669673) 93906713.42654762 ns (201391.85263094306) 1.02
tpch_q22/vortex-file-compressed 106273983.67408729 ns (605659.9452316388) 102271299.23936507 ns (278414.4775734171) 1.04
tpch_q22/vortex-file-uncompressed 102307948.45325397 ns (338022.55269542336) 101486721.17309524 ns (293698.39450000226) 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@danking danking added the benchmark Run benchmarks on this branch label Oct 3, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Oct 3, 2024
danking added a commit that referenced this pull request Oct 3, 2024
We assume elsewhere that this statistic is a u64 (usize is
auto-converted to u64 when we convert to Scalar). I am not
entirely sure how I triggered this, but it happens on PR
#969.
@danking danking added the benchmark Run benchmarks on this branch label Oct 3, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Oct 3, 2024
danking added a commit that referenced this pull request Oct 3, 2024
We assume elsewhere that this statistic is a u64 (usize is
auto-converted to u64 when we convert to Scalar). I am not entirely sure
how I triggered this, but it happens on PR #969.
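A rough illustration of the mismatch this commit message describes, assuming a simplified scalar type. `ToyScalar` and `as_u64` are hypothetical names that stand in for whatever the real code uses, and `u32` is chosen arbitrarily as the "other" width, since the message does not say which type was actually stored.

```rust
/// Hypothetical scalar type with two integer widths.
/// This is NOT the Vortex `Scalar` type; it only models the idea.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyScalar {
    U32(u32),
    U64(u64),
}

/// Mirrors the behaviour described in the commit message:
/// a `usize` becomes a u64 scalar on conversion.
impl From<usize> for ToyScalar {
    fn from(value: usize) -> Self {
        ToyScalar::U64(value as u64)
    }
}

/// A statistic written with a different width keeps that width.
impl From<u32> for ToyScalar {
    fn from(value: u32) -> Self {
        ToyScalar::U32(value)
    }
}

impl ToyScalar {
    /// A strict reader that assumes the statistic is a u64.
    /// Any other width is treated as "not a u64" and yields `None`.
    fn as_u64(&self) -> Option<u64> {
        match self {
            ToyScalar::U64(value) => Some(*value),
            _ => None,
        }
    }
}
```

In this model a statistic produced from a `usize` is readable through `as_u64`, while the same count stored as a `u32` is not, which is the kind of surprise a reader that assumes u64 would hit.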
0
}

fn can_compress(&self, _array: &Array) -> Option<&dyn EncodingCompressor> {
Member

You could say here that you can only compress chunked arrays, and remove the filter in the main compressor function.

Member Author

If I do that, I'll never be called because the sample is never a chunked array :|

Member

Oof, forgot that the sample is contiguous.
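The exchange above turns on one detail: the compressor appears to consult `can_compress` on a sample, and the sample is contiguous even when the input is chunked. Below is a minimal sketch of that interaction under stated assumptions. `ToyArray`, `sample`, and `can_compress_chunked_only` are hypothetical names invented for illustration; none of them is the Vortex API, and the real sampler may work differently.

```rust
/// Hypothetical stand-in for an array that is either one contiguous
/// buffer or a sequence of chunks. This is NOT the Vortex `Array` type.
#[derive(Debug, Clone, PartialEq)]
enum ToyArray {
    Contiguous(Vec<i64>),
    Chunked(Vec<Vec<i64>>),
}

impl ToyArray {
    fn is_chunked(&self) -> bool {
        matches!(self, ToyArray::Chunked(_))
    }

    /// Take every `stride`-th value (stride must be non-zero) and return
    /// the result as a contiguous array. The key assumption modelled here
    /// is that a sample is always contiguous, whatever the input layout.
    fn sample(&self, stride: usize) -> ToyArray {
        let values: Vec<i64> = match self {
            ToyArray::Contiguous(values) => {
                values.iter().copied().step_by(stride).collect()
            }
            ToyArray::Chunked(chunks) => {
                chunks.iter().flatten().copied().step_by(stride).collect()
            }
        };
        ToyArray::Contiguous(values)
    }
}

/// A filter like the one suggested in review: accept only chunked input.
fn can_compress_chunked_only(array: &ToyArray) -> bool {
    array.is_chunked()
}
```

In this model, a chunked input passes the filter but its sample does not, so a compressor that is only ever offered samples would never be selected.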

@danking danking added the benchmark Run benchmarks on this branch label Oct 4, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Oct 4, 2024
@danking
Member Author

danking commented Oct 4, 2024

A summary of the vortex compression situation as of 52800a2

| Benchmark suite | vortex:zstd (PR) | vortex:zstd (PR:develop) | time (PR:develop) |
|---|---|---|---|
| taxi | 1.02 | 1.08 | 0.78 |
| AirlineSentiment | 5.20 | 0.78 | 0.95 |
| Arade | 0.49 | 1.03 | 0.82 |
| Bimbo | 1.38 | 1.16 | 0.89 |
| CMSprovider | 1.19 | 1.08 | 0.86 |
| Euro2016 | 1.42 | 1.03 | 1.06 |
| Food | 1.37 | 1.10 | 0.89 |
| HashTags | 1.67 | 1.10 | 0.93 |
| l_comment | 1.35 | 1.16 | 0.86 |

@robert3005
Member

I don't quite follow the table you've made. Are the first two columns compression ratios (and relative to what?), and is the last one time (again, relative to what?)?

@danking
Member Author

danking commented Oct 4, 2024

  1. The first column is the ratio, in this PR, of Vortex (compressed) size to Parquet-with-zstd size.
  2. The second column is a ratio of ratios of sizes: (vortex-PR/parquet_zstd) / (vortex-develop/parquet_zstd).
  3. The third column is the time to run Vortex compression in this PR divided by the time to run it on origin/develop.
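The three column definitions can be sketched in code as follows. This is only an illustration of the arithmetic: the `Measurement` struct, its field names, and the function names are assumptions made for this example and do not come from the Vortex benchmark code.

```rust
/// Sizes and timing for one dataset, measured on one branch.
/// Field names are illustrative, not taken from the benchmark harness.
#[derive(Debug, Clone, Copy)]
struct Measurement {
    vortex_bytes: f64,
    parquet_zstd_bytes: f64,
    compress_time_ns: f64,
}

/// Column 1: Vortex (compressed) size divided by Parquet-with-zstd size.
fn vortex_to_zstd(m: Measurement) -> f64 {
    m.vortex_bytes / m.parquet_zstd_bytes
}

/// Column 2: ratio of ratios,
/// (vortex-PR / parquet_zstd) / (vortex-develop / parquet_zstd).
fn size_ratio_pr_over_develop(pr: Measurement, develop: Measurement) -> f64 {
    vortex_to_zstd(pr) / vortex_to_zstd(develop)
}

/// Column 3: compression time in the PR divided by the time on develop.
fn time_ratio_pr_over_develop(pr: Measurement, develop: Measurement) -> f64 {
    pr.compress_time_ns / develop.compress_time_ns
}
```

If the Parquet-with-zstd size is the same on both branches, column 2 reduces to the PR's Vortex size divided by develop's Vortex size. Read this way, a value above 1 in column 2 means the PR produces larger output than develop, and a value below 1 in column 3 means the PR compresses faster.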

@danking danking force-pushed the dk/carry-forward-chunk-information branch from 52800a2 to 344003e Compare October 8, 2024 16:35
@danking danking closed this Oct 8, 2024