feat: specialized IntoCanonical for DictArray utf8/binary #1146

Merged: lwwmanning merged 1 commit into develop from aduffy/canonicalize-dict-specialized on Oct 28, 2024

@a10y (Contributor) commented Oct 28, 2024

Fixes #1041

Change the IntoCanonical implementation for DictArray to canonicalize-then-take for string-like values (utf8/binary), and take-then-canonicalize for everything else.

For strings, canonicalize-then-take is always faster, regardless of how the values array is compressed.
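
To make the two orderings concrete, here is a minimal sketch in plain Rust. It does not use the Vortex API: `Encoded`, `Canonical`, `canonicalize`, `take_encoded`, `take_canonical`, and `dict_into_canonical` are all hypothetical stand-ins, and the two toy encodings are assumptions chosen only so that "decode" and "gather" are distinct steps. The sketch shows that both orderings produce the same canonical output and how a dispatch on the value type might look; it says nothing about relative speed.

```rust
/// Toy encoded values arrays; hypothetical stand-ins, not Vortex encodings.
#[derive(Debug, Clone, PartialEq)]
enum Encoded {
    /// Strings sharing a common prefix, stored as prefix + suffixes.
    PrefixedUtf8 { prefix: String, suffixes: Vec<String> },
    /// Integers stored as small offsets from a base value.
    OffsetInt { base: i64, offsets: Vec<u8> },
}

/// Fully decoded ("canonical") form of the toy encodings.
#[derive(Debug, Clone, PartialEq)]
enum Canonical {
    Utf8(Vec<String>),
    Int(Vec<i64>),
}

/// Decode an encoded array into its canonical form.
fn canonicalize(values: &Encoded) -> Canonical {
    match values {
        Encoded::PrefixedUtf8 { prefix, suffixes } => {
            Canonical::Utf8(suffixes.iter().map(|s| format!("{prefix}{s}")).collect())
        }
        Encoded::OffsetInt { base, offsets } => {
            Canonical::Int(offsets.iter().map(|&o| *base + i64::from(o)).collect())
        }
    }
}

/// Gather by dictionary codes while staying in the encoded form.
fn take_encoded(values: &Encoded, codes: &[usize]) -> Encoded {
    match values {
        Encoded::PrefixedUtf8 { prefix, suffixes } => Encoded::PrefixedUtf8 {
            prefix: prefix.clone(),
            suffixes: codes.iter().map(|&c| suffixes[c].clone()).collect(),
        },
        Encoded::OffsetInt { base, offsets } => Encoded::OffsetInt {
            base: *base,
            offsets: codes.iter().map(|&c| offsets[c]).collect(),
        },
    }
}

/// Gather by dictionary codes on an already-canonical array.
fn take_canonical(values: &Canonical, codes: &[usize]) -> Canonical {
    match values {
        Canonical::Utf8(v) => Canonical::Utf8(codes.iter().map(|&c| v[c].clone()).collect()),
        Canonical::Int(v) => Canonical::Int(codes.iter().map(|&c| v[c]).collect()),
    }
}

/// Dispatch on the value type, mirroring the split described in the PR.
fn dict_into_canonical(codes: &[usize], values: &Encoded) -> Canonical {
    match values {
        // String-like values: canonicalize the dictionary, then take.
        Encoded::PrefixedUtf8 { .. } => take_canonical(&canonicalize(values), codes),
        // Everything else: take in encoded form, then canonicalize.
        Encoded::OffsetInt { .. } => canonicalize(&take_encoded(values, codes)),
    }
}

fn main() {
    let codes = [0usize, 2, 2, 1, 0];
    let strings = Encoded::PrefixedUtf8 {
        prefix: "user_".to_string(),
        suffixes: vec!["a".to_string(), "b".to_string(), "c".to_string()],
    };
    let ints = Encoded::OffsetInt { base: 1000, offsets: vec![1, 2, 3] };
    println!("{:?}", dict_into_canonical(&codes, &strings));
    println!("{:?}", dict_into_canonical(&codes, &ints));
}
```

In this toy model the choice of ordering only changes which array gets decoded: the dictionary itself, or the gathered result.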

Overrides #1136

@lwwmanning added the "benchmark" label (Run benchmarks on this branch) Oct 28, 2024
@github-actions bot removed the "benchmark" label (Run benchmarks on this branch) Oct 28, 2024
@github-actions bot (Contributor) left a comment

Vortex bytes_at

| Benchmark suite | Current: 5afe635 | Previous: bdfcbd6 | Ratio |
|---|---|---|---|
| bytes_at/array_data | 697.2035264339002 ns (1.0345124769200424) | 692.5297271371617 ns (1.7033763327987685) | 1.01 |
| bytes_at/array_view | 491.8111729009432 ns (0.8359145179388747) | 482.5191545420025 ns (1.2900430974009396) | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions bot (Contributor) left a comment

DataFusion

| Benchmark suite | Current: 5afe635 | Previous: bdfcbd6 | Ratio |
|---|---|---|---|
| arrow/planning | 777311.5328924098 ns (1654.2085635276162) | 775438.8272732662 ns (3344.9368190071546) | 1.00 |
| arrow/exec | 1388702.5241269725 ns (4916.906633442966) | 1376867.925229548 ns (5187.853448298876) | 1.01 |
| vortex-pushdown-compressed/planning | 484950.71124583506 ns (1340.1055007358664) | 486051.2571730659 ns (1432.3657346233667) | 1.00 |
| vortex-pushdown-compressed/exec | 2727776.9394736844 ns (10598.687684210017) | 2595844.7185000004 ns (9742.701699999627) | 1.05 |
| vortex-pushdown-uncompressed/planning | 486028.3225913006 ns (1237.0291404857417) | 485624.59246180125 ns (1057.78081983648) | 1.00 |
| vortex-pushdown-uncompressed/exec | 2552043.718000001 ns (4212.950231249211) | 2914516.0122222216 ns (3337.2452569443267) | 0.88 |
| vortex-nopushdown-compressed/planning | 794841.3065952745 ns (1639.266759253107) | 794454.4454222171 ns (1459.8778262784472) | 1.00 |
| vortex-nopushdown-compressed/exec | 3279475.088125 ns (23815.670421875082) | 3047885.2417647066 ns (26826.62084558769) | 1.08 |
| vortex-nopushdown-uncompressed/planning | 792021.5814324587 ns (1781.315408375871) | 788912.2457652066 ns (1640.1507408049656) | 1.00 |
| vortex-nopushdown-uncompressed/exec | 4885931.494545454 ns (20884.96971590817) | 6417048.4 ns (147884.85139062535) | 0.76 |


@github-actions bot (Contributor) left a comment

Random Access

| Benchmark suite | Current: 5afe635 | Previous: bdfcbd6 | Ratio |
|---|---|---|---|
| random-access/vortex-tokio-local-disk | 1080657.4189667262 ns (9761.322254059487) | 1090611.2755052792 ns (10444.696077970671) | 0.99 |
| random-access/vortex-local-fs | 1194928.5339299804 ns (9121.457004190423) | 1206490.2726040732 ns (11737.176186817931) | 0.99 |
| random-access/parquet-tokio-local-disk | 290255005.45 ns (7657748.599999994) | 276279418.45 ns (3665822.925000012) | 1.05 |


@github-actions bot (Contributor) left a comment

TPC-H

Benchmark suite Current: 5afe635 Previous: bdfcbd6 Ratio
tpch_q1/vortex-in-memory-no-pushdown 570773058.4 ns (2106154.800000012) 573394744 ns (4864517.800000012) 1.00
tpch_q1/vortex-in-memory-pushdown 455200665.35 ns (1255713.7606250048) 455668941.65 ns (2921010.924999982) 1.00
tpch_q1/arrow 549623860.6 ns (2695741.889999926) 570030521 ns (2546937.311250031) 0.96
tpch_q1/parquet 688274626.5 ns (2636673.399999976) 712943367.5 ns (2538814.9950000644) 0.97
tpch_q1/vortex-file-compressed 511170156.1 ns (2933684.851249993) 531527946.8 ns (2901847.1099999845) 0.96
tpch_q1/vortex-file-uncompressed 532232107.2 ns (3913748.450000018) 540725848.3 ns (3259328.800000012) 0.98
tpch_q2/vortex-in-memory-no-pushdown 121782192.1870238 ns (583039.1120907739) 123021227.04416668 ns (589598.8849999905) 0.99
tpch_q2/vortex-in-memory-pushdown 120540912.6131349 ns (339131.9614087269) 119991059.10531747 ns (613116.6134523824) 1.00
tpch_q2/arrow 118969349.08325395 ns (680132.6953283623) 117606760.81511906 ns (774038.1604136974) 1.01
tpch_q2/parquet 150694696.53186506 ns (510763.4880530685) 149223734.08547622 ns (503331.5955386758) 1.01
tpch_q2/vortex-file-compressed 172848141.16511905 ns (855514.7379642874) 167618689.21702382 ns (1725320.9509821385) 1.03
tpch_q2/vortex-file-uncompressed 173972221.7411905 ns (1713944.371357128) 173631162.62630954 ns (2922208.173400283) 1.00
tpch_q3/vortex-in-memory-no-pushdown 159596655.67019844 ns (1021844.3692564517) 164086155.80388886 ns (1169102.3521666974) 0.97
tpch_q3/vortex-in-memory-pushdown 179680251.25365075 ns (733476.6287628859) 176914210.17277777 ns (1438711.8327777833) 1.02
tpch_q3/arrow 151194789.6304365 ns (780171.956051603) 152153570.3161508 ns (824016.8217718303) 0.99
tpch_q3/parquet 339807501.35 ns (1892272.175000012) 342691857.9 ns (1338216.4587499797) 0.99
tpch_q3/vortex-file-compressed 296073120.25 ns (989560.4081249833) 299769011.55 ns (1838724.009375006) 0.99
tpch_q3/vortex-file-uncompressed 249213296.9 ns (1048297.087500006) 246825536 ns (1071502.4383333325) 1.01
tpch_q4/vortex-in-memory-no-pushdown 111635363.79412699 ns (480290.36514980346) 111406677.37392858 ns (747314.2480952367) 1.00
tpch_q4/vortex-in-memory-pushdown 132609572.49523811 ns (598436.6998184547) 129213099.00218253 ns (573740.6067614108) 1.03
tpch_q4/arrow 186556129.43333334 ns (811728.0216666758) 179352696.26666665 ns (858728.6137499958) 1.04
tpch_q4/parquet 204715583.93333334 ns (509674.4487500042) 202967205.39999998 ns (830380.3320833296) 1.01
tpch_q4/vortex-file-compressed 254413889.55 ns (1159970.631249994) 256492837.85 ns (578803.753124997) 0.99
tpch_q4/vortex-file-uncompressed 206755893.43333334 ns (752720.3804166764) 204165062.49999997 ns (2173338.5387500077) 1.01
tpch_q5/vortex-in-memory-no-pushdown 289632503 ns (1043692.2993749976) 292985495.5 ns (3240136.104375005) 0.99
tpch_q5/vortex-in-memory-pushdown 291917017.5 ns (2425124.253750026) 295550515.35 ns (1116555.6887499988) 0.99
tpch_q5/arrow 274747247.45 ns (1668627.125) 269310877.9 ns (1693440.2281250209) 1.02
tpch_q5/parquet 449888077.55 ns (2296995.731250018) 452566744.65 ns (2260832.1243749857) 0.99
tpch_q5/vortex-file-compressed 369698173.75 ns (2829052.7943749726) 374666616.2 ns (1875753.4162499905) 0.99
tpch_q5/vortex-file-uncompressed 343091214 ns (3333492.934374988) 349653440.1 ns (4826022.491250008) 0.98
tpch_q6/vortex-in-memory-no-pushdown 35124084.760238096 ns (176017.827269841) 35563347.67363757 ns (510885.18542493135) 0.99
tpch_q6/vortex-in-memory-pushdown 72451312.85232142 ns (178753.48851711303) 70473274.93563493 ns (685923.7586775869) 1.03
tpch_q6/arrow 25737260.56896826 ns (140333.0628437493) 25032757.004295632 ns (184906.6697625257) 1.03
tpch_q6/parquet 138683448.82218254 ns (322131.2982366234) 136531325.13222224 ns (438144.3574861139) 1.02
tpch_q6/vortex-file-compressed 71753481.7924008 ns (315423.48731101304) 70576991.16480158 ns (813808.8850882947) 1.02
tpch_q6/vortex-file-uncompressed 171619145.30246034 ns (295723.43465179205) 166166979.0784127 ns (709192.3158997744) 1.03
tpch_q7/vortex-in-memory-no-pushdown 556362203.4 ns (3287022.305000007) 567669367.9 ns (5212412.086249948) 0.98
tpch_q7/vortex-in-memory-pushdown 592327540.3 ns (3790258.2562499642) 575248423.8 ns (2442989.244999945) 1.03
tpch_q7/arrow 542818180.1 ns (3555426.9124999642) 548914267 ns (6410116.282500029) 0.99
tpch_q7/parquet 687593463.4 ns (2833022.5649999976) 685249996.5 ns (5639491.0737499595) 1.00
tpch_q7/vortex-file-compressed 707595550 ns (3475939.9850000143) 706113831.9 ns (8526621.149999976) 1.00
tpch_q7/vortex-file-uncompressed 680614753.7 ns (4089104.6025000215) 660708293.1 ns (6122273.069999993) 1.03
tpch_q8/vortex-in-memory-no-pushdown 223892246.2666667 ns (795491.7245833427) 222517032.86666664 ns (2886199.575000018) 1.01
tpch_q8/vortex-in-memory-pushdown 229737976.4666666 ns (1043785.0641666651) 231218702.2 ns (2127959.816666648) 0.99
tpch_q8/arrow 209146846.46666667 ns (1486089.087500006) 216556021.83333334 ns (1332351.2833333164) 0.97
tpch_q8/parquet 489380883.05 ns (2188138.930000007) 493815160.1 ns (2114737.2037499845) 0.99
tpch_q8/vortex-file-compressed 314302134.6 ns (1268904.0431249738) 319835528.7 ns (3325706.0068750083) 0.98
tpch_q8/vortex-file-uncompressed 300066906.45 ns (3749876.4212499857) 308724781.1 ns (4100477.7150000036) 0.97
tpch_q9/vortex-in-memory-no-pushdown 427495192.8 ns (2489293.2974999845) 431257076.05 ns (6340404.535625011) 0.99
tpch_q9/vortex-in-memory-pushdown 433129674.15 ns (1543785.0693750083) 436529130.7 ns (3495190.623750001) 0.99
tpch_q9/arrow 407628782.6 ns (2053994.5249999762) 398552893.35 ns (2674936.012499988) 1.02
tpch_q9/parquet 709546304.4 ns (2901780.457499981) 707835197.8 ns (5657332.732500017) 1.00
tpch_q9/vortex-file-compressed 514352915.7 ns (3790535.9387500286) 530436315.3 ns (6713835.269999981) 0.97
tpch_q9/vortex-file-uncompressed 498887365.45 ns (2453271.5431250036) 481572829.85 ns (2600420.103125006) 1.04
tpch_q10/vortex-in-memory-no-pushdown 284651576.35 ns (1830025.1837500036) 286588533.5 ns (2918616.2524999976) 0.99
tpch_q10/vortex-in-memory-pushdown 309532575.55 ns (1315528.6850000322) 309786297.75 ns (1831683.4206249714) 1.00
tpch_q10/arrow 270339142.65 ns (1282222.1499999762) 267193689.05 ns (1797554.7931249887) 1.01
tpch_q10/parquet 507284763.7 ns (991707.9224999845) 511117133.8 ns (3766806.201249987) 0.99
tpch_q10/vortex-file-compressed 418020149.3 ns (1088588.409375012) 413089546.25 ns (2414368.775000006) 1.01
tpch_q10/vortex-file-uncompressed 391929016.45 ns (1007590.6868749857) 391503041.05 ns (3082767.882499993) 1.00
tpch_q11/vortex-in-memory-no-pushdown 181768161.67634922 ns (780283.4822916836) 185130028.77420634 ns (1923846.3774295747) 0.98
tpch_q11/vortex-in-memory-pushdown 180065555.01007938 ns (931606.6174752116) 181338161.26666668 ns (838028.0258333236) 0.99
tpch_q11/arrow 178607343.3732143 ns (919216.8553973138) 176572455.9773016 ns (1125413.572777778) 1.01
tpch_q11/parquet 190979380.20000002 ns (973430.9337500185) 191431692.56666666 ns (2511448.96541664) 1.00
tpch_q11/vortex-file-compressed 274777438.3 ns (2701303.5275000036) 271974904.8 ns (4261212.934374988) 1.01
tpch_q11/vortex-file-uncompressed 278099071.65 ns (2336204.4056250155) 265500128.85 ns (1517668.8450000137) 1.05
tpch_q12/vortex-in-memory-no-pushdown 233174504.6 ns (542508.6333333254) 232927322.4333333 ns (465253.9333333224) 1.00
tpch_q12/vortex-in-memory-pushdown 254093143.9 ns (479642.58437500894) 250332497.15 ns (431029.7974999994) 1.02
tpch_q12/arrow 187494062.23333335 ns (442706.94750000536) 188817179.56666666 ns (572709.4554166794) 0.99
tpch_q12/parquet 337305470.7 ns (631262.5) 335804901.8 ns (1326391.3999999762) 1.00
tpch_q12/vortex-file-compressed 402438471 ns (1700966.224999994) 399127012.55 ns (1401216.746874988) 1.01
tpch_q12/vortex-file-uncompressed 392080608.45 ns (1614870.1081250012) 386393407.9 ns (2522397.1818749905) 1.01
tpch_q13/vortex-in-memory-no-pushdown 188424110.7 ns (1535876.4224999696) 178361150.40000004 ns (3397197.102916658) 1.06
tpch_q13/vortex-in-memory-pushdown 185881256.00000003 ns (1584745.9833333194) 170851970.3333333 ns (1985171.396666661) 1.09
tpch_q13/arrow 182051176.7 ns (2096931.9616666883) 181425300.04555553 ns (6245968.964027762) 1.00
tpch_q13/parquet 342149186.9 ns (1046324.821875006) 331944977.15 ns (4474146.939999998) 1.03
tpch_q13/vortex-file-compressed 211818703.4 ns (1002436.0166666657) 203791348.1666667 ns (2212622.1729166657) 1.04
tpch_q13/vortex-file-uncompressed 212041243.83333334 ns (2186996.6558333486) 213318498.30000004 ns (3138494.00166665) 0.99
tpch_q14/vortex-in-memory-no-pushdown 45255136.88380952 ns (523685.76440476626) 43987749.07973545 ns (324784.0576769188) 1.03
tpch_q14/vortex-in-memory-pushdown 74545716.0914881 ns (288528.8495401889) 74641903.03222223 ns (650821.8111388981) 1.00
tpch_q14/arrow 36506635.75935185 ns (280924.54196180776) 38253329.868928574 ns (635981.4514191486) 0.95
tpch_q14/parquet 229483810.9333333 ns (741134.2933333218) 228553378.7333333 ns (965999.8512500077) 1.00
tpch_q14/vortex-file-compressed 118588744.5901984 ns (626710.2384637967) 118910864.30484128 ns (775294.3688601032) 1.00
tpch_q14/vortex-file-uncompressed 137179200.4861905 ns (648246.7320118994) 136362924.99662697 ns (688815.7658224106) 1.01
tpch_q15/vortex-in-memory-no-pushdown 72542078.4248611 ns (266261.72333506495) 71905784.61974205 ns (510528.9306222722) 1.01
tpch_q15/vortex-in-memory-pushdown 103407420.12809524 ns (346921.99738094956) 104288631.27388889 ns (311793.6344444379) 0.99
tpch_q15/arrow 57580112.46692459 ns (392418.11117361486) 59296913.413908735 ns (409870.7478707805) 0.97
tpch_q15/parquet 300212954.35 ns (955567.5643750131) 300591926.4 ns (1290761.1150000095) 1.00
tpch_q15/vortex-file-compressed 221460109.53333336 ns (772843.7608333528) 222092939.9 ns (1168105) 1.00
tpch_q15/vortex-file-uncompressed 254426243.3 ns (1721387.059375003) 252712051.6 ns (2846026.684375003) 1.01
tpch_q16/vortex-in-memory-no-pushdown 109265346.4145238 ns (289126.5476190448) 110750897.46416667 ns (1305949.247374989) 0.99
tpch_q16/vortex-in-memory-pushdown 120517332.80599205 ns (477143.47807044536) 121221150.31111112 ns (1007182.0997291729) 0.99
tpch_q16/arrow 107151244.05503969 ns (318740.4882614091) 106060014.1547619 ns (421455.0977380872) 1.01
tpch_q16/parquet 118943365.70900795 ns (584143.514107123) 119596403.39166665 ns (574495.9249166548) 0.99
tpch_q16/vortex-file-compressed 134399616.03761905 ns (729178.8910862952) 135149025.8560714 ns (547796.9648273885) 0.99
tpch_q16/vortex-file-uncompressed 135243227.6163492 ns (728796.0134553611) 133320945.7898016 ns (433319.4379667565) 1.01
tpch_q17/vortex-in-memory-no-pushdown 582875826 ns (8088361.912500024) 566009513.9 ns (18644448.68874997) 1.03
tpch_q17/vortex-in-memory-pushdown 655269437.4 ns (6686080.476249993) 634889028.4 ns (12569915.25) 1.03
tpch_q17/arrow 562419664.7 ns (7171977.840000033) 516671414.25 ns (7283011.96875) 1.09
tpch_q17/parquet 643888047.5 ns (4322590.5) 651493467.8 ns (7559845.6362499595) 0.99
tpch_q17/vortex-file-compressed 643606087.4 ns (4637169.582499981) 618334858.8 ns (8513670.128750026) 1.04
tpch_q17/vortex-file-uncompressed 613834015.8 ns (4175465.315000057) 616069907.5 ns (5413667.810000002) 1.00
tpch_q18/vortex-in-memory-no-pushdown 1099716903.2 ns (7931409.192499995) 1068967468.4 ns (15075958.840000033) 1.03
tpch_q18/vortex-in-memory-pushdown 1104323828.3 ns (4523527.028749943) 1066799670.2 ns (8855656) 1.04
tpch_q18/arrow 1105096144.3 ns (5742992.99000001) 1069127105.2 ns (17995984.246250033) 1.03
tpch_q18/parquet 1264909907.1 ns (7208558.301249981) 1245860925.9 ns (8104210.826250076) 1.02
tpch_q18/vortex-file-compressed 1166749790.6 ns (7512307.100000024) 1142182667.6 ns (12020210.050000072) 1.02
tpch_q18/vortex-file-uncompressed 1125279203 ns (5778185.081249952) 1120022046.6 ns (18764469.75) 1.00
tpch_q19/vortex-in-memory-no-pushdown 180606079.98261905 ns (267738.61927084625) 181044095.69869047 ns (307798.17158332467) 1.00
tpch_q19/vortex-in-memory-pushdown 260787816.2 ns (525260.487499997) 249184254.65 ns (415805.80687500536) 1.05
tpch_q19/arrow 165088325.4576587 ns (572905.5928913653) 167696989.3278175 ns (706608.2395491004) 0.98
tpch_q19/parquet 456493200.75 ns (629476.8512500226) 452039036.2 ns (1702806.099999994) 1.01
tpch_q19/vortex-file-compressed 354595222.45 ns (1055043.4493749738) 352017068.4 ns (1725305.553124994) 1.01
tpch_q19/vortex-file-uncompressed 363679207.95 ns (1414059.5837500095) 358767870.45 ns (1560466.2043749988) 1.01
tpch_q20/vortex-in-memory-no-pushdown 257007060.5 ns (1882806.4731249958) 252558044.65 ns (3775143.375) 1.02
tpch_q20/vortex-in-memory-pushdown 273637435.9 ns (1601897.698750019) 280646015.8 ns (4270053.212500006) 0.98
tpch_q20/arrow 245567123.7 ns (1420588.7629166692) 248845346 ns (1695215.8604166657) 0.99
tpch_q20/parquet 365779043.55 ns (2211501.199999988) 356077619.55 ns (2877032.6287499964) 1.03
tpch_q20/vortex-file-compressed 362970499.45 ns (2868546.464999974) 362348946.85 ns (5796122.703125) 1.00
tpch_q20/vortex-file-uncompressed 379946002.45 ns (2816570.425000012) 374641735.25 ns (6679745.107500017) 1.01
tpch_q21/vortex-in-memory-no-pushdown 894619636.2 ns (5423043.204999983) 892008277.1 ns (10589400.462500036) 1.00
tpch_q21/vortex-in-memory-pushdown 924248687.2 ns (5276795.550000012) 925122104.6 ns (8746447.675000072) 1.00
tpch_q21/arrow 868832890.4 ns (5342766.449999988) 867150181.5 ns (7643214.850000024) 1.00
tpch_q21/parquet 999052790.3 ns (5317808.953750014) 973254517.8 ns (11620389.806249976) 1.03
tpch_q21/vortex-file-compressed 1214240407.6 ns (7222012.158749938) 1219601061 ns (14858211) 1.00
tpch_q21/vortex-file-uncompressed 1069046554.4 ns (6247917.11500001) 1065788995.1 ns (6465167.610000014) 1.00
tpch_q22/vortex-in-memory-no-pushdown 77161903.83017856 ns (155705.0986108631) 76367538.54660714 ns (187439.69270832837) 1.01
tpch_q22/vortex-in-memory-pushdown 76582602.38575396 ns (166405.92642857134) 76401684.01152779 ns (463706.3655989617) 1.00
tpch_q22/arrow 75288424.14238094 ns (149086.445404768) 75535092.60785714 ns (323235.9124642834) 1.00
tpch_q22/parquet 94343645.92309524 ns (306409.25552678853) 96362100.22361112 ns (481560.3551562503) 0.98
tpch_q22/vortex-file-compressed 120754566.670873 ns (357525.1540079415) 121824378.32257938 ns (533160.4641007036) 0.99
tpch_q22/vortex-file-uncompressed 118617815.01626983 ns (500910.7960098982) 118432190.62603173 ns (564208.2624007985) 1.00


@github-actions bot (Contributor) left a comment

Vortex Compression

Benchmark suite Current: 5afe635 Previous: bdfcbd6 Ratio
compress time/taxi 1192692487.7 ns (2850773.5675001144) 1226799217 ns (1866513.9500000477) 0.97
compress time/taxi throughput 470808924 bytes 470808924 bytes 1
parquet_rs-zstd compress time/taxi 1755578763.2 ns (3153127.6437500715) 1784378010.7 ns (4951053.496250033) 0.98
parquet_rs-zstd compress time/taxi throughput 470808924 bytes 470808924 bytes 1
decompress time/taxi 406400705.55 ns (2613266.6793750226) 414520715.25 ns (1276485.5037500262) 0.98
decompress time/taxi throughput 470808924 bytes 470808924 bytes 1
parquet_rs-zstd decompress time/taxi 307140646.35 ns (517498.1118749678) 310460713.7 ns (549347.7893749774) 0.99
parquet_rs-zstd decompress time/taxi throughput 470808924 bytes 470808924 bytes 1
vortex:parquet-zstd size/taxi 0.9370256077422754 ratio 0.9452736656242109 ratio 0.99
vortex:raw size/taxi 0.1113757818235408 ratio 0.11235506657473637 ratio 0.99
vortex size/taxi 52436712 bytes 52897768 bytes 0.99
compress time/AirlineSentiment 686229.8715153957 ns (1589.8372746598907) 685416.7848442534 ns (2552.3678446114645) 1.00
compress time/AirlineSentiment throughput 2020 bytes 2020 bytes 1
parquet_rs-zstd compress time/AirlineSentiment 56024.304990829354 ns (128.57610950624803) 56263.32408578754 ns (209.47217593040477) 1.00
parquet_rs-zstd compress time/AirlineSentiment throughput 2020 bytes 2020 bytes 1
decompress time/AirlineSentiment 39731.852022937 ns (150.41254788429796) 39659.076920731495 ns (58.97398864753268) 1.00
decompress time/AirlineSentiment throughput 2020 bytes 2020 bytes 1
parquet_rs-zstd decompress time/AirlineSentiment 32368.31871432523 ns (66.50115771798664) 32320.32062187161 ns (51.803502279701206) 1.00
parquet_rs-zstd decompress time/AirlineSentiment throughput 2020 bytes 2020 bytes 1
vortex:parquet-zstd size/AirlineSentiment 6.196483971044468 ratio 6.196483971044468 ratio 1
vortex:raw size/AirlineSentiment 2.9663366336633663 ratio 2.9663366336633663 ratio 1
vortex size/AirlineSentiment 5992 bytes 5992 bytes 1
compress time/Arade 2166297651.5 ns (10484870.971250057) 2142874083.5 ns (2180073.25) 1.01
compress time/Arade throughput 787023760 bytes 787023760 bytes 1
parquet_rs-zstd compress time/Arade 3013345022.8 ns (17317581.861249924) 3060419918.6 ns (9591824.812500238) 0.98
parquet_rs-zstd compress time/Arade throughput 787023760 bytes 787023760 bytes 1
decompress time/Arade 779231178.8 ns (2547882.0887500644) 786225314.5 ns (2458330.7999999523) 0.99
decompress time/Arade throughput 787023760 bytes 787023760 bytes 1
parquet_rs-zstd decompress time/Arade 645415935.1 ns (2787078.8837500215) 645641569.4 ns (1519107.6349999905) 1.00
parquet_rs-zstd decompress time/Arade throughput 787023760 bytes 787023760 bytes 1
vortex:parquet-zstd size/Arade 0.47890649124129325 ratio 0.4789069103731621 ratio 1.00
vortex:raw size/Arade 0.1858328749820717 ratio 0.1858328749820717 ratio 1
vortex size/Arade 146254888 bytes 146254888 bytes 1
compress time/Bimbo 10632923692.2 ns (9495407.32875061) 10353315072.7 ns (12596800.83125019) 1.03
compress time/Bimbo throughput 7121333608 bytes 7121333608 bytes 1
parquet_rs-zstd compress time/Bimbo 20240520643 ns (49026118.27125168) 21014046233 ns (45056148.5) 0.96
parquet_rs-zstd compress time/Bimbo throughput 7121333608 bytes 7121333608 bytes 1
decompress time/Bimbo 4984156776.8 ns (21147733.299999714) 4868028175.3 ns (33502125.433750153) 1.02
decompress time/Bimbo throughput 7121333608 bytes 7121333608 bytes 1
parquet_rs-zstd decompress time/Bimbo 2631222149 ns (9003500.408750057) 2621862761.2 ns (7098532.799999952) 1.00
parquet_rs-zstd decompress time/Bimbo throughput 7121333608 bytes 7121333608 bytes 1
vortex:parquet-zstd size/Bimbo 1.1848830675342639 ratio 1.2341237843615334 ratio 0.96
vortex:raw size/Bimbo 0.064582497789928 ratio 0.0672663816032813 ratio 0.96
vortex size/Bimbo 459913512 bytes 479026344 bytes 0.96
compress time/CMSprovider 11891849071.1 ns (35822218.69999981) 11916416744.1 ns (10828325.237499237) 1.00
compress time/CMSprovider throughput 5149123964 bytes 5149123964 bytes 1
parquet_rs-zstd compress time/CMSprovider 19906789611.2 ns (48423057.065000534) 20023776078 ns (12905144.849998474) 0.99
parquet_rs-zstd compress time/CMSprovider throughput 5149123964 bytes 5149123964 bytes 1
decompress time/CMSprovider 8053081952.2 ns (39584278.87624979) 7245502799 ns (15776584.056250095) 1.11
decompress time/CMSprovider throughput 5149123964 bytes 5149123964 bytes 1
parquet_rs-zstd decompress time/CMSprovider 4894492839.3 ns (22916283.877500057) 4998257393.8 ns (24864173.079999924) 0.98
parquet_rs-zstd decompress time/CMSprovider throughput 5149123964 bytes 5149123964 bytes 1
vortex:parquet-zstd size/CMSprovider 1.2143059743719786 ratio 1.2016406990745132 ratio 1.01
vortex:raw size/CMSprovider 0.18147218177946356 ratio 0.1795806289506531 ratio 1.01
vortex size/CMSprovider 934422760 bytes 924682920 bytes 1.01
compress time/Euro2016 2669135308.3 ns (7740041.592499971) 2657498045.2 ns (7624071.63499999) 1.00
compress time/Euro2016 throughput 393253221 bytes 393253221 bytes 1
parquet_rs-zstd compress time/Euro2016 1570861071.9 ns (4334062.7624999285) 1590130774.9 ns (3411769.3000000715) 0.99
parquet_rs-zstd compress time/Euro2016 throughput 393253221 bytes 393253221 bytes 1
decompress time/Euro2016 296623154.5 ns (2114242.701875031) 296577297.1 ns (644323.4356250167) 1.00
decompress time/Euro2016 throughput 393253221 bytes 393253221 bytes 1
parquet_rs-zstd decompress time/Euro2016 490274055.75 ns (2881139.625) 487763062.7 ns (1241619.7706249952) 1.01
parquet_rs-zstd decompress time/Euro2016 throughput 393253221 bytes 393253221 bytes 1
vortex:parquet-zstd size/Euro2016 1.4383522286595696 ratio 1.4383489987803337 ratio 1.00
vortex:raw size/Euro2016 0.4348484255644533 ratio 0.4348474490943839 ratio 1.00
vortex size/Euro2016 171005544 bytes 171005160 bytes 1.00
compress time/Food 1118893450.3 ns (8904593.657500029) 1141880411.3 ns (2097494.649999976) 0.98
compress time/Food throughput 332718229 bytes 332718229 bytes 1
parquet_rs-zstd compress time/Food 1109451063 ns (3526278.7612499) 1130198029.4 ns (1887444.9500000477) 0.98
parquet_rs-zstd compress time/Food throughput 332718229 bytes 332718229 bytes 1
decompress time/Food 204005747.63333333 ns (854018.9833333343) 202240639.50000003 ns (797375.2587500215) 1.01
decompress time/Food throughput 332718229 bytes 332718229 bytes 1
parquet_rs-zstd decompress time/Food 216093769.1333333 ns (519453.73833332956) 214584080.86666667 ns (445171.6441666633) 1.01
parquet_rs-zstd decompress time/Food throughput 332718229 bytes 332718229 bytes 1
vortex:parquet-zstd size/Food 1.238933256800101 ratio 1.238933256800101 ratio 1
vortex:raw size/Food 0.1349086526906225 ratio 0.1349090374005327 ratio 1.00
vortex size/Food 44886568 bytes 44886696 bytes 1.00
compress time/HashTags 2563933346.4 ns (10075764.533750057) 2560874067 ns (2278455.5137500763) 1.00
compress time/HashTags throughput 804495592 bytes 804495592 bytes 1
parquet_rs-zstd compress time/HashTags 2492356087 ns (7610520.730000019) 2517663613.3 ns (5458188.549999952) 0.99
parquet_rs-zstd compress time/HashTags throughput 804495592 bytes 804495592 bytes 1
decompress time/HashTags 617157207.9 ns (2846063.699999988) 598508788.5 ns (1834668.7899999022) 1.03
decompress time/HashTags throughput 804495592 bytes 804495592 bytes 1
parquet_rs-zstd decompress time/HashTags 788753134.7 ns (7136122.402499974) 784446921 ns (2511140.566250026) 1.01
parquet_rs-zstd decompress time/HashTags throughput 804495592 bytes 804495592 bytes 1
vortex:parquet-zstd size/HashTags 1.6563035357058875 ratio 1.656176458210158 ratio 1.00
vortex:raw size/HashTags 0.2758095012657322 ratio 0.27578834018024057 ratio 1.00
vortex size/HashTags 221887528 bytes 221870504 bytes 1.00
compress time/TPC-H l_comment chunked without fsst 3822655387.3 ns (25221809.242500067) 3898409457.2 ns (7545279.182500124) 0.98
compress time/TPC-H l_comment chunked without fsst throughput 249197090 bytes 249197090 bytes 1
parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst 909217483.9 ns (2257411.711250007) 913703311.6 ns (1247102.8162499666) 1.00
parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst throughput 249197090 bytes 249197090 bytes 1
decompress time/TPC-H l_comment chunked without fsst 114171495.42182538 ns (981686.3210714161) 112642222.97583333 ns (1475219.5998645797) 1.01
decompress time/TPC-H l_comment chunked without fsst throughput 249197090 bytes 249197090 bytes 1
parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst 250734085.95 ns (571974.0693750083) 253377295.35 ns (1077320.450000003) 0.99
parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst throughput 249197090 bytes 249197090 bytes 1
vortex:parquet-zstd size/TPC-H l_comment chunked without fsst 4.6074997113436655 ratio 4.607769750772028 ratio 1.00
vortex:raw size/TPC-H l_comment chunked without fsst 1.0527153748063431 ratio 1.0527089541856207 ratio 1.00
vortex size/TPC-H l_comment chunked without fsst 262333608 bytes 262332008 bytes 1.00
compress time/TPC-H l_comment chunked 924270571.8 ns (3813865.588749945) 920568668.1 ns (1212691.7749999762) 1.00
compress time/TPC-H l_comment chunked throughput 249197090 bytes 249197090 bytes 1
parquet_rs-zstd compress time/TPC-H l_comment chunked 909314637.6 ns (1923499.9749999642) 915052101.5 ns (1900237.3712499738) 0.99
parquet_rs-zstd compress time/TPC-H l_comment chunked throughput 249197090 bytes 249197090 bytes 1
decompress time/TPC-H l_comment chunked 132006344.1345238 ns (383113.9406934604) 131222959.92428572 ns (375544.4134226367) 1.01
decompress time/TPC-H l_comment chunked throughput 249197090 bytes 249197090 bytes 1
parquet_rs-zstd decompress time/TPC-H l_comment chunked 249818739.5 ns (1001009.4862499982) 251980757.25 ns (941983.2775000036) 0.99
parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput 249197090 bytes 249197090 bytes 1
vortex:parquet-zstd size/TPC-H l_comment chunked 1.347924163139884 ratio 1.348015881393932 ratio 1.00
vortex:raw size/TPC-H l_comment chunked 0.3079719109079484 ratio 0.30797293820726396 ratio 1.00
vortex size/TPC-H l_comment chunked 76745704 bytes 76745960 bytes 1.00
compress time/TPC-H l_comment canonical 917216400.55 ns (1439070.761250019) 916374263.45 ns (1104641.2381249666) 1.00
compress time/TPC-H l_comment canonical throughput 249197106 bytes 249197106 bytes 1
parquet_rs-zstd compress time/TPC-H l_comment canonical 919043963.9 ns (1394955.675000012) 920755421.9 ns (952637.4750000238) 1.00
parquet_rs-zstd compress time/TPC-H l_comment canonical throughput 249197106 bytes 249197106 bytes 1
decompress time/TPC-H l_comment canonical 131755023.82593915 ns (314767.92713277787) 133676730.28079364 ns (245265.83480158448) 0.99
decompress time/TPC-H l_comment canonical throughput 249197106 bytes 249197106 bytes 1
parquet_rs-zstd decompress time/TPC-H l_comment canonical 248781614.97761902 ns (362647.8276607245) 251916068.02561507 ns (704825.1072448194) 0.99
parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput 249197106 bytes 249197106 bytes 1
vortex:parquet-zstd size/TPC-H l_comment canonical 1.347923097797971 ratio 1.34802061689584 ratio 1.00
vortex:raw size/TPC-H l_comment canonical 0.30797189113424134 ratio 0.30797291843349095 ratio 1.00
vortex size/TPC-H l_comment canonical 76745704 bytes 76745960 bytes 1.00


@lwwmanning (Member) commented

Benchmarks are mostly within tolerance, but CMSprovider decompression got 11% slower.
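
As a quick check of where the 11% comes from, the sketch below recomputes the ratio for the `decompress time/CMSprovider` row. It assumes the Ratio column is current divided by previous, which is consistent with the rows in the tables above; the function names `ratio` and `percent_change` are illustrative only.

```rust
/// Ratio as the benchmark tables appear to report it: current / previous.
/// (Assumption inferred from the table rows, not from the tool's docs.)
fn ratio(current_ns: f64, previous_ns: f64) -> f64 {
    current_ns / previous_ns
}

/// Relative change in percent; positive means the current run is slower.
fn percent_change(current_ns: f64, previous_ns: f64) -> f64 {
    (ratio(current_ns, previous_ns) - 1.0) * 100.0
}

fn main() {
    // decompress time/CMSprovider, values copied from the benchmark comment.
    let current = 8_053_081_952.2;
    let previous = 7_245_502_799.0;
    println!("ratio  = {:.2}", ratio(current, previous));
    println!("change = {:+.1}%", percent_change(current, previous));
}
```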

[Image: Screenshot 2024-10-28 at 14:25:35]

@a10y (Contributor, Author) commented Oct 28, 2024

All of the TPC-H queries seem unaffected, so I think this is good to go.

@lwwmanning merged commit 8467f64 into develop on Oct 28, 2024 (9 checks passed)
@lwwmanning deleted the aduffy/canonicalize-dict-specialized branch on October 28, 2024, 18:42
@a10y (Contributor, Author) commented Oct 28, 2024

Missed your comment, @lwwmanning. The slowdown for CMSprovider is a bit surprising. It has a few integer columns that may be dict-encoded, but it is mostly strings, which should be unaffected by this change.

@lwwmanning (Member) commented

> Missed your comment, @lwwmanning. The slowdown for CMSprovider is a bit surprising. It has a few integer columns that may be dict-encoded, but it is mostly strings, which should be unaffected by this change.

@a10y yeah, there's maybe something there, but I tend to agree with Nick's assessment that we should dig into it / fix that lower down

Development

Successfully merging this pull request may close these issues.

Re-evaluate values canonicalization in DictArray::into_canonical
2 participants