feat: implement search_sorted_many #840

Merged
merged 10 commits into from
Sep 17, 2024

Conversation

@a10y (Contributor) commented Sep 16, 2024

Coming out of #823, we found that search_sorted on BitPackedArray is slow because it wastefully rebuilds the BitPackedArray.

This PR adds a new search_sorted_bulk that lets arrays do some up-front initialization before running a loop of repeated searches, as in RunEndArray::find_physical_indices.
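
A rough sketch of the "set up once, search many times" idea, not the actual Vortex code. Everything here (`ToyBitPacked`, `Searcher`, the restriction to bit widths that divide 64) is a made-up simplification for illustration. In this toy the setup is cheap; the point is only the shape of the API, where the bulk call builds its helper state once and reuses it for every target.

```rust
/// Toy bit-packed array (hypothetical; only bit widths that divide 64,
/// so no value straddles a word boundary).
struct ToyBitPacked {
    words: Vec<u64>,
    bit_width: u32,
    len: usize,
}

/// Per-batch search state, standing in for whatever up-front
/// initialization a real encoding needs before it can probe values.
struct Searcher<'a> {
    words: &'a [u64],
    bit_width: u32,
    per_word: usize,
    mask: u64,
    len: usize,
}

impl<'a> Searcher<'a> {
    fn new(array: &'a ToyBitPacked) -> Self {
        Searcher {
            words: &array.words,
            bit_width: array.bit_width,
            per_word: (64 / array.bit_width) as usize,
            mask: (1u64 << array.bit_width) - 1,
            len: array.len,
        }
    }

    /// Decode a single element without unpacking the whole array.
    fn unpack_single(&self, idx: usize) -> u64 {
        let shift = (idx % self.per_word) as u32 * self.bit_width;
        (self.words[idx / self.per_word] >> shift) & self.mask
    }

    /// Index of the first element >= `target` (left insertion point).
    fn search_left(&self, target: u64) -> usize {
        let (mut lo, mut hi) = (0, self.len);
        while lo < hi {
            let mid = lo + (hi - lo) / 2;
            if self.unpack_single(mid) < target {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        lo
    }
}

impl ToyBitPacked {
    fn pack(values: &[u64], bit_width: u32) -> Self {
        assert!(bit_width > 0 && bit_width < 64 && 64 % bit_width == 0);
        let per_word = (64 / bit_width) as usize;
        let mut words = vec![0u64; (values.len() + per_word - 1) / per_word];
        for (i, &v) in values.iter().enumerate() {
            assert!(v < (1u64 << bit_width), "value does not fit in bit_width");
            words[i / per_word] |= v << ((i % per_word) as u32 * bit_width);
        }
        ToyBitPacked { words, bit_width, len: values.len() }
    }

    /// Single search: builds the searcher on every call.
    fn search_sorted(&self, target: u64) -> usize {
        Searcher::new(self).search_left(target)
    }

    /// Bulk search: builds the searcher once, then loops over the targets.
    fn search_sorted_many(&self, targets: &[u64]) -> Vec<usize> {
        let searcher = Searcher::new(self);
        targets.iter().map(|&t| searcher.search_left(t)).collect()
    }
}
```

A caller like a run-end lookup would pass all of its targets to `search_sorted_many` in one call instead of calling `search_sorted` in a loop.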

We're still ~50% slower than #823. unpack_single plus branch mispredicts (which I think account for all of the self time in search_sorted) seem to be the cause of the slowdown.

image

@a10y a10y added benchmark Run benchmarks on this branch labels Sep 16, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Sep 16, 2024
@a10y (Contributor Author) commented Sep 16, 2024

image

I'm pretty sure the first unpack_single is for validity, and the second unchecked_unpack_single is for unpacking the actual values. I don't have a good idea for how to avoid the validity checks in index_cmp.

@robert3005 (Member) commented

Since the array is sorted and the search value is not null (search_sorted guarantees that), searching the sliced packed values is equivalent, since no value would fall in that null range.
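
One way to read this argument, sketched under the assumption that nulls form a contiguous range at the end of the sorted array. The function names and the plain `&[bool]` validity mask are hypothetical, not Vortex types. The first function consults validity on every probe; the second slices off the null range once and searches the raw values only.

```rust
/// Validity-aware search: a null slot is treated as greater than any
/// value (assumption: nulls sort last), so each probe reads validity
/// and then the value.
fn search_checked(values: &[u64], valid: &[bool], target: u64) -> usize {
    let (mut lo, mut hi) = (0, values.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if valid[mid] && values[mid] < target {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

/// Validity-free search: restrict to the non-null prefix once, then
/// binary search the raw values with no per-probe validity lookup.
fn search_sliced(values: &[u64], valid_len: usize, target: u64) -> usize {
    values[..valid_len].partition_point(|&v| v < target)
}
```

Because a non-null target can never land inside the null range, both functions return the same insertion point for every target.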

@github-actions bot (Contributor) left a comment

Vortex benchmarks

| Benchmark suite | Current: b63030a | Previous: 3b27edb | Ratio |
|-|-|-|-|
| tpch_q1/vortex-in-memory-no-pushdown | 463630931 ns/iter (± 5145292) | 466845942 ns/iter (± 5252319) | 0.99 |
| tpch_q1/vortex-in-memory-pushdown | 520094403 ns/iter (± 2840533) | 519034307 ns/iter (± 2212826) | 1.00 |
| tpch_q1/arrow | 444244371 ns/iter (± 760853) | 446531031 ns/iter (± 2255019) | 0.99 |
| tpch_q1/parquet | 665509039 ns/iter (± 1749665) | 656407639 ns/iter (± 2960950) | 1.01 |
| tpch_q1/vortex-file-compressed | 601694464 ns/iter (± 4832236) | 612253477 ns/iter (± 4730162) | 0.98 |
| tpch_q1/vortex-file-uncompressed | 619644767 ns/iter (± 8366331) | 638789621 ns/iter (± 4083923) | 0.97 |
| tpch_q2/vortex-in-memory-no-pushdown | 143959748 ns/iter (± 1524144) | 152376530 ns/iter (± 1446434) | 0.94 |
| tpch_q2/vortex-in-memory-pushdown | 147173330 ns/iter (± 1664255) | 152749172 ns/iter (± 1396179) | 0.96 |
| tpch_q2/arrow | 126506525 ns/iter (± 1549629) | 125285537 ns/iter (± 585401) | 1.01 |
| tpch_q2/parquet | 164482792 ns/iter (± 1234409) | 160670527 ns/iter (± 605403) | 1.02 |
| tpch_q2/vortex-file-compressed | 158393605 ns/iter (± 2462963) | 160147024 ns/iter (± 853795) | 0.99 |
| tpch_q2/vortex-file-uncompressed | 167989306 ns/iter (± 1420640) | 168472394 ns/iter (± 1036996) | 1.00 |
| tpch_q3/vortex-in-memory-no-pushdown | 175177448 ns/iter (± 556365) | 158842258 ns/iter (± 1009085) | 1.10 |
| tpch_q3/vortex-in-memory-pushdown | 186926681 ns/iter (± 3214873) | 192936228 ns/iter (± 1583241) | 0.97 |
| tpch_q3/arrow | 175456380 ns/iter (± 1181407) | 154178394 ns/iter (± 594058) | 1.14 |
| tpch_q3/parquet | 342962621 ns/iter (± 4301729) | 346962228 ns/iter (± 1571617) | 0.99 |
| tpch_q3/vortex-file-compressed | 587031986 ns/iter (± 4804623) | 621345621 ns/iter (± 2284728) | 0.94 |
| tpch_q3/vortex-file-uncompressed | 354892299 ns/iter (± 3266851) | 369213053 ns/iter (± 2116455) | 0.96 |
| tpch_q4/vortex-in-memory-no-pushdown | 134685406 ns/iter (± 1124937) | 123339286 ns/iter (± 1683505) | 1.09 |
| tpch_q4/vortex-in-memory-pushdown | 134994457 ns/iter (± 1817851) | 148413712 ns/iter (± 1615251) | 0.91 |
| tpch_q4/arrow | 128033123 ns/iter (± 2700814) | 116128625 ns/iter (± 1012601) | 1.10 |
| tpch_q4/parquet | 237193951 ns/iter (± 2529276) | 229925286 ns/iter (± 4507823) | 1.03 |
| tpch_q4/vortex-file-compressed | 596997071 ns/iter (± 3522515) | 625648385 ns/iter (± 3118608) | 0.95 |
| tpch_q4/vortex-file-uncompressed | 310922219 ns/iter (± 5000997) | 327888351 ns/iter (± 6623198) | 0.95 |
| tpch_q5/vortex-in-memory-no-pushdown | 313553440 ns/iter (± 3543403) | 290069615 ns/iter (± 7146984) | 1.08 |
| tpch_q5/vortex-in-memory-pushdown | 329153263 ns/iter (± 7524928) | 314465574 ns/iter (± 11179785) | 1.05 |
| tpch_q5/arrow | 315351245 ns/iter (± 2972078) | 297118614 ns/iter (± 1649595) | 1.06 |
| tpch_q5/parquet | 466140249 ns/iter (± 7522064) | 465254387 ns/iter (± 15101517) | 1.00 |
| tpch_q5/vortex-file-compressed | 341632516 ns/iter (± 2782025) | 343032670 ns/iter (± 11395885) | 1.00 |
| tpch_q5/vortex-file-uncompressed | 344981775 ns/iter (± 9594855) | 354492577 ns/iter (± 5519705) | 0.97 |
| tpch_q6/vortex-in-memory-no-pushdown | 42331106 ns/iter (± 337966) | 40294952 ns/iter (± 641907) | 1.05 |
| tpch_q6/vortex-in-memory-pushdown | 85677901 ns/iter (± 666562) | 89615576 ns/iter (± 1347002) | 0.96 |
| tpch_q6/arrow | 36319449 ns/iter (± 651429) | 36150275 ns/iter (± 704303) | 1.00 |
| tpch_q6/parquet | 152103975 ns/iter (± 1852985) | 157547731 ns/iter (± 3157638) | 0.97 |
| tpch_q6/vortex-file-compressed | 76260096 ns/iter (± 2065043) | 76111011 ns/iter (± 411803) | 1.00 |
| tpch_q6/vortex-file-uncompressed | 163486093 ns/iter (± 1707684) | 163609561 ns/iter (± 825738) | 1.00 |
| tpch_q7/vortex-in-memory-no-pushdown | 577261839 ns/iter (± 5075049) | 553329679 ns/iter (± 2175264) | 1.04 |
| tpch_q7/vortex-in-memory-pushdown | 636884395 ns/iter (± 7176941) | 614349222 ns/iter (± 2201534) | 1.04 |
| tpch_q7/arrow | 585074826 ns/iter (± 12206673) | 557893102 ns/iter (± 4153836) | 1.05 |
| tpch_q7/parquet | 758435984 ns/iter (± 14220117) | 710404899 ns/iter (± 4127309) | 1.07 |
| tpch_q7/vortex-file-compressed | 846648129 ns/iter (± 5324550) | 852802470 ns/iter (± 4269968) | 0.99 |
| tpch_q7/vortex-file-uncompressed | 766718693 ns/iter (± 12637312) | 737563446 ns/iter (± 4194360) | 1.04 |
| tpch_q8/vortex-in-memory-no-pushdown | 225968966 ns/iter (± 1704240) | 211672418 ns/iter (± 944157) | 1.07 |
| tpch_q8/vortex-in-memory-pushdown | 234466911 ns/iter (± 1533547) | 224654520 ns/iter (± 739161) | 1.04 |
| tpch_q8/arrow | 209015357 ns/iter (± 1551336) | 212012381 ns/iter (± 1345111) | 0.99 |
| tpch_q8/parquet | 484593838 ns/iter (± 5281110) | 474099975 ns/iter (± 781220) | 1.02 |
| tpch_q8/vortex-file-compressed | 269842038 ns/iter (± 3126950) | 263637059 ns/iter (± 880478) | 1.02 |
| tpch_q8/vortex-file-uncompressed | 294677446 ns/iter (± 6326368) | 298940834 ns/iter (± 5541087) | 0.99 |
| tpch_q9/vortex-in-memory-no-pushdown | 415278145 ns/iter (± 6844457) | 395816762 ns/iter (± 1971739) | 1.05 |
| tpch_q9/vortex-in-memory-pushdown | 432100993 ns/iter (± 5535169) | 398933970 ns/iter (± 1719678) | 1.08 |
| tpch_q9/arrow | 403261135 ns/iter (± 8118916) | 396456472 ns/iter (± 2080228) | 1.02 |
| tpch_q9/parquet | 693468165 ns/iter (± 4138175) | 686738559 ns/iter (± 2029738) | 1.01 |
| tpch_q9/vortex-file-compressed | 460101299 ns/iter (± 7803818) | 430138948 ns/iter (± 12131658) | 1.07 |
| tpch_q9/vortex-file-uncompressed | 470292241 ns/iter (± 8032042) | 460858512 ns/iter (± 4289726) | 1.02 |
| tpch_q10/vortex-in-memory-no-pushdown | 235714774 ns/iter (± 2894408) | 231187678 ns/iter (± 714352) | 1.02 |
| tpch_q10/vortex-in-memory-pushdown | 257865068 ns/iter (± 2426547) | 260822507 ns/iter (± 792672) | 0.99 |
| tpch_q10/arrow | 225499326 ns/iter (± 928419) | 223785245 ns/iter (± 838896) | 1.01 |
| tpch_q10/parquet | 485116895 ns/iter (± 4479695) | 478961010 ns/iter (± 1332096) | 1.01 |
| tpch_q10/vortex-file-compressed | 586697055 ns/iter (± 3574419) | 591138925 ns/iter (± 3307076) | 0.99 |
| tpch_q10/vortex-file-uncompressed | 395778173 ns/iter (± 1630070) | 390606780 ns/iter (± 1321466) | 1.01 |
| tpch_q11/vortex-in-memory-no-pushdown | 219011232 ns/iter (± 2205317) | 227221589 ns/iter (± 3010472) | 0.96 |
| tpch_q11/vortex-in-memory-pushdown | 218403750 ns/iter (± 2472517) | 224544076 ns/iter (± 827137) | 0.97 |
| tpch_q11/arrow | 184989717 ns/iter (± 1977626) | 181068943 ns/iter (± 557691) | 1.02 |
| tpch_q11/parquet | 189053235 ns/iter (± 1089245) | 188963008 ns/iter (± 813539) | 1.00 |
| tpch_q11/vortex-file-compressed | 228938860 ns/iter (± 2994086) | 231053458 ns/iter (± 1797441) | 0.99 |
| tpch_q11/vortex-file-uncompressed | 234572494 ns/iter (± 1617366) | 234025030 ns/iter (± 1826174) | 1.00 |
| tpch_q12/vortex-in-memory-no-pushdown | 180552869 ns/iter (± 430985) | 176785720 ns/iter (± 333660) | 1.02 |
| tpch_q12/vortex-in-memory-pushdown | 252281462 ns/iter (± 753504) | 251044206 ns/iter (± 376180) | 1.00 |
| tpch_q12/arrow | 167823500 ns/iter (± 911489) | 166222703 ns/iter (± 138836) | 1.01 |
| tpch_q12/parquet | 368030656 ns/iter (± 2026983) | 355508277 ns/iter (± 820096) | 1.04 |
| tpch_q12/vortex-file-compressed | 590179362 ns/iter (± 8032877) | 587964864 ns/iter (± 7236716) | 1.00 |
| tpch_q12/vortex-file-uncompressed | 347164337 ns/iter (± 2753494) | 344573523 ns/iter (± 1063500) | 1.01 |
| tpch_q13/vortex-in-memory-no-pushdown | 217947121 ns/iter (± 6237785) | 201421925 ns/iter (± 1298185) | 1.08 |
| tpch_q13/vortex-in-memory-pushdown | 208393404 ns/iter (± 4244736) | 201804199 ns/iter (± 3568142) | 1.03 |
| tpch_q13/arrow | 226341955 ns/iter (± 6418221) | 197821185 ns/iter (± 1535916) | 1.14 |
| tpch_q13/parquet | 369442781 ns/iter (± 13006506) | 327878657 ns/iter (± 1936309) | 1.13 |
| tpch_q13/vortex-file-compressed | 255494597 ns/iter (± 5272803) | 236944350 ns/iter (± 2300171) | 1.08 |
| tpch_q13/vortex-file-uncompressed | 225982917 ns/iter (± 1998763) | 226881711 ns/iter (± 3105833) | 1.00 |
| tpch_q14/vortex-in-memory-no-pushdown | 38477957 ns/iter (± 513238) | 38531241 ns/iter (± 418181) | 1.00 |
| tpch_q14/vortex-in-memory-pushdown | 89198711 ns/iter (± 1347800) | 86787423 ns/iter (± 267997) | 1.03 |
| tpch_q14/arrow | 39149362 ns/iter (± 734500) | 38195209 ns/iter (± 202473) | 1.02 |
| tpch_q14/parquet | 221154327 ns/iter (± 2010311) | 219965575 ns/iter (± 1266994) | 1.01 |
| tpch_q14/vortex-file-compressed | 87862427 ns/iter (± 1393458) | 88893087 ns/iter (± 562954) | 0.99 |
| tpch_q14/vortex-file-uncompressed | 137229719 ns/iter (± 1705777) | 140888198 ns/iter (± 1233981) | 0.97 |
| tpch_q15/vortex-in-memory-no-pushdown | 66673743 ns/iter (± 1056392) | 65897462 ns/iter (± 307551) | 1.01 |
| tpch_q15/vortex-in-memory-pushdown | 122585896 ns/iter (± 1585504) | 121122576 ns/iter (± 346857) | 1.01 |
| tpch_q15/arrow | 67325712 ns/iter (± 842080) | 63692317 ns/iter (± 505791) | 1.06 |
| tpch_q15/parquet | 299499491 ns/iter (± 4407041) | 291947708 ns/iter (± 2122551) | 1.03 |
| tpch_q15/vortex-file-compressed | 159401408 ns/iter (± 3080109) | 157232669 ns/iter (± 554338) | 1.01 |
| tpch_q15/vortex-file-uncompressed | 264808851 ns/iter (± 4258984) | 270871317 ns/iter (± 1722412) | 0.98 |
| tpch_q16/vortex-in-memory-no-pushdown | 119023039 ns/iter (± 1010582) | 117343554 ns/iter (± 219560) | 1.01 |
| tpch_q16/vortex-in-memory-pushdown | 126885302 ns/iter (± 361301) | 121492910 ns/iter (± 323601) | 1.04 |
| tpch_q16/arrow | 105560251 ns/iter (± 776503) | 102898865 ns/iter (± 801646) | 1.03 |
| tpch_q16/parquet | 121211139 ns/iter (± 722196) | 119174062 ns/iter (± 272457) | 1.02 |
| tpch_q16/vortex-file-compressed | 139365053 ns/iter (± 1465752) | 134515972 ns/iter (± 750613) | 1.04 |
| tpch_q16/vortex-file-uncompressed | 138618693 ns/iter (± 1467768) | 134935844 ns/iter (± 599459) | 1.03 |
| tpch_q17/vortex-in-memory-no-pushdown | 646059952 ns/iter (± 9122350) | 622837714 ns/iter (± 8047537) | 1.04 |
| tpch_q17/vortex-in-memory-pushdown | 643746857 ns/iter (± 10237861) | 625237379 ns/iter (± 20020090) | 1.03 |
| tpch_q17/arrow | 589578038 ns/iter (± 33119524) | 538477723 ns/iter (± 6685929) | 1.09 |
| tpch_q17/parquet | 592687587 ns/iter (± 5340466) | 577140541 ns/iter (± 2327357) | 1.03 |
| tpch_q17/vortex-file-compressed | 639426477 ns/iter (± 8308897) | 605051065 ns/iter (± 5324789) | 1.06 |
| tpch_q17/vortex-file-uncompressed | 664328343 ns/iter (± 8136078) | 666435966 ns/iter (± 2391820) | 1.00 |
| tpch_q18/vortex-in-memory-no-pushdown | 1082800680 ns/iter (± 26033255) | 1043444544 ns/iter (± 5801365) | 1.04 |
| tpch_q18/vortex-in-memory-pushdown | 1099457702 ns/iter (± 40363411) | 1039708460 ns/iter (± 9949214) | 1.06 |
| tpch_q18/arrow | 1077944044 ns/iter (± 30979607) | 1037215011 ns/iter (± 7607330) | 1.04 |
| tpch_q18/parquet | 1230134294 ns/iter (± 26507189) | 1217688772 ns/iter (± 8276815) | 1.01 |
| tpch_q18/vortex-file-compressed | 1097946474 ns/iter (± 25941637) | 1091101577 ns/iter (± 6715858) | 1.01 |
| tpch_q18/vortex-file-uncompressed | 1182162974 ns/iter (± 37247199) | 1151942251 ns/iter (± 10881477) | 1.03 |
| tpch_q19/vortex-in-memory-no-pushdown | 172564892 ns/iter (± 684550) | 170556451 ns/iter (± 230144) | 1.01 |
| tpch_q19/vortex-in-memory-pushdown | 248773882 ns/iter (± 1159657) | 251514028 ns/iter (± 262356) | 0.99 |
| tpch_q19/arrow | 157968619 ns/iter (± 522002) | 157591488 ns/iter (± 1545728) | 1.00 |
| tpch_q19/parquet | 478136857 ns/iter (± 3921775) | 482145084 ns/iter (± 2161081) | 0.99 |
| tpch_q19/vortex-file-compressed | 748554955 ns/iter (± 7227845) | 724423382 ns/iter (± 1769973) | 1.03 |
| tpch_q19/vortex-file-uncompressed | 361070304 ns/iter (± 3735707) | 355322251 ns/iter (± 3265779) | 1.02 |
| tpch_q20/vortex-in-memory-no-pushdown | 270540780 ns/iter (± 3410403) | 251637011 ns/iter (± 799161) | 1.08 |
| tpch_q20/vortex-in-memory-pushdown | 281032892 ns/iter (± 6628659) | 276585995 ns/iter (± 1193507) | 1.02 |
| tpch_q20/arrow | 247683662 ns/iter (± 5959063) | 233322332 ns/iter (± 1474523) | 1.06 |
| tpch_q20/parquet | 378714396 ns/iter (± 4925853) | 349443639 ns/iter (± 1373732) | 1.08 |
| tpch_q20/vortex-file-compressed | 342271709 ns/iter (± 4142924) | 313814578 ns/iter (± 1665087) | 1.09 |
| tpch_q20/vortex-file-uncompressed | 406182636 ns/iter (± 7303008) | 402155752 ns/iter (± 9551010) | 1.01 |
| tpch_q21/vortex-in-memory-no-pushdown | 924515490 ns/iter (± 7015193) | 855061347 ns/iter (± 2375487) | 1.08 |
| tpch_q21/vortex-in-memory-pushdown | 973685081 ns/iter (± 10991824) | 895995176 ns/iter (± 5415176) | 1.09 |
| tpch_q21/arrow | 929350732 ns/iter (± 14466983) | 849913391 ns/iter (± 7109063) | 1.09 |
| tpch_q21/parquet | 1054428424 ns/iter (± 15869328) | 1000142144 ns/iter (± 19029117) | 1.05 |
| tpch_q21/vortex-file-compressed | 1967408351 ns/iter (± 21273311) | 1912006459 ns/iter (± 12870926) | 1.03 |
| tpch_q21/vortex-file-uncompressed | 1347001125 ns/iter (± 9213834) | 1322914013 ns/iter (± 14687367) | 1.02 |
| tpch_q22/vortex-in-memory-no-pushdown | 96172319 ns/iter (± 610590) | 95277643 ns/iter (± 265787) | 1.01 |
| tpch_q22/vortex-in-memory-pushdown | 100235738 ns/iter (± 761676) | 96497650 ns/iter (± 281271) | 1.04 |
| tpch_q22/arrow | 68455149 ns/iter (± 592561) | 65164774 ns/iter (± 153300) | 1.05 |
| tpch_q22/parquet | 98665052 ns/iter (± 1203519) | 94011440 ns/iter (± 382135) | 1.05 |
| tpch_q22/vortex-file-compressed | 102619402 ns/iter (± 1051162) | 101700604 ns/iter (± 309610) | 1.01 |
| tpch_q22/vortex-file-uncompressed | 111017250 ns/iter (± 1574551) | 110847739 ns/iter (± 706894) | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@a10y (Contributor Author) commented Sep 16, 2024

The current state of the branch is better, but we're still roughly 100 ms slower than #823 was.

One possible optimization that @lwwmanning and I were chatting about is changing find_physical_indices to do a sorted merge of chunk ends and indices instead of a binary search, when we know that the take indices are strictly sorted (which is most of the time).
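
For reference, a minimal sketch of the two approaches on plain slices. The function names and the `u64` exclusive-end representation are assumptions for illustration, not the actual find_physical_indices signature. The binary-search version costs O(m log n) for m indices over n ends; the merge version makes one forward pass over both and costs O(n + m), but requires the indices to be sorted.

```rust
/// Binary-search version: for each logical index, find the first run
/// whose exclusive end is greater than the index.
fn find_physical_binary(ends: &[u64], indices: &[u64]) -> Vec<usize> {
    indices
        .iter()
        .map(|&i| ends.partition_point(|&end| end <= i))
        .collect()
}

/// Merge version: assumes `indices` is sorted ascending, so the run
/// cursor only ever moves forward.
fn find_physical_merge(ends: &[u64], indices: &[u64]) -> Vec<usize> {
    debug_assert!(indices.windows(2).all(|w| w[0] <= w[1]));
    let mut run = 0;
    let mut out = Vec::with_capacity(indices.len());
    for &i in indices {
        // Skip every run that ends at or before this logical index.
        while run < ends.len() && ends[run] <= i {
            run += 1;
        }
        out.push(run);
    }
    out
}
```

The merge is the better choice when there are many indices relative to the number of ends; with only a handful of indices over a long list of ends, the binary search can still do less work.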

From_Clipboard.speedscope-7.json

image

@robert3005 (Member) commented

I guess we still have a one-off overhead of creating scalars, which we could get rid of.

@a10y (Contributor Author) commented Sep 17, 2024

tpch_q4/vortex-file-compressed
                        time:   [468.03 ms 471.25 ms 475.02 ms]
                        change: [-15.024% -13.992% -13.002%] (p = 0.00 < 0.05)
                        Performance has improved.

Getting rid of the scalar in/out seems to close the gap. Will clean up and push the final patch shortly.
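
To illustrate what "scalar in/out" overhead can look like, here is a toy contrast. The `Scalar` enum below is hypothetical and far simpler than a real dynamically typed scalar, and neither function is the actual Vortex signature. The first search takes a wrapped value and has to match on its type for every call; the second takes and returns native values for the whole batch.

```rust
/// Toy dynamically typed scalar (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    U64(u64),
    Utf8(String),
}

/// Scalar in: each target arrives wrapped and is unwrapped per call.
/// Returns None when the scalar's type does not match the array.
fn search_scalar(values: &[u64], target: &Scalar) -> Option<usize> {
    match target {
        Scalar::U64(t) => Some(values.partition_point(|v| v < t)),
        _ => None,
    }
}

/// Native in/out: the batch is already typed, so there is no
/// per-element wrapping, matching, or unwrapping.
fn search_many_native(values: &[u64], targets: &[u64]) -> Vec<usize> {
    targets
        .iter()
        .map(|t| values.partition_point(|v| v < t))
        .collect()
}
```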

@lwwmanning lwwmanning changed the title feat: implement search_sorted_bulk feat: implement search_sorted_many Sep 17, 2024
@a10y a10y added the benchmark Run benchmarks on this branch label Sep 17, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Sep 17, 2024
@a10y a10y marked this pull request as ready for review September 17, 2024 15:30
@a10y a10y merged commit 79f816c into develop Sep 17, 2024
5 checks passed
@a10y a10y deleted the aduffy/search-sorted-bulk branch September 17, 2024 16:38
robert3005 added a commit that referenced this pull request Sep 18, 2024
seems to be a regression from #840
lwwmanning pushed a commit that referenced this pull request Sep 18, 2024
seems to be a regression from #840