add neon simd version of find_authority_delimiter #752

Open
wants to merge 2 commits into
base: main

Member

@anonrig anonrig commented Oct 7, 2024

Adds a NEON SIMD implementation of find_authority_delimiter to improve its performance on ARM targets that support NEON.

Before

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations        GHz cycle/byte cycles/url instructions/byte instructions/cycle instructions/ns instructions/url     ns/url      speed  time/byte   time/url      url/s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BasicBench_AdaURL_href                  2184 ns         2177 ns       324169    4.95059      12.19   1.16889k           54.2804            4.45285         22.0442         5.20489k    236.111 396.448M/s   2.5224ns   241.87ns 4.13445M/s
BasicBench_AdaURL_aggregator_href       1325 ns         1320 ns       540499     5.9944     8.6825    832.556           40.0104            4.60817         27.6232         3.83656k    138.889 653.695M/s  1.52977ns  146.687ns 6.81721M/s
BasicBench_AdaURL_CanParse               941 ns          939 ns       742469    6.95086    7.04751    675.778           32.5736              4.622         32.1269         3.12344k    97.2222 919.306M/s  1.08778ns  104.306ns  9.5872M/s
BasicBench_whatwg                       4265 ns         4256 ns       165302    4.01608     19.387     1.859k           97.5875            5.03365         20.2156         9.35756k    462.889 202.782M/s  4.93141ns  472.868ns 2.11476M/s
BasicBench_CURL                        10933 ns        10922 ns        63206    3.55107    44.2341   4.24156k           218.441            4.93831         17.5363         20.9461k   1.19444k 79.0172M/s  12.6555ns  1.21352us  824.05k/s

After

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations        GHz cycle/byte cycles/url instructions/byte instructions/cycle instructions/ns instructions/url     ns/url      speed  time/byte   time/url      url/s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BasicBench_AdaURL_href                  2189 ns         2187 ns       326079    4.79309    11.5689   1.10933k            54.343            4.69732         22.5146         5.21089k    231.444 394.551M/s  2.53453ns  243.033ns 4.11467M/s
BasicBench_AdaURL_aggregator_href       1317 ns         1316 ns       531616     5.9616    8.63499        828           39.9826             4.6303          27.604         3.83389k    138.889 655.728M/s  1.52502ns  146.233ns 6.83841M/s
BasicBench_AdaURL_CanParse               939 ns          939 ns       748015    7.04229    7.14021    684.667           32.5875            4.56394         32.1406         3.12478k    97.2222 919.365M/s  1.08771ns  104.299ns 9.58782M/s
BasicBench_whatwg                       4236 ns         4234 ns       165406    4.06455    19.6257   1.88189k           97.5689            4.97148         20.2069         9.35578k        463 203.843M/s  4.90574ns  470.406ns 2.12582M/s
BasicBench_CURL                        10873 ns        10870 ns        64574    3.54577     44.168   4.23522k           218.513            4.94732          17.542          20.953k   1.19444k 79.3933M/s  12.5955ns  1.20777us 827.972k/s
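
For readers unfamiliar with the approach, here is a minimal sketch of what a NEON delimiter search can look like. It is not the code in this PR: the name `find_authority_delimiter_sketch` is hypothetical, and both the delimiter set (`@`, `/`, `?`) and the convention of returning the input length when nothing is found are assumptions made for illustration. The NEON path compiles only on AArch64; elsewhere the scalar loop does all the work.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not the PR's implementation).
// Assumption: returns the index of the first '@', '/' or '?' in `view`,
// or view.size() when none of them occurs.
inline size_t find_authority_delimiter_sketch(std::string_view view) noexcept {
  size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_NEON)
  const uint8x16_t at = vdupq_n_u8('@');
  const uint8x16_t slash = vdupq_n_u8('/');
  const uint8x16_t question = vdupq_n_u8('?');
  // Scan 16 bytes at a time until a block contains at least one delimiter.
  for (; i + 16 <= view.size(); i += 16) {
    const uint8x16_t chunk =
        vld1q_u8(reinterpret_cast<const uint8_t*>(view.data() + i));
    const uint8x16_t hit =
        vorrq_u8(vorrq_u8(vceqq_u8(chunk, at), vceqq_u8(chunk, slash)),
                 vceqq_u8(chunk, question));
    if (vmaxvq_u8(hit) != 0) {
      break;  // a lane matched; the scalar loop below locates the exact byte
    }
  }
#endif
  // Scalar tail: also the complete implementation on non-NEON targets.
  for (; i < view.size(); ++i) {
    const char c = view[i];
    if (c == '@' || c == '/' || c == '?') {
      return i;
    }
  }
  return view.size();
}
```

Authority strings in typical URLs are short, so most inputs reach a delimiter within the first block or two. That may be one reason the numbers above move so little between the two runs.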

anonrig requested a review from lemire on October 7, 2024, 01:37
anonrig force-pushed the simd-neon-authority-delimiter branch 2 times, most recently from 6a890cb to 28c668d, on October 7, 2024, 01:50
Member

lemire commented Oct 7, 2024

@anonrig I have pushed some fixes but I don't expect that this will make a measurable impact on the performance.

anonrig force-pushed the simd-neon-authority-delimiter branch from 9867434 to ffe4b5e on October 10, 2024, 13:54
Member Author

anonrig commented Oct 10, 2024

@lemire I updated the PR description with the benchmark results. I'm not sure if it's worth merging. What do you think?

Member

@lemire lemire left a comment

I am happy for this to get merged if you are getting positive benchmark results.
