Improve scalar string replace performance for long strings #7415

Merged
10 commits merged into rapidsai:branch-0.19 from string-replace-scalar-perf on Mar 1, 2021


@jlowe (Member) commented Feb 19, 2021

Fixes #7370.

This adds a scalar string replace algorithm with character-level parallelism, which significantly improves the performance of scalar string replacement on longer strings. Because it can involve many more kernel launches than the row-based algorithm, it does not always outperform that algorithm on short strings. Therefore a heuristic based on the average character length of the valid string rows is used to select the algorithm automatically.
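The two ideas in the description can be illustrated with a host-side C++ sketch. This is not libcudf's CUDA implementation. It shows (1) choosing an algorithm from the average byte length of the valid rows only, and (2) a character-level search in which every byte position of the column's character buffer is tested independently (on a GPU, roughly one thread per position), followed by an output rebuild. The `strings_column` struct, `select_algorithm`, `find_candidate_positions`, `replace_char_parallel`, and the threshold `kAvgBytesThreshold = 64` are hypothetical names and values. The PR does not state the real cutoff, the row-based path is not modeled, and the overlap filtering is done sequentially here for clarity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side model of a strings column (not the libcudf type).
// Row i occupies chars[offsets[i], offsets[i + 1]); null rows are empty.
struct strings_column {
  std::string chars;
  std::vector<int32_t> offsets;  // num_rows + 1 entries, offsets.back() == chars.size()
  std::vector<bool> valid;       // true = row is not null
};

enum class replace_algorithm { row_parallel, char_parallel };

// Hypothetical cutoff in bytes per valid row; the PR does not state the real value.
constexpr double kAvgBytesThreshold = 64.0;

// Choose an algorithm from the average byte length of the valid (non-null) rows.
replace_algorithm select_algorithm(strings_column const& col) {
  assert(col.offsets.size() == col.valid.size() + 1);
  std::size_t valid_rows = 0;
  std::size_t valid_bytes = 0;
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (!col.valid[i]) continue;
    ++valid_rows;
    valid_bytes += static_cast<std::size_t>(col.offsets[i + 1] - col.offsets[i]);
  }
  if (valid_rows == 0) return replace_algorithm::row_parallel;
  double const avg = static_cast<double>(valid_bytes) / static_cast<double>(valid_rows);
  return avg >= kAvgBytesThreshold ? replace_algorithm::char_parallel
                                   : replace_algorithm::row_parallel;
}

// Character-level search: each byte position is tested independently, so on a
// GPU this maps to roughly one thread per position. A match must not cross a
// row boundary, so each position looks up the end of the row containing it.
std::vector<int32_t> find_candidate_positions(strings_column const& col,
                                              std::string const& target) {
  std::vector<int32_t> hits;
  if (target.empty()) return hits;
  auto const tsize = static_cast<int32_t>(target.size());
  for (int32_t pos = 0; pos < static_cast<int32_t>(col.chars.size()); ++pos) {
    // The first offset greater than pos is the end of the row containing pos.
    int32_t const row_end = *std::upper_bound(col.offsets.begin(), col.offsets.end(), pos);
    if (pos + tsize <= row_end && col.chars.compare(pos, target.size(), target) == 0) {
      hits.push_back(pos);
    }
  }
  return hits;
}

// Keep non-overlapping matches left to right within each row (so "aa" matches
// "aaaa" twice), then rebuild chars and offsets. Sequential here for clarity.
strings_column replace_char_parallel(strings_column const& col,
                                     std::string const& target,
                                     std::string const& repl) {
  std::vector<int32_t> const hits = find_candidate_positions(col, target);
  strings_column out;
  out.valid = col.valid;
  out.offsets.push_back(0);
  std::size_t h = 0;
  for (std::size_t row = 0; row + 1 < col.offsets.size(); ++row) {
    int32_t pos = col.offsets[row];
    int32_t const end = col.offsets[row + 1];
    for (; h < hits.size() && hits[h] < end; ++h) {
      if (hits[h] < pos) continue;  // overlaps the previously accepted match
      out.chars.append(col.chars, pos, hits[h] - pos);
      out.chars += repl;
      pos = hits[h] + static_cast<int32_t>(target.size());
    }
    out.chars.append(col.chars, pos, end - pos);
    out.offsets.push_back(static_cast<int32_t>(out.chars.size()));
  }
  return out;
}
```

Excluding null rows from the average matters. A column with one 100-byte row and one null row averages 100 bytes per valid row, but it would average only 50 if the null row were counted, which could flip the choice of algorithm.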

@jlowe jlowe added libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 19, 2021
@jlowe jlowe self-assigned this Feb 19, 2021
@jlowe jlowe requested a review from a team as a code owner February 19, 2021 16:57
@jlowe (Member Author) commented Feb 19, 2021

Performance comparisons from the scalar benchmark.
Before:

---------------------------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------
StringReplaceScalar/replace_scalar/4096/32/manual_time          0.151 ms        0.168 ms         4367 bytes_per_second=459.181M/s
StringReplaceScalar/replace_scalar/4096/128/manual_time         0.422 ms        0.440 ms         1653 bytes_per_second=643.174M/s
StringReplaceScalar/replace_scalar/4096/512/manual_time          2.48 ms         2.49 ms          282 bytes_per_second=442.666M/s
StringReplaceScalar/replace_scalar/4096/2048/manual_time         27.3 ms         27.3 ms           26 bytes_per_second=159.837M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time          453 ms          453 ms            2 bytes_per_second=38.304M/s
StringReplaceScalar/replace_scalar/32768/32/manual_time         0.166 ms        0.184 ms         3998 bytes_per_second=3.24747G/s
StringReplaceScalar/replace_scalar/32768/128/manual_time        0.452 ms        0.471 ms         1545 bytes_per_second=4.71488G/s
StringReplaceScalar/replace_scalar/32768/512/manual_time         2.71 ms         2.73 ms          258 bytes_per_second=3.13709G/s
StringReplaceScalar/replace_scalar/32768/2048/manual_time        34.6 ms         34.7 ms           20 bytes_per_second=1009.18M/s
StringReplaceScalar/replace_scalar/32768/8192/manual_time         526 ms          526 ms            1 bytes_per_second=265.082M/s
StringReplaceScalar/replace_scalar/262144/32/manual_time        0.337 ms        0.356 ms         2068 bytes_per_second=12.8219G/s
StringReplaceScalar/replace_scalar/262144/128/manual_time        1.55 ms         1.57 ms          448 bytes_per_second=10.9859G/s
StringReplaceScalar/replace_scalar/262144/512/manual_time        17.0 ms         17.1 ms           41 bytes_per_second=3.99909G/s
StringReplaceScalar/replace_scalar/262144/2048/manual_time        193 ms          193 ms            4 bytes_per_second=1.41648G/s
StringReplaceScalar/replace_scalar/2097152/32/manual_time        1.84 ms         1.86 ms          379 bytes_per_second=18.7388G/s
StringReplaceScalar/replace_scalar/2097152/128/manual_time       9.47 ms         9.49 ms           74 bytes_per_second=14.4086G/s
StringReplaceScalar/replace_scalar/2097152/512/manual_time       88.8 ms         88.8 ms            8 bytes_per_second=6.14807G/s
StringReplaceScalar/replace_scalar/16777216/32/manual_time       13.9 ms         13.9 ms           50 bytes_per_second=19.9076G/s

After:

-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                               Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------------------------
StringReplaceScalar/replace_scalar_autoalg_single/4096/32/manual_time               0.152 ms        0.169 ms         4495 bytes_per_second=456.924M/s
StringReplaceScalar/replace_scalar_autoalg_single/4096/128/manual_time              0.113 ms        0.131 ms         5786 bytes_per_second=2.33732G/s
StringReplaceScalar/replace_scalar_autoalg_single/4096/512/manual_time              0.150 ms        0.166 ms         4419 bytes_per_second=7.1418G/s
StringReplaceScalar/replace_scalar_autoalg_single/4096/2048/manual_time             0.301 ms        0.318 ms         2319 bytes_per_second=14.1505G/s
StringReplaceScalar/replace_scalar_autoalg_single/4096/8192/manual_time             0.946 ms        0.964 ms          730 bytes_per_second=17.9302G/s
StringReplaceScalar/replace_scalar_autoalg_single/32768/32/manual_time              0.165 ms        0.183 ms         4139 bytes_per_second=3.26231G/s
StringReplaceScalar/replace_scalar_autoalg_single/32768/128/manual_time             0.198 ms        0.214 ms         3403 bytes_per_second=10.7608G/s
StringReplaceScalar/replace_scalar_autoalg_single/32768/512/manual_time             0.513 ms        0.532 ms         1288 bytes_per_second=16.5836G/s
StringReplaceScalar/replace_scalar_autoalg_single/32768/2048/manual_time             1.86 ms         1.87 ms          376 bytes_per_second=18.3948G/s
StringReplaceScalar/replace_scalar_autoalg_single/32768/8192/manual_time             7.57 ms         7.59 ms           92 bytes_per_second=17.9735G/s
StringReplaceScalar/replace_scalar_autoalg_single/262144/32/manual_time             0.336 ms        0.355 ms         2061 bytes_per_second=12.8375G/s
StringReplaceScalar/replace_scalar_autoalg_single/262144/128/manual_time            0.952 ms        0.970 ms          719 bytes_per_second=17.9219G/s
StringReplaceScalar/replace_scalar_autoalg_single/262144/512/manual_time             3.73 ms         3.75 ms          188 bytes_per_second=18.29G/s
StringReplaceScalar/replace_scalar_autoalg_single/262144/2048/manual_time            15.6 ms         15.6 ms           45 bytes_per_second=17.5188G/s
StringReplaceScalar/replace_scalar_autoalg_single/2097152/32/manual_time             1.84 ms         1.86 ms          379 bytes_per_second=18.7479G/s
StringReplaceScalar/replace_scalar_autoalg_single/2097152/128/manual_time            7.68 ms         7.69 ms           91 bytes_per_second=17.7849G/s
StringReplaceScalar/replace_scalar_autoalg_single/2097152/512/manual_time            32.0 ms         32.0 ms           22 bytes_per_second=17.0394G/s
StringReplaceScalar/replace_scalar_autoalg_single/16777216/32/manual_time            13.9 ms         13.9 ms           50 bytes_per_second=19.9068G/s
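For a rough sense of scale, the speedup for a configuration is the ratio of its `manual_time` before and after. The values below are matched by the numeric benchmark arguments (`replace_scalar` before, `replace_scalar_autoalg_single` after) and rounded:

$$
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}}:\qquad
\frac{453}{0.946} \approx 479\ (\texttt{4096/8192}),\quad
\frac{526}{7.57} \approx 69.5\ (\texttt{32768/8192}),\quad
\frac{88.8}{32.0} \approx 2.8\ (\texttt{2097152/512}),\quad
\frac{0.151}{0.152} \approx 0.99\ (\texttt{4096/32})
$$

The `/32` configurations are essentially unchanged at every row count. This is consistent with the heuristic keeping the row-based algorithm for the shortest strings.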

@davidwendt (Contributor) left a comment

This is a lot of work and will take some time to go through. It looks good to me so far. These are my first-pass comments.

cpp/benchmarks/string/replace_scalar_benchmark.cpp (outdated, resolved)
cpp/benchmarks/string/replace_scalar_benchmark.cpp (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
@jlowe jlowe added the 3 - Ready for Review Ready for review by team label Feb 20, 2021
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (resolved)
cpp/src/strings/replace/replace.cu (resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
cpp/src/strings/replace/replace.cu (outdated, resolved)
codecov bot commented Feb 22, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@580f9a2).
The diff coverage is n/a.


@@              Coverage Diff               @@
##             branch-0.19    #7415   +/-   ##
==============================================
  Coverage               ?   82.26%           
==============================================
  Files                  ?      101           
  Lines                  ?    17072           
  Branches               ?        0           
==============================================
  Hits                   ?    14045           
  Misses                 ?     3027           
  Partials               ?        0           


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 580f9a2...8e9c751.

@davidwendt (Contributor) left a comment

Looks good. It would probably be worth exploring this for replace-multiple as well, but not in this PR.

cpp/include/cudf/strings/detail/replace.hpp (resolved)
cpp/src/strings/replace/replace.cu (resolved)
cpp/src/strings/replace/replace.cu (resolved)
cpp/src/strings/replace/replace.cu (resolved)
@jlowe (Member Author) commented Mar 1, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 3135f1b into rapidsai:branch-0.19 Mar 1, 2021
@jlowe jlowe deleted the string-replace-scalar-perf branch September 10, 2021 15:46
Issues closed by merging this pull request:

[FEA] Improved scalar string::replace performance for long strings
3 participants