Add gbenchmark for cudf::strings::translate function #7617

Merged


davidwendt
Contributor

Reference #5698
This PR adds a gbenchmark for the cudf::strings::translate() API. The benchmark measures a range of row counts, string lengths, and translate table sizes.
It also cleans up the translate.cu implementation and switches it to the more efficient make_strings_children utility. This change improved performance for all 4 functions by 2-3x on average.
A further improvement was to sort the translation table input so that matches can be looked up more quickly in device code. This gave another 2x improvement when using longer translate tables.
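
To illustrate why sorting the table helps, here is a minimal host-side sketch in plain C++. It is not the code in translate.cu: the names `translate_entry`, `make_sorted_table`, and `translate_string` are hypothetical, and the use of binary search is an assumption about how a sorted table can speed up lookups. With the entries ordered by source character, each lookup takes O(log n) comparisons instead of a linear scan over the table.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one translate table entry: (source char, replacement char).
using translate_entry = std::pair<char, char>;

// Sort the table once, by source character, before any lookups are made.
std::vector<translate_entry> make_sorted_table(std::vector<translate_entry> table)
{
  std::sort(table.begin(), table.end(),
            [](translate_entry const& a, translate_entry const& b) { return a.first < b.first; });
  return table;
}

// Translate one string. Each character is looked up with a binary search over
// the sorted table; characters with no matching entry are copied unchanged.
std::string translate_string(std::string const& input,
                             std::vector<translate_entry> const& sorted_table)
{
  std::string output;
  output.reserve(input.size());
  for (char ch : input) {
    auto const it = std::lower_bound(
      sorted_table.begin(), sorted_table.end(), ch,
      [](translate_entry const& entry, char key) { return entry.first < key; });
    bool const found = (it != sorted_table.end()) && (it->first == ch);
    output.push_back(found ? it->second : ch);
  }
  return output;
}
```

The benefit grows with table size, which is consistent with the extra speedup being reported for longer translate tables. For a table with only a few entries, a linear scan may be just as fast.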

@davidwendt davidwendt added the following labels on Mar 16, 2021: 3 - Ready for Review (Ready for review by team); libcudf (Affects libcudf (C++/CUDA) code.); Performance (Performance related issue); strings (strings issues (C++ and Python)); improvement (Improvement / enhancement to an existing function); non-breaking (Non-breaking change)
@davidwendt davidwendt self-assigned this Mar 16, 2021
@davidwendt davidwendt requested review from a team as code owners March 16, 2021 21:10
@github-actions github-actions bot added the CMake CMake build issue label Mar 16, 2021
@codecov

codecov bot commented Mar 17, 2021

Codecov Report

Merging #7617 (40a1c41) into branch-0.19 (7871e7a) will increase coverage by 0.58%.
The diff coverage is 93.16%.

❗ Current head 40a1c41 differs from pull request most recent head 88ae901. Consider uploading reports for the commit 88ae901 to get more accurate results

@@               Coverage Diff               @@
##           branch-0.19    #7617      +/-   ##
===============================================
+ Coverage        81.86%   82.44%   +0.58%     
===============================================
  Files              101      101              
  Lines            16884    17369     +485     
===============================================
+ Hits             13822    14320     +498     
+ Misses            3062     3049      -13     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/core/index.py | 93.34% <ø> (+0.48%) ⬆️ |
| python/cudf/cudf/core/series.py | 91.65% <ø> (+0.86%) ⬆️ |
| python/cudf/cudf/core/tools/datetimes.py | 84.53% <ø> (+0.08%) ⬆️ |
| python/cudf/cudf/utils/cudautils.py | 52.94% <ø> (+2.55%) ⬆️ |
| python/cudf/cudf/utils/dtypes.py | 89.88% <ø> (+0.37%) ⬆️ |
| python/dask_cudf/dask_cudf/io/orc.py | 91.04% <ø> (+0.13%) ⬆️ |
| python/cudf/cudf/core/column/numerical.py | 94.83% <87.50%> (-0.20%) ⬇️ |
| python/cudf/cudf/core/frame.py | 89.09% <89.47%> (+0.08%) ⬆️ |
| python/cudf/cudf/core/column/column.py | 87.86% <90.00%> (+0.10%) ⬆️ |
| python/cudf/cudf/core/column/decimal.py | 92.75% <90.32%> (-2.12%) ⬇️ |

... and 61 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4723051...88ae901.

@davidwendt
Contributor Author

rerun tests

@harrism harrism added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Mar 23, 2021
@harrism
Member

harrism commented Mar 23, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 2bf22d1 into rapidsai:branch-0.19 Mar 23, 2021
@davidwendt davidwendt deleted the benchmarks-strings-translate branch March 23, 2021 11:57