Add gbenchmark for nvtext replace-tokens function #7708

Merged

Conversation

davidwendt
Contributor

Reference #5696
Creates gbenchmarks for the nvtext::replace_tokens() function.
The benchmarks measure various string lengths and row counts, using the default whitespace delimiter and 4 hardcoded tokens.

This API already uses the make_strings_children utility.
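
For readers unfamiliar with the operation being benchmarked, here is a minimal host-side sketch of token replacement with a whitespace delimiter. It is only a simplified model, assuming whole-token exact matches with whitespace passed through unchanged; it is not the libcudf implementation, which runs on the GPU over a strings column. The function name `replace_tokens_host` and its signature are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical CPU reference, for illustration only.
// targets[i] is replaced by replacements[i]; both vectors must have the same size.
std::string replace_tokens_host(std::string const& input,
                                std::vector<std::string> const& targets,
                                std::vector<std::string> const& replacements)
{
  std::string output;
  std::size_t pos = 0;
  while (pos < input.size()) {
    // Copy delimiter (whitespace) characters through unchanged.
    if (std::isspace(static_cast<unsigned char>(input[pos]))) {
      output += input[pos++];
      continue;
    }
    // Find the end of the current token.
    std::size_t end = pos;
    while (end < input.size() && !std::isspace(static_cast<unsigned char>(input[end]))) ++end;
    std::string token = input.substr(pos, end - pos);
    // Replace the token only if it matches a target exactly.
    for (std::size_t i = 0; i < targets.size(); ++i) {
      if (token == targets[i]) {
        token = replacements[i];
        break;
      }
    }
    output += token;
    pos = end;
  }
  return output;
}
```

Because matching is per token, a target such as `quick` would leave `quickly` untouched in this sketch.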

@davidwendt added the 3 - Ready for Review, libcudf, improvement, and non-breaking labels Mar 24, 2021
@davidwendt self-assigned this Mar 24, 2021
@davidwendt requested review from a team as code owners March 24, 2021 18:50
@github-actions bot added the CMake label Mar 24, 2021

codecov bot commented Mar 24, 2021

Codecov Report

Merging #7708 (8c298e6) into branch-0.19 (7871e7a) will increase coverage by 0.64%.
The diff coverage is n/a.

❗ Current head 8c298e6 differs from pull request most recent head ffe227e. Consider uploading reports for the commit ffe227e to get more accurate results

@@               Coverage Diff               @@
##           branch-0.19    #7708      +/-   ##
===============================================
+ Coverage        81.86%   82.50%   +0.64%     
===============================================
  Files              101      101              
  Lines            16884    17441     +557     
===============================================
+ Hits             13822    14390     +568     
+ Misses            3062     3051      -11     
Impacted Files                                Coverage Δ
python/cudf/cudf/core/buffer.py               84.21% <ø> (+4.96%) ⬆️
python/cudf/cudf/core/column/categorical.py   91.97% <ø> (+0.58%) ⬆️
python/cudf/cudf/core/column/column.py        87.61% <ø> (-0.15%) ⬇️
python/cudf/cudf/core/column/datetime.py      89.63% <ø> (+0.54%) ⬆️
python/cudf/cudf/core/column/decimal.py       92.75% <ø> (-2.12%) ⬇️
python/cudf/cudf/core/column/lists.py         90.00% <ø> (-1.40%) ⬇️
python/cudf/cudf/core/column/numerical.py     94.83% <ø> (-0.20%) ⬇️
python/cudf/cudf/core/column/string.py        86.79% <ø> (+0.30%) ⬆️
python/cudf/cudf/core/column/timedelta.py     88.57% <ø> (+0.33%) ⬆️
python/cudf/cudf/core/column_accessor.py      96.13% <ø> (+0.82%) ⬆️
... and 56 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e73fff0...ffe227e.

std::string row; // build a row of random tokens
while (static_cast<int>(row.size()) < n_length) row += words[tokens_dist(generator)];

std::uniform_int_distribution<int> position_dist(0, 16);
Contributor


I'm curious, why 16? Is this a good size to test? Do we need to benchmark the case where each string is around 1000 characters long?

Contributor Author


Just an arbitrary power of two less than 32, chosen to force some amount of warp divergence.
This benchmark tests string lengths ranging from 32 to 8K and row counts from 4K to 16M (within the column size limits).
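
To make the ranges above concrete, here is a minimal sketch of how such a sweep could be enumerated and how a row of random tokens could be built. It mirrors the loop in the reviewed snippet but is not the PR's code. The helper names `make_row` and `make_sweep`, the multipliers 8 and 4, and the 32-bit byte limit used for the column size boundary are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: append random words until the row reaches n_length.
// Assumes `words` is non-empty.
std::string make_row(std::vector<std::string> const& words, int n_length, std::mt19937& generator)
{
  std::uniform_int_distribution<int> tokens_dist(0, static_cast<int>(words.size()) - 1);
  std::string row;  // build a row of random tokens
  while (static_cast<int>(row.size()) < n_length) row += words[tokens_dist(generator)];
  return row;
}

// Hypothetical helper: list (row count, row length) pairs, skipping any pair
// whose total byte count would reach the assumed 32-bit signed limit.
std::vector<std::pair<int, int>> make_sweep(
  int min_rows, int max_rows, int row_mult, int min_len, int max_len, int len_mult)
{
  auto const max_bytes = static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
  std::vector<std::pair<int, int>> args;
  for (int rows = min_rows; rows <= max_rows; rows *= row_mult) {
    for (int len = min_len; len <= max_len; len *= len_mult) {
      // Widen before multiplying so the product cannot overflow.
      auto const bytes = static_cast<std::uint64_t>(rows) * static_cast<std::uint64_t>(len);
      if (bytes < max_bytes) args.emplace_back(rows, len);
    }
  }
  return args;
}
```

With the assumed multipliers, `make_sweep(1 << 12, 1 << 24, 8, 1 << 5, 1 << 13, 4)` covers 4K to 16M rows and lengths 32 to 8K, and drops the largest combinations (for example 16M rows of 8K bytes) because they would not fit in a single column under the assumed limit.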

@davidwendt
Contributor Author

@gpucibot merge

@rapids-bot merged commit ad5452d into rapidsai:branch-0.19 Mar 26, 2021
@davidwendt deleted the benchmark-nvtext-replace branch March 26, 2021 14:24
5 participants