Add gbenchmark for nvtext replace-tokens function #7708

davidwendt · 2021-03-24T18:50:48Z

Reference #5696
Creates gbenchmarks for the nvtext::replace_tokens() function.
The benchmarks measure various string lengths and numbers of rows, using the default whitespace delimiter and 4 hardcoded tokens.

This API already uses the make_strings_children utility.
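
For readers unfamiliar with token replacement, here is a minimal host-side sketch of the operation being benchmarked: whole tokens separated by whitespace are swapped for their replacements, and everything else is copied through. This is plain C++ for illustration only, not the libcudf implementation; the name `replace_tokens_host` and the map-based interface are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical host-side illustration (not the nvtext API): replace whole
// whitespace-delimited tokens found in `repl`, leaving whitespace untouched.
std::string replace_tokens_host(std::string const& input,
                                std::unordered_map<std::string, std::string> const& repl)
{
  auto is_space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
  std::string out;
  std::size_t pos = 0;
  while (pos < input.size()) {
    // Copy any run of whitespace through unchanged.
    std::size_t start = pos;
    while (pos < input.size() && is_space(input[pos])) ++pos;
    out.append(input, start, pos - start);
    // Extract the next token and replace it only on an exact match.
    start = pos;
    while (pos < input.size() && !is_space(input[pos])) ++pos;
    if (pos > start) {
      std::string const token = input.substr(start, pos - start);
      auto const it           = repl.find(token);
      out += (it != repl.end()) ? it->second : token;
    }
  }
  return out;
}
```

Matching is on whole tokens, so a target such as `the` does not alter `theory`.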

codecov · 2021-03-24T21:32:45Z

Codecov Report

Merging #7708 (8c298e6) into branch-0.19 (7871e7a) will increase coverage by 0.64%.
The diff coverage is n/a.

❗ Current head 8c298e6 differs from pull request most recent head ffe227e. Consider uploading reports for the commit ffe227e to get more accurate results

@@               Coverage Diff               @@
##           branch-0.19    #7708      +/-   ##
===============================================
+ Coverage        81.86%   82.50%   +0.64%     
===============================================
  Files              101      101              
  Lines            16884    17441     +557     
===============================================
+ Hits             13822    14390     +568     
+ Misses            3062     3051      -11

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/buffer.py | `84.21% <ø> (+4.96%)` | ⬆️ |
| python/cudf/cudf/core/column/categorical.py | `91.97% <ø> (+0.58%)` | ⬆️ |
| python/cudf/cudf/core/column/column.py | `87.61% <ø> (-0.15%)` | ⬇️ |
| python/cudf/cudf/core/column/datetime.py | `89.63% <ø> (+0.54%)` | ⬆️ |
| python/cudf/cudf/core/column/decimal.py | `92.75% <ø> (-2.12%)` | ⬇️ |
| python/cudf/cudf/core/column/lists.py | `90.00% <ø> (-1.40%)` | ⬇️ |
| python/cudf/cudf/core/column/numerical.py | `94.83% <ø> (-0.20%)` | ⬇️ |
| python/cudf/cudf/core/column/string.py | `86.79% <ø> (+0.30%)` | ⬆️ |
| python/cudf/cudf/core/column/timedelta.py | `88.57% <ø> (+0.33%)` | ⬆️ |
| python/cudf/cudf/core/column_accessor.py | `96.13% <ø> (+0.82%)` | ⬆️ |

... and 56 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e73fff0...ffe227e.

ttnghia · 2021-03-25T14:19:57Z

cpp/benchmarks/text/replace_benchmark.cpp

+  std::string row;  // build a row of random tokens
+  while (static_cast<int>(row.size()) < n_length) row += words[tokens_dist(generator)];
+
+  std::uniform_int_distribution<int> position_dist(0, 16);


I'm curious, why 16? Is this a good test size? Do we need to benchmark the case where each string has a size of around 1000?

davidwendt · 2021-03-26

It's just an arbitrary power of two less than 32, chosen to force some amount of warp divergence.
This benchmark tests string lengths ranging from 32 to 8K and row counts from 4K to 16M (within the limits of the column size).
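
The size sweep mentioned in this reply can be sketched in plain C++. The sketch below enumerates (rows, string length) pairs and keeps only those whose total character count fits in a 32-bit signed size, which is assumed here to be the column size limit. The helper name `sweep_sizes` and the step multipliers are hypothetical and not taken from the PR.

```cpp
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: list (rows, length) combinations for a benchmark sweep,
// skipping any combination whose total bytes exceed the 32-bit signed maximum
// (assumed column size limit).
std::vector<std::pair<std::int64_t, std::int64_t>> sweep_sizes(std::int64_t min_rows,
                                                               std::int64_t max_rows,
                                                               std::int64_t min_len,
                                                               std::int64_t max_len,
                                                               std::int64_t row_mult,
                                                               std::int64_t len_mult)
{
  constexpr std::int64_t max_chars = std::numeric_limits<std::int32_t>::max();
  std::vector<std::pair<std::int64_t, std::int64_t>> out;
  for (auto rows = min_rows; rows <= max_rows; rows *= row_mult) {
    for (auto len = min_len; len <= max_len; len *= len_mult) {
      // Keep only the combinations that stay within the size limit.
      if (rows * len <= max_chars) out.emplace_back(rows, len);
    }
  }
  return out;
}
```

With rows from 4K to 16M and lengths from 32 to 8K, the largest row counts only pair with the shortest strings, which matches the "within limits" caveat above.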

davidwendt · 2021-03-26T14:23:58Z

@gpucibot merge

Add gbenchmark for nvtext replace-tokens function

ffe227e

davidwendt added the 3 - Ready for Review, libcudf, improvement, and non-breaking labels Mar 24, 2021

davidwendt self-assigned this Mar 24, 2021

davidwendt requested review from a team as code owners March 24, 2021 18:50

davidwendt requested review from karthikeyann and nvdbaranec March 24, 2021 18:50

github-actions bot added the CMake label Mar 24, 2021

karthikeyann approved these changes Mar 24, 2021

ttnghia reviewed Mar 25, 2021

ttnghia approved these changes Mar 25, 2021

nvdbaranec approved these changes Mar 25, 2021

kkraus14 approved these changes Mar 26, 2021

rapids-bot bot merged commit ad5452d into rapidsai:branch-0.19 Mar 26, 2021

davidwendt deleted the benchmark-nvtext-replace branch March 26, 2021 14:24
