Add gbenchmark for nvtext tokenize functions #7684

davidwendt · 2021-03-23T15:20:17Z

Reference #5696
Creates gbenchmarks for the nvtext::tokenize(), nvtext::count_tokens(), and nvtext::ngrams_tokenize() functions.
The benchmarks measure various string lengths and numbers of rows.

These functions use the make_strings_column factory optimized in #7576.
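To illustrate the kind of parameter sweep described above (several row counts crossed with several string lengths), here is a minimal, self-contained C++ sketch. It is an assumption-laden illustration, not the benchmark code in this PR: it uses neither libcudf nor Google Benchmark, the names `count_tokens_cpu`, `make_rows`, and `run_sweep` are hypothetical, the CPU tokenizer is only a stand-in for `nvtext::count_tokens()`, and any row counts or widths passed in are arbitrary.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical CPU stand-in for nvtext::count_tokens(): counts
// space-delimited tokens in each row. NOT the libcudf implementation.
std::vector<int> count_tokens_cpu(const std::vector<std::string>& rows) {
  std::vector<int> counts;
  counts.reserve(rows.size());
  for (const auto& s : rows) {
    int n = 0;
    bool in_token = false;
    for (char c : s) {
      if (c == ' ') {
        in_token = false;  // a delimiter ends the current token
      } else if (!in_token) {
        in_token = true;   // first character of a new token
        ++n;
      }
    }
    counts.push_back(n);
  }
  return counts;
}

// Hypothetical input generator: num_rows identical rows, each filled with
// 4-character "abc " words up to at most row_width characters.
std::vector<std::string> make_rows(std::size_t num_rows, std::size_t row_width) {
  std::string row;
  while (row.size() + 4 <= row_width) row += "abc ";
  return std::vector<std::string>(num_rows, row);
}

struct SweepResult {
  std::size_t num_rows;
  std::size_t row_width;
  std::chrono::microseconds::rep elapsed_us;
  long long total_tokens;
};

// Times count_tokens_cpu over every (rows, width) pair, i.e. the
// cross product of row counts and string lengths.
std::vector<SweepResult> run_sweep(const std::vector<std::size_t>& row_counts,
                                   const std::vector<std::size_t>& row_widths) {
  std::vector<SweepResult> results;
  for (std::size_t rows : row_counts) {
    for (std::size_t width : row_widths) {
      auto input = make_rows(rows, width);
      auto start = std::chrono::steady_clock::now();
      auto counts = count_tokens_cpu(input);
      auto stop = std::chrono::steady_clock::now();
      long long total = 0;
      for (int c : counts) total += c;  // consume the result
      results.push_back(
          {rows, width,
           std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count(),
           total});
    }
  }
  return results;
}
```

In an actual gbenchmark, a grid like this would typically be expressed through Google Benchmark's argument ranges, with timing handled by the framework; the hand-written loops here only show the shape of the rows × string-length sweep.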

codereport

lgtm

codecov · 2021-03-23T18:09:58Z

Codecov Report

Merging #7684 (8bf0c22) into branch-0.19 (7871e7a) will increase coverage by 0.61%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           branch-0.19    #7684      +/-   ##
===============================================
+ Coverage        81.86%   82.47%   +0.61%     
===============================================
  Files              101      101              
  Lines            16884    17402     +518     
===============================================
+ Hits             13822    14353     +531     
+ Misses            3062     3049      -13

Impacted Files	Coverage Δ
python/cudf/cudf/core/column/categorical.py	`91.97% <ø> (+0.58%)`	⬆️
python/cudf/cudf/core/column/column.py	`87.86% <ø> (+0.10%)`	⬆️
python/cudf/cudf/core/column/datetime.py	`89.63% <ø> (+0.54%)`	⬆️
python/cudf/cudf/core/column/decimal.py	`92.75% <ø> (-2.12%)`	⬇️
python/cudf/cudf/core/column/lists.py	`92.50% <ø> (+1.10%)`	⬆️
python/cudf/cudf/core/column/numerical.py	`94.83% <ø> (-0.20%)`	⬇️
python/cudf/cudf/core/column/string.py	`86.79% <ø> (+0.30%)`	⬆️
python/cudf/cudf/core/column/timedelta.py	`88.57% <ø> (+0.33%)`	⬆️
python/cudf/cudf/core/column_accessor.py	`95.45% <ø> (+0.14%)`	⬆️
python/cudf/cudf/core/dataframe.py	`90.90% <ø> (+0.44%)`	⬆️
... and 64 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d500142...8bf0c22.

codereport · 2021-03-24T02:24:14Z

@gpucibot merge

Add gbenchmark for nvtext tokenize functions

46a8830

davidwendt added the 3 - Ready for Review, libcudf, improvement, and non-breaking labels Mar 23, 2021

davidwendt self-assigned this Mar 23, 2021

davidwendt requested a review from a team as a code owner March 23, 2021 15:20

davidwendt requested review from harrism and codereport March 23, 2021 15:20

github-actions bot added the CMake label Mar 23, 2021

fix rows range values

8bf0c22

codereport approved these changes Mar 23, 2021

ttnghia approved these changes Mar 23, 2021

codereport added the 5 - Ready to Merge label and removed the 3 - Ready for Review label Mar 24, 2021

harrism approved these changes Mar 24, 2021

rapids-bot bot merged commit 6ed360c into rapidsai:branch-0.19 Mar 24, 2021

davidwendt deleted the benchmark-nvtext-tokenize branch March 24, 2021 12:55
