[FEA] Create gbenchmarks for nvtext APIs #5696

davidwendt · 2020-07-15T12:28:49Z

Currently there is only one benchmark for the nvtext APIs.

Propose creating the following gbenchmarks:

This will help measure performance impact of code changes.

github-actions · 2021-03-14T19:12:36Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@davidwendt

Reference #5696

Creates a gbenchmark for the `nvtext::normalize_spaces()` and `nvtext::normalize_characters()` functions. The benchmarks measure various string lengths and numbers of rows. I found that `normalize_spaces()` is used in haproxy parsing along with `extract`, so having this benchmark helps measure possible performance improvements there. The `normalize_characters` code is the same code used as part of the `subword_tokenizer`. Since each requires a different memory footprint, my initial goal of having them share a common benchmark structure did not work out, so the 2 tests are separate gbenchmark test files. I refactored some of this code to use the more efficient `make_strings_children`, which improved the performance of `normalize_spaces` by 2-3x. The current subword-tokenizer gbenchmark is also incorporated into the TEXT_BENCHMARK gbenchmark.

Authors:
- David (@davidwendt)

Approvers:
- Vukasin Milovanovic (@vuule)
- Conor Hoekstra (@codereport)
- Mark Harris (@harrism)

URL: #7668
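To make the "various string lengths and numbers of rows" sweep concrete, here is a minimal host-side sketch. It is not the benchmark from the PR and not the libcudf implementation: `normalize_spaces_host`, `make_input`, `sweep_result`, and `run_sweep` are hypothetical names, the normalization is a CPU stand-in for the documented behavior of `nvtext::normalize_spaces()` (collapse whitespace runs to one space and trim both ends), and `std::chrono` is used in place of Google Benchmark so the example has no dependencies.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// CPU stand-in (assumption, not libcudf code): collapse each run of
// whitespace (bytes <= ' ') to a single space and trim both ends.
std::string normalize_spaces_host(std::string const& input)
{
  std::string out;
  out.reserve(input.size());
  bool pending_space = false;
  for (unsigned char ch : input) {
    if (ch <= ' ') {
      // Only remember a space if some output precedes it (trims the front).
      pending_space = !out.empty();
    } else {
      if (pending_space) { out.push_back(' '); }
      out.push_back(static_cast<char>(ch));
      pending_space = false;
    }
  }
  return out;  // a trailing pending space is never emitted (trims the back)
}

// Hypothetical input generator: `rows` identical strings of `length` bytes,
// with double spaces between words so there is something to normalize.
std::vector<std::string> make_input(std::size_t rows, std::size_t length)
{
  std::string row;
  while (row.size() < length) { row += "word  "; }
  row.resize(length);
  return std::vector<std::string>(rows, row);
}

struct sweep_result {
  std::size_t rows;
  std::size_t length;
  double milliseconds;
  std::size_t output_bytes;
};

// Times the stand-in over every (row count, string length) combination.
std::vector<sweep_result> run_sweep(std::vector<std::size_t> const& row_counts,
                                    std::vector<std::size_t> const& lengths)
{
  std::vector<sweep_result> results;
  for (auto rows : row_counts) {
    for (auto length : lengths) {
      auto const input         = make_input(rows, length);
      std::size_t output_bytes = 0;
      auto const start         = std::chrono::steady_clock::now();
      for (auto const& s : input) { output_bytes += normalize_spaces_host(s).size(); }
      auto const stop = std::chrono::steady_clock::now();
      std::chrono::duration<double, std::milli> const elapsed = stop - start;
      // Normalizing can only shrink or preserve the total byte count.
      assert(output_bytes <= rows * length);
      results.push_back({rows, length, elapsed.count(), output_bytes});
    }
  }
  return results;
}
```

Calling `run_sweep({4096, 32768}, {32, 128, 512})` would produce one timing per combination. A real gbenchmark would register the same two axes as benchmark arguments and run the device API instead of the host loop.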

@davidwendt

Reference #5696

Creates gbenchmarks for the `nvtext::tokenize()`, `nvtext::count_tokens()` and `nvtext::ngrams_tokenize()` functions. The benchmarks measure various string lengths and numbers of rows. These functions use the `make_strings_column` factory optimized in #7576.

Authors:
- David (@davidwendt)

Approvers:
- Conor Hoekstra (@codereport)
- Nghia Truong (@ttnghia)
- Mark Harris (@harrism)

URL: #7684

@davidwendt

Reference #5696

Creates a gbenchmark for the `nvtext::replace_tokens()` function. The benchmark measures various string lengths and numbers of rows with the default whitespace delimiter and 4 hardcoded tokens. This API already uses the `make_strings_children` utility.

Authors:
- David (@davidwendt)

Approvers:
- Karthikeyan (@karthikeyann)
- Nghia Truong (@ttnghia)
- @nvdbaranec
- Keith Kraus (@kkraus14)

URL: #7708

@davidwendt

Reference #5696

Creates a gbenchmark for the `nvtext::generate_ngrams()` and `nvtext::generate_character_ngrams()` functions. The benchmarks measure various string lengths and numbers of rows. The `nvtext::generate_ngrams()` function was refactored to use the more efficient `make_strings_children`, which improved its performance by about 50%.

Authors:
- David (@davidwendt)

Approvers:
- Nghia Truong (@ttnghia)
- Mark Harris (@harrism)

URL: #7693
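For readers unfamiliar with the two functions being benchmarked, the sketch below shows roughly what each computes, as a CPU reference only. The names `generate_ngrams_host` and `generate_character_ngrams_host` are hypothetical, null and empty-row handling is left out, and the character version assumes ASCII input (it slices bytes, whereas the device API works on UTF-8 characters).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// CPU reference (assumption, not libcudf code): join every window of `n`
// consecutive rows with `separator`.
std::vector<std::string> generate_ngrams_host(std::vector<std::string> const& rows,
                                              std::size_t n,
                                              std::string const& separator)
{
  assert(n > 0 && "ngram size must be positive");
  std::vector<std::string> out;
  for (std::size_t i = 0; i + n <= rows.size(); ++i) {
    std::string gram = rows[i];
    for (std::size_t j = 1; j < n; ++j) { gram += separator + rows[i + j]; }
    out.push_back(gram);
  }
  return out;
}

// CPU reference (assumption, ASCII only): emit every `n`-byte window of each
// row; rows shorter than `n` contribute nothing.
std::vector<std::string> generate_character_ngrams_host(std::vector<std::string> const& rows,
                                                        std::size_t n)
{
  assert(n > 0 && "ngram size must be positive");
  std::vector<std::string> out;
  for (auto const& row : rows) {
    for (std::size_t i = 0; i + n <= row.size(); ++i) { out.push_back(row.substr(i, n)); }
  }
  return out;
}
```

With rows `{"a", "bb", "ccc"}`, size 2, and separator `"_"`, the first function yields `a_bb` and `bb_ccc`. With rows `{"abcd", "xy"}` and size 2, the second yields `ab`, `bc`, `cd`, and `xy`. Output size grows with both row count and string length, which is why the benchmarks vary both.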

davidwendt · 2021-03-29T15:51:29Z

These are done now.

davidwendt added the feature request, Needs Triage, libcudf, and strings labels Jul 15, 2020

davidwendt changed the title ~~Create gbenchmarks for nvtext APIs~~ [FEA] Create gbenchmarks for nvtext APIs Jul 15, 2020

harrism added the Performance, tech debt, and tests labels and removed the Needs Triage label Jul 19, 2020

github-actions bot added the inactive-90d label Mar 14, 2021

davidwendt mentioned this issue Mar 22, 2021

Add gbenchmark for nvtext normalize functions #7668

Merged

davidwendt self-assigned this Mar 22, 2021

davidwendt removed the inactive-90d label Mar 23, 2021

This was referenced Mar 23, 2021

Add gbenchmark for nvtext tokenize functions #7684

Merged

Add gbenchmark for nvtext ngrams functions #7693

Merged

davidwendt mentioned this issue Mar 24, 2021

Add gbenchmark for nvtext replace-tokens function #7708

Merged

davidwendt closed this as completed Mar 29, 2021
