Benchmark for strings::repeat_strings APIs #8589

Merged Jul 22, 2021 (41 commits)

ttnghia commented Jun 22, 2021

This PR implements benchmarks for the repeat_strings string APIs. The benchmark results listed below were generated from the current APIs and from modified versions of the same APIs.

Note that this PR includes upstream code from #8561, so the code from that PR also appears here under "changed files".

Blocked by #8561.

Terms:

  • Separate checking: Run the overflow check as a separate step before actually repeating the strings
  • Integrated checking: Perform the overflow check in the middle of the computation, using the string offsets generated while repeating the strings
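
To make the two terms concrete, here is a minimal host-side sketch in C++17. It is not the code from this PR: the real APIs run on the GPU, and the names output_fits and make_offsets_checked, the 32-bit size_type alias, and the assumption that repeat_times is non-negative are all hypothetical choices made for illustration only.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Assumption: offsets are stored as 32-bit signed integers.
using size_type = std::int32_t;

// Separate checking (hypothetical helper): an extra pass over the input
// lengths that accumulates the total output size in 64 bits and reports
// whether it still fits in size_type. No repeating work is done here.
// Assumes repeat_times >= 0.
bool output_fits(std::vector<size_type> const& lengths, size_type repeat_times)
{
  std::int64_t total = 0;
  for (auto len : lengths) {
    total += static_cast<std::int64_t>(len) * repeat_times;
    if (total > std::numeric_limits<size_type>::max()) { return false; }
  }
  return true;
}

// Integrated checking (hypothetical helper): the offsets needed for the
// repeat operation are built in a single pass, and overflow is detected
// from the running offset while it is being computed.
// Returns std::nullopt if the output size would overflow size_type.
std::optional<std::vector<size_type>> make_offsets_checked(
  std::vector<size_type> const& lengths, size_type repeat_times)
{
  std::vector<size_type> offsets;
  offsets.reserve(lengths.size() + 1);
  offsets.push_back(0);

  std::int64_t running = 0;
  for (auto len : lengths) {
    running += static_cast<std::int64_t>(len) * repeat_times;
    if (running > std::numeric_limits<size_type>::max()) { return std::nullopt; }
    offsets.push_back(static_cast<size_type>(running));
  }
  return offsets;
}
```

In this sketch the separate check costs one additional pass over the input, while the integrated check adds only a comparison to a pass that has to happen anyway. Whether that difference is measurable on the GPU is what the tables below compare.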

Benchmark results:

Machine:

Run on (36 X 4600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x18)
  L1 Instruction 32 KiB (x18)
  L2 Unified 1024 KiB (x18)
  L3 Unified 25344 KiB (x1)
Load Average: 1.25, 1.60, 1.89

Without any overflow checking:

-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time          0.043 ms        0.062 ms        17850 bytes_per_second=45.3825M/s
repeat_strings_scalar_times/256/64/manual_time          0.053 ms        0.071 ms        12763 bytes_per_second=156.17M/s
repeat_strings_scalar_times/256/256/manual_time         0.116 ms        0.132 ms         5925 bytes_per_second=296M/s
repeat_strings_scalar_times/1024/16/manual_time         0.040 ms        0.058 ms        17683 bytes_per_second=212.767M/s
repeat_strings_scalar_times/1024/64/manual_time         0.053 ms        0.069 ms        12455 bytes_per_second=636.865M/s
repeat_strings_scalar_times/1024/256/manual_time        0.127 ms        0.141 ms         5406 bytes_per_second=1079.24M/s
repeat_strings_scalar_times/4096/16/manual_time         0.041 ms        0.057 ms        16729 bytes_per_second=808.35M/s
repeat_strings_scalar_times/4096/64/manual_time         0.093 ms        0.107 ms         7181 bytes_per_second=1.43583G/s
repeat_strings_scalar_times/4096/256/manual_time        0.466 ms        0.484 ms         1503 bytes_per_second=1.14725G/s
repeat_strings_scalar_times/16384/16/manual_time        0.057 ms        0.071 ms        11578 bytes_per_second=2.32692G/s
repeat_strings_scalar_times/16384/64/manual_time        0.242 ms        0.260 ms         2840 bytes_per_second=2.21228G/s
repeat_strings_scalar_times/16384/256/manual_time        7.54 ms         7.56 ms           93 bytes_per_second=290.49M/s
repeat_strings_scalar_times/65536/16/manual_time        0.109 ms        0.127 ms         5252 bytes_per_second=4.84485G/s
repeat_strings_scalar_times/65536/64/manual_time        0.816 ms        0.834 ms          849 bytes_per_second=2.62289G/s
repeat_strings_scalar_times/65536/256/manual_time        36.8 ms         36.8 ms           19 bytes_per_second=238.362M/s
repeat_strings_scalar_times/262144/16/manual_time       0.344 ms        0.364 ms         2091 bytes_per_second=6.17523G/s
repeat_strings_scalar_times/262144/64/manual_time        3.12 ms         3.14 ms          223 bytes_per_second=2.73715G/s
repeat_strings_scalar_times/262144/256/manual_time        143 ms          143 ms            5 bytes_per_second=245.057M/s
repeat_strings_column_times/256/16/manual_time          0.068 ms        0.085 ms         8440 bytes_per_second=43.036M/s
repeat_strings_column_times/256/64/manual_time          0.202 ms        0.219 ms         3385 bytes_per_second=45.7583M/s
repeat_strings_column_times/256/256/manual_time         0.820 ms        0.836 ms          784 bytes_per_second=43.1525M/s
repeat_strings_column_times/1024/16/manual_time         0.124 ms        0.142 ms         5174 bytes_per_second=100.509M/s
repeat_strings_column_times/1024/64/manual_time         0.382 ms        0.399 ms         1835 bytes_per_second=98.544M/s
repeat_strings_column_times/1024/256/manual_time         1.65 ms         1.66 ms          398 bytes_per_second=85.3484M/s
repeat_strings_column_times/4096/16/manual_time         0.124 ms        0.141 ms         5279 bytes_per_second=394.454M/s
repeat_strings_column_times/4096/64/manual_time         0.400 ms        0.415 ms         1748 bytes_per_second=380.972M/s
repeat_strings_column_times/4096/256/manual_time         1.66 ms         1.67 ms          416 bytes_per_second=339.81M/s
repeat_strings_column_times/16384/16/manual_time        0.135 ms        0.150 ms         5092 bytes_per_second=1.43102G/s
repeat_strings_column_times/16384/64/manual_time        0.406 ms        0.423 ms         1721 bytes_per_second=1.47128G/s
repeat_strings_column_times/16384/256/manual_time        1.67 ms         1.69 ms          416 bytes_per_second=1.3138G/s
repeat_strings_column_times/65536/16/manual_time        0.415 ms        0.433 ms         1679 bytes_per_second=1.86606G/s
repeat_strings_column_times/65536/64/manual_time         2.51 ms         2.53 ms          277 bytes_per_second=972.971M/s
repeat_strings_column_times/65536/256/manual_time        32.8 ms         32.8 ms           21 bytes_per_second=274.627M/s
repeat_strings_column_times/262144/16/manual_time        3.78 ms         3.80 ms          186 bytes_per_second=839.675M/s
repeat_strings_column_times/262144/64/manual_time        46.1 ms         46.1 ms           15 bytes_per_second=211.801M/s
repeat_strings_column_times/262144/256/manual_time        329 ms          329 ms            2 bytes_per_second=109.597M/s

With separate overflow checking:

-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time          0.052 ms        0.070 ms        12103 bytes_per_second=37.368M/s
repeat_strings_scalar_times/256/64/manual_time          0.065 ms        0.082 ms        11249 bytes_per_second=127.858M/s
repeat_strings_scalar_times/256/256/manual_time         0.129 ms        0.145 ms         5006 bytes_per_second=267.697M/s
repeat_strings_scalar_times/1024/16/manual_time         0.051 ms        0.069 ms        12464 bytes_per_second=165.709M/s
repeat_strings_scalar_times/1024/64/manual_time         0.066 ms        0.082 ms         8230 bytes_per_second=511.448M/s
repeat_strings_scalar_times/1024/256/manual_time        0.164 ms        0.182 ms         4737 bytes_per_second=834.513M/s
repeat_strings_scalar_times/4096/16/manual_time         0.061 ms        0.078 ms         9498 bytes_per_second=542.576M/s
repeat_strings_scalar_times/4096/64/manual_time         0.114 ms        0.129 ms         5767 bytes_per_second=1.16889G/s
repeat_strings_scalar_times/4096/256/manual_time        0.509 ms        0.527 ms         1367 bytes_per_second=1075.86M/s
repeat_strings_scalar_times/16384/16/manual_time        0.072 ms        0.086 ms         8897 bytes_per_second=1.82313G/s
repeat_strings_scalar_times/16384/64/manual_time        0.260 ms        0.279 ms         2690 bytes_per_second=2.06074G/s
repeat_strings_scalar_times/16384/256/manual_time        9.50 ms         9.52 ms           77 bytes_per_second=230.482M/s
repeat_strings_scalar_times/65536/16/manual_time        0.125 ms        0.144 ms         5277 bytes_per_second=4.22483G/s
repeat_strings_scalar_times/65536/64/manual_time        0.866 ms        0.888 ms          767 bytes_per_second=2.47143G/s
repeat_strings_scalar_times/65536/256/manual_time        37.5 ms         37.5 ms           19 bytes_per_second=233.585M/s
repeat_strings_scalar_times/262144/16/manual_time       0.333 ms        0.352 ms         2075 bytes_per_second=6.36716G/s
repeat_strings_scalar_times/262144/64/manual_time        3.13 ms         3.15 ms          224 bytes_per_second=2.73003G/s
repeat_strings_scalar_times/262144/256/manual_time        143 ms          143 ms            5 bytes_per_second=244.985M/s
repeat_strings_column_times/256/16/manual_time          0.093 ms        0.111 ms         6545 bytes_per_second=31.3167M/s
repeat_strings_column_times/256/64/manual_time          0.223 ms        0.240 ms         3031 bytes_per_second=41.4737M/s
repeat_strings_column_times/256/256/manual_time         0.839 ms        0.855 ms          823 bytes_per_second=42.184M/s
repeat_strings_column_times/1024/16/manual_time         0.146 ms        0.164 ms         4211 bytes_per_second=85.1519M/s
repeat_strings_column_times/1024/64/manual_time         0.402 ms        0.419 ms         1747 bytes_per_second=93.7183M/s
repeat_strings_column_times/1024/256/manual_time         1.68 ms         1.70 ms          420 bytes_per_second=83.6342M/s
repeat_strings_column_times/4096/16/manual_time         0.156 ms        0.173 ms         4586 bytes_per_second=314.36M/s
repeat_strings_column_times/4096/64/manual_time         0.423 ms        0.438 ms         1636 bytes_per_second=360.101M/s
repeat_strings_column_times/4096/256/manual_time         1.67 ms         1.69 ms          413 bytes_per_second=337.363M/s
repeat_strings_column_times/16384/16/manual_time        0.157 ms        0.171 ms         4178 bytes_per_second=1.23136G/s
repeat_strings_column_times/16384/64/manual_time        0.428 ms        0.444 ms         1628 bytes_per_second=1.39523G/s
repeat_strings_column_times/16384/256/manual_time        1.70 ms         1.72 ms          411 bytes_per_second=1.29319G/s
repeat_strings_column_times/65536/16/manual_time        0.420 ms        0.438 ms         1577 bytes_per_second=1.84401G/s
repeat_strings_column_times/65536/64/manual_time         2.70 ms         2.72 ms          259 bytes_per_second=905.829M/s
repeat_strings_column_times/65536/256/manual_time        33.3 ms         33.3 ms           21 bytes_per_second=270.614M/s
repeat_strings_column_times/262144/16/manual_time        4.15 ms         4.17 ms          167 bytes_per_second=764.591M/s
repeat_strings_column_times/262144/64/manual_time        46.6 ms         46.6 ms           15 bytes_per_second=209.284M/s
repeat_strings_column_times/262144/256/manual_time        329 ms          329 ms            2 bytes_per_second=109.643M/s

With integrated overflow checking:

-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time          0.046 ms        0.064 ms        11316 bytes_per_second=42.1088M/s
repeat_strings_scalar_times/256/64/manual_time          0.061 ms        0.079 ms        10387 bytes_per_second=134.546M/s
repeat_strings_scalar_times/256/256/manual_time         0.130 ms        0.146 ms         5531 bytes_per_second=265.39M/s
repeat_strings_scalar_times/1024/16/manual_time         0.059 ms        0.078 ms         8486 bytes_per_second=143.562M/s
repeat_strings_scalar_times/1024/64/manual_time         0.064 ms        0.081 ms         8424 bytes_per_second=525.084M/s
repeat_strings_scalar_times/1024/256/manual_time        0.152 ms        0.171 ms         4078 bytes_per_second=896.389M/s
repeat_strings_scalar_times/4096/16/manual_time         0.060 ms        0.078 ms        10393 bytes_per_second=554.57M/s
repeat_strings_scalar_times/4096/64/manual_time         0.118 ms        0.133 ms         5616 bytes_per_second=1.12776G/s
repeat_strings_scalar_times/4096/256/manual_time        0.496 ms        0.514 ms         1369 bytes_per_second=1104.09M/s
repeat_strings_scalar_times/16384/16/manual_time        0.073 ms        0.089 ms         8857 bytes_per_second=1.80611G/s
repeat_strings_scalar_times/16384/64/manual_time        0.265 ms        0.286 ms         2676 bytes_per_second=2.02063G/s
repeat_strings_scalar_times/16384/256/manual_time        8.38 ms         8.40 ms           70 bytes_per_second=261.41M/s
repeat_strings_scalar_times/65536/16/manual_time        0.137 ms        0.159 ms         4912 bytes_per_second=3.86253G/s
repeat_strings_scalar_times/65536/64/manual_time        0.827 ms        0.845 ms          842 bytes_per_second=2.59047G/s
repeat_strings_scalar_times/65536/256/manual_time        36.5 ms         36.5 ms           19 bytes_per_second=240.245M/s
repeat_strings_scalar_times/262144/16/manual_time       0.329 ms        0.348 ms         2093 bytes_per_second=6.44757G/s
repeat_strings_scalar_times/262144/64/manual_time        3.12 ms         3.14 ms          225 bytes_per_second=2.74115G/s
repeat_strings_scalar_times/262144/256/manual_time        143 ms          143 ms            5 bytes_per_second=245.422M/s
repeat_strings_column_times/256/16/manual_time          0.076 ms        0.094 ms         8701 bytes_per_second=38.2733M/s
repeat_strings_column_times/256/64/manual_time          0.210 ms        0.228 ms         3299 bytes_per_second=43.8627M/s
repeat_strings_column_times/256/256/manual_time         0.831 ms        0.847 ms          834 bytes_per_second=42.5716M/s
repeat_strings_column_times/1024/16/manual_time         0.129 ms        0.147 ms         5279 bytes_per_second=96.2961M/s
repeat_strings_column_times/1024/64/manual_time         0.388 ms        0.405 ms         1785 bytes_per_second=97.1464M/s
repeat_strings_column_times/1024/256/manual_time         1.65 ms         1.67 ms          421 bytes_per_second=85.0178M/s
repeat_strings_column_times/4096/16/manual_time         0.132 ms        0.149 ms         5124 bytes_per_second=369.813M/s
repeat_strings_column_times/4096/64/manual_time         0.408 ms        0.423 ms         1711 bytes_per_second=373.09M/s
repeat_strings_column_times/4096/256/manual_time         1.66 ms         1.68 ms          416 bytes_per_second=338.6M/s
repeat_strings_column_times/16384/16/manual_time        0.143 ms        0.158 ms         4756 bytes_per_second=1.34724G/s
repeat_strings_column_times/16384/64/manual_time        0.413 ms        0.430 ms         1676 bytes_per_second=1.44544G/s
repeat_strings_column_times/16384/256/manual_time        1.68 ms         1.70 ms          412 bytes_per_second=1.30622G/s
repeat_strings_column_times/65536/16/manual_time        0.414 ms        0.432 ms         1683 bytes_per_second=1.86912G/s
repeat_strings_column_times/65536/64/manual_time         2.54 ms         2.56 ms          274 bytes_per_second=959.96M/s
repeat_strings_column_times/65536/256/manual_time        32.9 ms         32.9 ms           21 bytes_per_second=273.805M/s
repeat_strings_column_times/262144/16/manual_time        3.67 ms         3.69 ms          190 bytes_per_second=864.283M/s
repeat_strings_column_times/262144/64/manual_time        46.0 ms         46.0 ms           15 bytes_per_second=212.259M/s
repeat_strings_column_times/262144/256/manual_time        329 ms          329 ms            2 bytes_per_second=109.717M/s

Performance graphs: (two images, not reproduced here)

@ttnghia ttnghia added feature request New feature or request 2 - In Progress Currently a work in progress depends on libcudf libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS strings strings issues (C++ and Python) 5 - DO NOT MERGE Hold off on merging; see PR for details non-breaking Non-breaking change labels Jun 22, 2021
@ttnghia ttnghia requested a review from jlowe June 22, 2021 20:24

ttnghia commented Jul 21, 2021

I just did a lot of cleanup on this PR. The code should now be much cleaner to review.

@ttnghia ttnghia changed the title Benchmark for strings::repeat_strings [skip ci] Benchmark for strings::repeat_strings APIs Jul 21, 2021

ttnghia commented Jul 21, 2021

Rerun tests.

@ttnghia ttnghia requested a review from a team July 21, 2021 00:30
codereport left a comment

lgtm 👍

Comment on lines 124 to 125
int const min_rowlen = 1 << 4;
int const max_rowlen = 1 << 8;

should these be str_len?

ttnghia commented Jul 21, 2021

Sure. Row length here is also string length.
Update: Done.


harrism commented Jul 22, 2021

@ttnghia the status of this PR is conflicted. The target branch is still 21.08, but the project board says 21.10 (I made the latter change when the PR was not yet approved). Now that it is approved, if you are ready to merge, you can go ahead and merge it into 21.08 as long as that is done before code freeze. But please move it to the 21.08 project board if you do so. Otherwise, please change the target branch to 21.10 by clicking "edit" next to the title.


ttnghia commented Jul 22, 2021

Sorry for the confusion. I'm merging it to 21.08.


ttnghia commented Jul 22, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 09cd5a0 into rapidsai:branch-21.08 Jul 22, 2021
@ttnghia ttnghia deleted the benchmark_repeat_strings branch July 23, 2021 03:30
Labels:

  • 3 - Ready for Review: Ready for review by team
  • CMake: CMake build issue
  • feature request: New feature or request
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change
  • Spark: Functionality that helps Spark RAPIDS
  • strings: strings issues (C++ and Python)