Add strings::repeat_strings API that can repeat each string a different number of times #8561

Merged
merged 37 commits into from
Jul 20, 2021

@ttnghia ttnghia commented Jun 18, 2021

This work was requested by the Spark team. It is a follow-up to #8423, so that cudf's `strings::repeat_strings` fully supports the `StringRepeat` SQL expression in Apache Spark.
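
To make the row-wise behavior concrete, here is a minimal host-side sketch in plain C++. It is not the libcudf implementation or its signature: `repeat_each` and `maybe_string` are hypothetical names, `std::vector` stands in for cudf columns, and the handling of nulls and non-positive counts is an assumption modeled on Spark's `StringRepeat` (null in, null out; a count of zero or less yields an empty string).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a nullable string row; std::nullopt models a null.
using maybe_string = std::optional<std::string>;

// Host-side reference for the per-row semantics only. The real API works on
// cudf columns on the GPU; this sketch just shows what each output row holds.
std::vector<maybe_string> repeat_each(std::vector<maybe_string> const& strs,
                                      std::vector<std::optional<std::int32_t>> const& times)
{
  assert(strs.size() == times.size());
  std::vector<maybe_string> out;
  out.reserve(strs.size());
  for (std::size_t i = 0; i < strs.size(); ++i) {
    // Assumption: a null string or a null count produces a null output row.
    if (!strs[i] || !times[i]) {
      out.push_back(std::nullopt);
      continue;
    }
    // A count of zero or less never enters the loop, leaving an empty string.
    std::string repeated;
    for (std::int32_t r = 0; r < *times[i]; ++r) { repeated += *strs[i]; }
    out.push_back(std::move(repeated));
  }
  return out;
}
```

For example, calling `repeat_each` with the strings `{"ab", "c", null, "x"}` and the counts `{2, 0, 3, -1}` yields `{"abab", "", null, ""}`.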

Note that this API must explicitly implement an overflow check for the size of the output strings column, because that check is not trivial and cannot be performed outside of cudf.
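
The arithmetic behind such a check can be sketched on the host as follows. This assumes the limit is the maximum of a signed 32-bit integer, which is what `cudf::size_type` is. `output_would_overflow` is a hypothetical helper, and the actual check in this PR runs on the device and may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns true when the repeated output would need more bytes than a signed
// 32-bit size can represent. Sizes are assumed to be non-negative byte counts.
bool output_would_overflow(std::vector<std::int32_t> const& string_sizes,
                           std::vector<std::int32_t> const& repeat_times)
{
  assert(string_sizes.size() == repeat_times.size());
  constexpr std::int64_t limit = std::numeric_limits<std::int32_t>::max();
  std::int64_t total = 0;
  for (std::size_t i = 0; i < string_sizes.size(); ++i) {
    if (repeat_times[i] <= 0) { continue; }  // non-positive counts add no bytes
    // Both factors fit in 32 bits, so the product cannot overflow 64 bits.
    total += static_cast<std::int64_t>(string_sizes[i]) * repeat_times[i];
    // Stopping at the first violation also keeps the running total bounded.
    if (total > limit) { return true; }
  }
  return false;
}
```

The accumulator is deliberately wider than the size type being guarded: summing in 32 bits would wrap around silently and hide the very condition the check is meant to detect.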

This PR also rewrites some existing code, including renaming variables and updating the doxygen comments.

Follow-up work that depends on this PR:

@ttnghia ttnghia added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS strings strings issues (C++ and Python) non-breaking Non-breaking change labels Jun 18, 2021
@ttnghia ttnghia self-assigned this Jun 18, 2021
@ttnghia ttnghia changed the title Add more API for strings::repeat_strings for repeating individual strings by individual times Add more API for strings::repeat_strings that can repeat different strings by different number of times Jun 18, 2021
@ttnghia ttnghia marked this pull request as ready for review June 18, 2021 21:59
@ttnghia ttnghia requested a review from a team as a code owner June 18, 2021 21:59
@ttnghia ttnghia requested review from vuule and codereport June 18, 2021 21:59

@vuule vuule left a comment

Looks real solid, just got a few minor suggestions/questions.

@ttnghia ttnghia requested review from davidwendt and vuule June 21, 2021 13:50
@rapidsai rapidsai deleted a comment from codecov bot Jul 19, 2021
@rapidsai rapidsai deleted a comment from codecov bot Jul 19, 2021
@ttnghia ttnghia requested a review from davidwendt July 19, 2021 19:20
@rapidsai rapidsai deleted a comment from codecov bot Jul 19, 2021
@rapidsai rapidsai deleted a comment from codecov bot Jul 19, 2021
@rapidsai rapidsai deleted a comment from codecov bot Jul 19, 2021
# Conflicts:
#	cpp/tests/strings/repeat_strings_tests.cpp
@ttnghia ttnghia requested a review from davidwendt July 20, 2021 19:04
@rapidsai rapidsai deleted a comment from codecov bot Jul 20, 2021
ttnghia commented Jul 20, 2021

@gpucibot merge

codecov bot commented Jul 20, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@7ee347c).
The diff coverage is n/a.

```
@@               Coverage Diff               @@
##             branch-21.08    #8561   +/-   ##
===============================================
  Coverage                ?   10.49%           
===============================================
  Files                   ?      116           
  Lines                   ?    18985           
  Branches                ?        0           
===============================================
  Hits                    ?     1993           
  Misses                  ?    16992           
  Partials                ?        0           
```

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ee347c...b294f52.

@rapids-bot rapids-bot bot merged commit 799f688 into rapidsai:branch-21.08 Jul 20, 2021
rapids-bot bot pushed a commit that referenced this pull request Jul 22, 2021
This PR implements benchmarks for the `repeat_strings` string APIs. The benchmark results listed below were generated from the current APIs and from modified versions of the same APIs.

Note that this PR includes upstream code from #8561, so the code from that PR is also listed here under "changed files".

Blocked by #8561.

## Terms:
 * Separate checking: Run the overflow check as a separate call before the strings are actually repeated
 * Integrated checking: Perform the overflow check in the middle of the computation, using the string offsets generated while repeating the strings
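
The difference between the two strategies can be illustrated with a small host-side sketch. It is only an approximation of the idea under stated assumptions: `check_overflow_separately`, `make_offsets`, and `repeated_size` are hypothetical names, the limit is assumed to be the maximum of a signed 32-bit integer, and the benchmarked code runs on the GPU rather than in loops like these.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

constexpr std::int64_t max_bytes = std::numeric_limits<std::int32_t>::max();

// Output size in bytes of one row; non-positive counts produce nothing.
std::int64_t repeated_size(std::int32_t size, std::int32_t times)
{
  return times > 0 ? static_cast<std::int64_t>(size) * times : 0;
}

// Separate checking: a dedicated pass that only sums the output sizes and
// throws before any real work starts.
void check_overflow_separately(std::vector<std::int32_t> const& sizes,
                               std::vector<std::int32_t> const& times)
{
  assert(sizes.size() == times.size());
  std::int64_t total = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    total += repeated_size(sizes[i], times[i]);
    if (total > max_bytes) { throw std::overflow_error("output too large"); }
  }
}

// Builds the output offsets (offsets[i] is where row i starts). With
// integrated checking, the running offset that is needed anyway doubles as
// the overflow test, so no extra pass over the input is required.
std::vector<std::int64_t> make_offsets(std::vector<std::int32_t> const& sizes,
                                       std::vector<std::int32_t> const& times,
                                       bool integrated_check)
{
  assert(sizes.size() == times.size());
  std::vector<std::int64_t> offsets(sizes.size() + 1, 0);
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    offsets[i + 1] = offsets[i] + repeated_size(sizes[i], times[i]);
    if (integrated_check && offsets[i + 1] > max_bytes) {
      throw std::overflow_error("output too large");
    }
  }
  return offsets;
}
```

In this sketch, separate checking corresponds to calling `check_overflow_separately` and then `make_offsets(sizes, times, false)`, while integrated checking is a single call to `make_offsets(sizes, times, true)`.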

## Benchmark results:
Machine:
```
Run on (36 X 4600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x18)
  L1 Instruction 32 KiB (x18)
  L2 Unified 1024 KiB (x18)
  L3 Unified 25344 KiB (x1)
Load Average: 1.25, 1.60, 1.89
```
### Without any overflow checking:
```
-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time          0.043 ms        0.062 ms        17850 bytes_per_second=45.3825M/s
repeat_strings_scalar_times/256/64/manual_time          0.053 ms        0.071 ms        12763 bytes_per_second=156.17M/s
repeat_strings_scalar_times/256/256/manual_time         0.116 ms        0.132 ms         5925 bytes_per_second=296M/s
repeat_strings_scalar_times/1024/16/manual_time         0.040 ms        0.058 ms        17683 bytes_per_second=212.767M/s
repeat_strings_scalar_times/1024/64/manual_time         0.053 ms        0.069 ms        12455 bytes_per_second=636.865M/s
repeat_strings_scalar_times/1024/256/manual_time        0.127 ms        0.141 ms         5406 bytes_per_second=1079.24M/s
repeat_strings_scalar_times/4096/16/manual_time         0.041 ms        0.057 ms        16729 bytes_per_second=808.35M/s
repeat_strings_scalar_times/4096/64/manual_time         0.093 ms        0.107 ms         7181 bytes_per_second=1.43583G/s
repeat_strings_scalar_times/4096/256/manual_time        0.466 ms        0.484 ms         1503 bytes_per_second=1.14725G/s
repeat_strings_scalar_times/16384/16/manual_time        0.057 ms        0.071 ms        11578 bytes_per_second=2.32692G/s
repeat_strings_scalar_times/16384/64/manual_time        0.242 ms        0.260 ms         2840 bytes_per_second=2.21228G/s
repeat_strings_scalar_times/16384/256/manual_time        7.54 ms         7.56 ms           93 bytes_per_second=290.49M/s
repeat_strings_scalar_times/65536/16/manual_time        0.109 ms        0.127 ms         5252 bytes_per_second=4.84485G/s
repeat_strings_scalar_times/65536/64/manual_time        0.816 ms        0.834 ms          849 bytes_per_second=2.62289G/s
repeat_strings_scalar_times/65536/256/manual_time        36.8 ms         36.8 ms           19 bytes_per_second=238.362M/s
repeat_strings_scalar_times/262144/16/manual_time       0.344 ms        0.364 ms         2091 bytes_per_second=6.17523G/s
repeat_strings_scalar_times/262144/64/manual_time        3.12 ms         3.14 ms          223 bytes_per_second=2.73715G/s
repeat_strings_scalar_times/262144/256/manual_time        143 ms          143 ms            5 bytes_per_second=245.057M/s
repeat_strings_column_times/256/16/manual_time          0.068 ms        0.085 ms         8440 bytes_per_second=43.036M/s
repeat_strings_column_times/256/64/manual_time          0.202 ms        0.219 ms         3385 bytes_per_second=45.7583M/s
repeat_strings_column_times/256/256/manual_time         0.820 ms        0.836 ms          784 bytes_per_second=43.1525M/s
repeat_strings_column_times/1024/16/manual_time         0.124 ms        0.142 ms         5174 bytes_per_second=100.509M/s
repeat_strings_column_times/1024/64/manual_time         0.382 ms        0.399 ms         1835 bytes_per_second=98.544M/s
repeat_strings_column_times/1024/256/manual_time         1.65 ms         1.66 ms          398 bytes_per_second=85.3484M/s
repeat_strings_column_times/4096/16/manual_time         0.124 ms        0.141 ms         5279 bytes_per_second=394.454M/s
repeat_strings_column_times/4096/64/manual_time         0.400 ms        0.415 ms         1748 bytes_per_second=380.972M/s
repeat_strings_column_times/4096/256/manual_time         1.66 ms         1.67 ms          416 bytes_per_second=339.81M/s
repeat_strings_column_times/16384/16/manual_time        0.135 ms        0.150 ms         5092 bytes_per_second=1.43102G/s
repeat_strings_column_times/16384/64/manual_time        0.406 ms        0.423 ms         1721 bytes_per_second=1.47128G/s
repeat_strings_column_times/16384/256/manual_time        1.67 ms         1.69 ms          416 bytes_per_second=1.3138G/s
repeat_strings_column_times/65536/16/manual_time        0.415 ms        0.433 ms         1679 bytes_per_second=1.86606G/s
repeat_strings_column_times/65536/64/manual_time         2.51 ms         2.53 ms          277 bytes_per_second=972.971M/s
repeat_strings_column_times/65536/256/manual_time        32.8 ms         32.8 ms           21 bytes_per_second=274.627M/s
repeat_strings_column_times/262144/16/manual_time        3.78 ms         3.80 ms          186 bytes_per_second=839.675M/s
repeat_strings_column_times/262144/64/manual_time        46.1 ms         46.1 ms           15 bytes_per_second=211.801M/s
repeat_strings_column_times/262144/256/manual_time        329 ms          329 ms            2 bytes_per_second=109.597M/s
```

### With separate overflow checking:
```
-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time          0.052 ms        0.070 ms        12103 bytes_per_second=37.368M/s
repeat_strings_scalar_times/256/64/manual_time          0.065 ms        0.082 ms        11249 bytes_per_second=127.858M/s
repeat_strings_scalar_times/256/256/manual_time         0.129 ms        0.145 ms         5006 bytes_per_second=267.697M/s
repeat_strings_scalar_times/1024/16/manual_time         0.051 ms        0.069 ms        12464 bytes_per_second=165.709M/s
repeat_strings_scalar_times/1024/64/manual_time         0.066 ms        0.082 ms         8230 bytes_per_second=511.448M/s
repeat_strings_scalar_times/1024/256/manual_time        0.164 ms        0.182 ms         4737 bytes_per_second=834.513M/s
repeat_strings_scalar_times/4096/16/manual_time         0.061 ms        0.078 ms         9498 bytes_per_second=542.576M/s
repeat_strings_scalar_times/4096/64/manual_time         0.114 ms        0.129 ms         5767 bytes_per_second=1.16889G/s
repeat_strings_scalar_times/4096/256/manual_time        0.509 ms        0.527 ms         1367 bytes_per_second=1075.86M/s
repeat_strings_scalar_times/16384/16/manual_time        0.072 ms        0.086 ms         8897 bytes_per_second=1.82313G/s
repeat_strings_scalar_times/16384/64/manual_time        0.260 ms        0.279 ms         2690 bytes_per_second=2.06074G/s
repeat_strings_scalar_times/16384/256/manual_time        9.50 ms         9.52 ms           77 bytes_per_second=230.482M/s
repeat_strings_scalar_times/65536/16/manual_time        0.125 ms        0.144 ms         5277 bytes_per_second=4.22483G/s
repeat_strings_scalar_times/65536/64/manual_time        0.866 ms        0.888 ms          767 bytes_per_second=2.47143G/s
repeat_strings_scalar_times/65536/256/manual_time        37.5 ms         37.5 ms           19 bytes_per_second=233.585M/s
repeat_strings_scalar_times/262144/16/manual_time       0.333 ms        0.352 ms         2075 bytes_per_second=6.36716G/s
repeat_strings_scalar_times/262144/64/manual_time        3.13 ms         3.15 ms          224 bytes_per_second=2.73003G/s
repeat_strings_scalar_times/262144/256/manual_time        143 ms          143 ms            5 bytes_per_second=244.985M/s
repeat_strings_column_times/256/16/manual_time          0.093 ms        0.111 ms         6545 bytes_per_second=31.3167M/s
repeat_strings_column_times/256/64/manual_time          0.223 ms        0.240 ms         3031 bytes_per_second=41.4737M/s
repeat_strings_column_times/256/256/manual_time         0.839 ms        0.855 ms          823 bytes_per_second=42.184M/s
repeat_strings_column_times/1024/16/manual_time         0.146 ms        0.164 ms         4211 bytes_per_second=85.1519M/s
repeat_strings_column_times/1024/64/manual_time         0.402 ms        0.419 ms         1747 bytes_per_second=93.7183M/s
repeat_strings_column_times/1024/256/manual_time         1.68 ms         1.70 ms          420 bytes_per_second=83.6342M/s
repeat_strings_column_times/4096/16/manual_time         0.156 ms        0.173 ms         4586 bytes_per_second=314.36M/s
repeat_strings_column_times/4096/64/manual_time         0.423 ms        0.438 ms         1636 bytes_per_second=360.101M/s
repeat_strings_column_times/4096/256/manual_time         1.67 ms         1.69 ms          413 bytes_per_second=337.363M/s
repeat_strings_column_times/16384/16/manual_time        0.157 ms        0.171 ms         4178 bytes_per_second=1.23136G/s
repeat_strings_column_times/16384/64/manual_time        0.428 ms        0.444 ms         1628 bytes_per_second=1.39523G/s
repeat_strings_column_times/16384/256/manual_time        1.70 ms         1.72 ms          411 bytes_per_second=1.29319G/s
repeat_strings_column_times/65536/16/manual_time        0.420 ms        0.438 ms         1577 bytes_per_second=1.84401G/s
repeat_strings_column_times/65536/64/manual_time         2.70 ms         2.72 ms          259 bytes_per_second=905.829M/s
repeat_strings_column_times/65536/256/manual_time        33.3 ms         33.3 ms           21 bytes_per_second=270.614M/s
repeat_strings_column_times/262144/16/manual_time        4.15 ms         4.17 ms          167 bytes_per_second=764.591M/s
repeat_strings_column_times/262144/64/manual_time        46.6 ms         46.6 ms           15 bytes_per_second=209.284M/s
repeat_strings_column_times/262144/256/manual_time        329 ms          329 ms            2 bytes_per_second=109.643M/s
```

### With integrated overflow checking:
```
-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time          0.046 ms        0.064 ms        11316 bytes_per_second=42.1088M/s
repeat_strings_scalar_times/256/64/manual_time          0.061 ms        0.079 ms        10387 bytes_per_second=134.546M/s
repeat_strings_scalar_times/256/256/manual_time         0.130 ms        0.146 ms         5531 bytes_per_second=265.39M/s
repeat_strings_scalar_times/1024/16/manual_time         0.059 ms        0.078 ms         8486 bytes_per_second=143.562M/s
repeat_strings_scalar_times/1024/64/manual_time         0.064 ms        0.081 ms         8424 bytes_per_second=525.084M/s
repeat_strings_scalar_times/1024/256/manual_time        0.152 ms        0.171 ms         4078 bytes_per_second=896.389M/s
repeat_strings_scalar_times/4096/16/manual_time         0.060 ms        0.078 ms        10393 bytes_per_second=554.57M/s
repeat_strings_scalar_times/4096/64/manual_time         0.118 ms        0.133 ms         5616 bytes_per_second=1.12776G/s
repeat_strings_scalar_times/4096/256/manual_time        0.496 ms        0.514 ms         1369 bytes_per_second=1104.09M/s
repeat_strings_scalar_times/16384/16/manual_time        0.073 ms        0.089 ms         8857 bytes_per_second=1.80611G/s
repeat_strings_scalar_times/16384/64/manual_time        0.265 ms        0.286 ms         2676 bytes_per_second=2.02063G/s
repeat_strings_scalar_times/16384/256/manual_time        8.38 ms         8.40 ms           70 bytes_per_second=261.41M/s
repeat_strings_scalar_times/65536/16/manual_time        0.137 ms        0.159 ms         4912 bytes_per_second=3.86253G/s
repeat_strings_scalar_times/65536/64/manual_time        0.827 ms        0.845 ms          842 bytes_per_second=2.59047G/s
repeat_strings_scalar_times/65536/256/manual_time        36.5 ms         36.5 ms           19 bytes_per_second=240.245M/s
repeat_strings_scalar_times/262144/16/manual_time       0.329 ms        0.348 ms         2093 bytes_per_second=6.44757G/s
repeat_strings_scalar_times/262144/64/manual_time        3.12 ms         3.14 ms          225 bytes_per_second=2.74115G/s
repeat_strings_scalar_times/262144/256/manual_time        143 ms          143 ms            5 bytes_per_second=245.422M/s
repeat_strings_column_times/256/16/manual_time          0.076 ms        0.094 ms         8701 bytes_per_second=38.2733M/s
repeat_strings_column_times/256/64/manual_time          0.210 ms        0.228 ms         3299 bytes_per_second=43.8627M/s
repeat_strings_column_times/256/256/manual_time         0.831 ms        0.847 ms          834 bytes_per_second=42.5716M/s
repeat_strings_column_times/1024/16/manual_time         0.129 ms        0.147 ms         5279 bytes_per_second=96.2961M/s
repeat_strings_column_times/1024/64/manual_time         0.388 ms        0.405 ms         1785 bytes_per_second=97.1464M/s
repeat_strings_column_times/1024/256/manual_time         1.65 ms         1.67 ms          421 bytes_per_second=85.0178M/s
repeat_strings_column_times/4096/16/manual_time         0.132 ms        0.149 ms         5124 bytes_per_second=369.813M/s
repeat_strings_column_times/4096/64/manual_time         0.408 ms        0.423 ms         1711 bytes_per_second=373.09M/s
repeat_strings_column_times/4096/256/manual_time         1.66 ms         1.68 ms          416 bytes_per_second=338.6M/s
repeat_strings_column_times/16384/16/manual_time        0.143 ms        0.158 ms         4756 bytes_per_second=1.34724G/s
repeat_strings_column_times/16384/64/manual_time        0.413 ms        0.430 ms         1676 bytes_per_second=1.44544G/s
repeat_strings_column_times/16384/256/manual_time        1.68 ms         1.70 ms          412 bytes_per_second=1.30622G/s
repeat_strings_column_times/65536/16/manual_time        0.414 ms        0.432 ms         1683 bytes_per_second=1.86912G/s
repeat_strings_column_times/65536/64/manual_time         2.54 ms         2.56 ms          274 bytes_per_second=959.96M/s
repeat_strings_column_times/65536/256/manual_time        32.9 ms         32.9 ms           21 bytes_per_second=273.805M/s
repeat_strings_column_times/262144/16/manual_time        3.67 ms         3.69 ms          190 bytes_per_second=864.283M/s
repeat_strings_column_times/262144/64/manual_time        46.0 ms         46.0 ms           15 bytes_per_second=212.259M/s
repeat_strings_column_times/262144/256/manual_time        329 ms          329 ms            2 bytes_per_second=109.717M/s
```
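
One way to read the three tables side by side is to compute the relative slowdown of each checking strategy against the unchecked baseline. The helper below is a hypothetical convenience, not part of the benchmark code. The sample timings are copied from the `repeat_strings_scalar_times/256/16` rows above (0.043 ms, 0.052 ms, and 0.046 ms).

```cpp
#include <cassert>

// Relative slowdown of a checked run versus the unchecked baseline, in percent.
double overhead_percent(double baseline_ms, double checked_ms)
{
  assert(baseline_ms > 0.0);
  return (checked_ms - baseline_ms) / baseline_ms * 100.0;
}

// Timings for repeat_strings_scalar_times/256/16, copied from the tables.
constexpr double no_check_ms   = 0.043;
constexpr double separate_ms   = 0.052;
constexpr double integrated_ms = 0.046;
```

For that row, `overhead_percent(no_check_ms, separate_ms)` is about 20.9 and `overhead_percent(no_check_ms, integrated_ms)` is about 7.0. Timings this small are noisy, so a single row should not be over-interpreted.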

### Performance graph:


![image](https://user-images.githubusercontent.com/7416935/123003352-51f8f480-d370-11eb-8608-743bcc08e8d3.png)

![image](https://user-images.githubusercontent.com/7416935/123004367-bec0be80-d371-11eb-8c0e-72437e6d4e45.png)

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #8589
@ttnghia ttnghia deleted the repeat_strings branch July 23, 2021 03:32
Labels
3 - Ready for Review Ready for review by team breaking Breaking change feature request New feature or request Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS strings strings issues (C++ and Python)
6 participants