Add `strings::repeat_strings` API that can repeat each string a different number of times #8561
Looks real solid, just got a few minor suggestions/questions.
# Conflicts:
#	cpp/tests/strings/repeat_strings_tests.cpp
@gpucibot merge
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8561 +/- ##
===============================================
Coverage ? 10.49%
===============================================
Files ? 116
Lines ? 18985
Branches ? 0
===============================================
Hits ? 1993
Misses ? 16992
Partials ? 0
This PR implements benchmarks for the `repeat_strings` string APIs. The benchmark results listed below were generated from the current APIs and also from the same APIs with some modifications. Note that this PR includes upstream code from #8561, so the code from that PR is also listed here as "changed files".

Blocked by #8561.

## Terms:
* Separate checking: Call the overflow check separately before actually repeating the strings
* Integrated checking: Inject the overflow check into the middle of the computation, using the string offsets generated during string repeating

## Benchmark results:

Machine:
```
Run on (36 X 4600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x18)
  L1 Instruction 32 KiB (x18)
  L2 Unified 1024 KiB (x18)
  L3 Unified 25344 KiB (x1)
Load Average: 1.25, 1.60, 1.89
```

### Without any overflow checking:
```
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time 0.043 ms 0.062 ms 17850 bytes_per_second=45.3825M/s
repeat_strings_scalar_times/256/64/manual_time 0.053 ms 0.071 ms 12763 bytes_per_second=156.17M/s
repeat_strings_scalar_times/256/256/manual_time 0.116 ms 0.132 ms 5925 bytes_per_second=296M/s
repeat_strings_scalar_times/1024/16/manual_time 0.040 ms 0.058 ms 17683 bytes_per_second=212.767M/s
repeat_strings_scalar_times/1024/64/manual_time 0.053 ms 0.069 ms 12455 bytes_per_second=636.865M/s
repeat_strings_scalar_times/1024/256/manual_time 0.127 ms 0.141 ms 5406 bytes_per_second=1079.24M/s
repeat_strings_scalar_times/4096/16/manual_time 0.041 ms 0.057 ms 16729 bytes_per_second=808.35M/s
repeat_strings_scalar_times/4096/64/manual_time 0.093 ms 0.107 ms 7181 bytes_per_second=1.43583G/s
repeat_strings_scalar_times/4096/256/manual_time 0.466 ms 0.484 ms 1503 bytes_per_second=1.14725G/s
repeat_strings_scalar_times/16384/16/manual_time 0.057 ms 0.071 ms 11578 bytes_per_second=2.32692G/s
repeat_strings_scalar_times/16384/64/manual_time 0.242 ms 0.260 ms 2840 bytes_per_second=2.21228G/s
repeat_strings_scalar_times/16384/256/manual_time 7.54 ms 7.56 ms 93 bytes_per_second=290.49M/s
repeat_strings_scalar_times/65536/16/manual_time 0.109 ms 0.127 ms 5252 bytes_per_second=4.84485G/s
repeat_strings_scalar_times/65536/64/manual_time 0.816 ms 0.834 ms 849 bytes_per_second=2.62289G/s
repeat_strings_scalar_times/65536/256/manual_time 36.8 ms 36.8 ms 19 bytes_per_second=238.362M/s
repeat_strings_scalar_times/262144/16/manual_time 0.344 ms 0.364 ms 2091 bytes_per_second=6.17523G/s
repeat_strings_scalar_times/262144/64/manual_time 3.12 ms 3.14 ms 223 bytes_per_second=2.73715G/s
repeat_strings_scalar_times/262144/256/manual_time 143 ms 143 ms 5 bytes_per_second=245.057M/s
repeat_strings_column_times/256/16/manual_time 0.068 ms 0.085 ms 8440 bytes_per_second=43.036M/s
repeat_strings_column_times/256/64/manual_time 0.202 ms 0.219 ms 3385 bytes_per_second=45.7583M/s
repeat_strings_column_times/256/256/manual_time 0.820 ms 0.836 ms 784 bytes_per_second=43.1525M/s
repeat_strings_column_times/1024/16/manual_time 0.124 ms 0.142 ms 5174 bytes_per_second=100.509M/s
repeat_strings_column_times/1024/64/manual_time 0.382 ms 0.399 ms 1835 bytes_per_second=98.544M/s
repeat_strings_column_times/1024/256/manual_time 1.65 ms 1.66 ms 398 bytes_per_second=85.3484M/s
repeat_strings_column_times/4096/16/manual_time 0.124 ms 0.141 ms 5279 bytes_per_second=394.454M/s
repeat_strings_column_times/4096/64/manual_time 0.400 ms 0.415 ms 1748 bytes_per_second=380.972M/s
repeat_strings_column_times/4096/256/manual_time 1.66 ms 1.67 ms 416 bytes_per_second=339.81M/s
repeat_strings_column_times/16384/16/manual_time 0.135 ms 0.150 ms 5092 bytes_per_second=1.43102G/s
repeat_strings_column_times/16384/64/manual_time 0.406 ms 0.423 ms 1721 bytes_per_second=1.47128G/s
repeat_strings_column_times/16384/256/manual_time 1.67 ms 1.69 ms 416 bytes_per_second=1.3138G/s
repeat_strings_column_times/65536/16/manual_time 0.415 ms 0.433 ms 1679 bytes_per_second=1.86606G/s
repeat_strings_column_times/65536/64/manual_time 2.51 ms 2.53 ms 277 bytes_per_second=972.971M/s
repeat_strings_column_times/65536/256/manual_time 32.8 ms 32.8 ms 21 bytes_per_second=274.627M/s
repeat_strings_column_times/262144/16/manual_time 3.78 ms 3.80 ms 186 bytes_per_second=839.675M/s
repeat_strings_column_times/262144/64/manual_time 46.1 ms 46.1 ms 15 bytes_per_second=211.801M/s
repeat_strings_column_times/262144/256/manual_time 329 ms 329 ms 2 bytes_per_second=109.597M/s
```

### With separate overflow checking:
```
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time 0.052 ms 0.070 ms 12103 bytes_per_second=37.368M/s
repeat_strings_scalar_times/256/64/manual_time 0.065 ms 0.082 ms 11249 bytes_per_second=127.858M/s
repeat_strings_scalar_times/256/256/manual_time 0.129 ms 0.145 ms 5006 bytes_per_second=267.697M/s
repeat_strings_scalar_times/1024/16/manual_time 0.051 ms 0.069 ms 12464 bytes_per_second=165.709M/s
repeat_strings_scalar_times/1024/64/manual_time 0.066 ms 0.082 ms 8230 bytes_per_second=511.448M/s
repeat_strings_scalar_times/1024/256/manual_time 0.164 ms 0.182 ms 4737 bytes_per_second=834.513M/s
repeat_strings_scalar_times/4096/16/manual_time 0.061 ms 0.078 ms 9498 bytes_per_second=542.576M/s
repeat_strings_scalar_times/4096/64/manual_time 0.114 ms 0.129 ms 5767 bytes_per_second=1.16889G/s
repeat_strings_scalar_times/4096/256/manual_time 0.509 ms 0.527 ms 1367 bytes_per_second=1075.86M/s
repeat_strings_scalar_times/16384/16/manual_time 0.072 ms 0.086 ms 8897 bytes_per_second=1.82313G/s
repeat_strings_scalar_times/16384/64/manual_time 0.260 ms 0.279 ms 2690 bytes_per_second=2.06074G/s
repeat_strings_scalar_times/16384/256/manual_time 9.50 ms 9.52 ms 77 bytes_per_second=230.482M/s
repeat_strings_scalar_times/65536/16/manual_time 0.125 ms 0.144 ms 5277 bytes_per_second=4.22483G/s
repeat_strings_scalar_times/65536/64/manual_time 0.866 ms 0.888 ms 767 bytes_per_second=2.47143G/s
repeat_strings_scalar_times/65536/256/manual_time 37.5 ms 37.5 ms 19 bytes_per_second=233.585M/s
repeat_strings_scalar_times/262144/16/manual_time 0.333 ms 0.352 ms 2075 bytes_per_second=6.36716G/s
repeat_strings_scalar_times/262144/64/manual_time 3.13 ms 3.15 ms 224 bytes_per_second=2.73003G/s
repeat_strings_scalar_times/262144/256/manual_time 143 ms 143 ms 5 bytes_per_second=244.985M/s
repeat_strings_column_times/256/16/manual_time 0.093 ms 0.111 ms 6545 bytes_per_second=31.3167M/s
repeat_strings_column_times/256/64/manual_time 0.223 ms 0.240 ms 3031 bytes_per_second=41.4737M/s
repeat_strings_column_times/256/256/manual_time 0.839 ms 0.855 ms 823 bytes_per_second=42.184M/s
repeat_strings_column_times/1024/16/manual_time 0.146 ms 0.164 ms 4211 bytes_per_second=85.1519M/s
repeat_strings_column_times/1024/64/manual_time 0.402 ms 0.419 ms 1747 bytes_per_second=93.7183M/s
repeat_strings_column_times/1024/256/manual_time 1.68 ms 1.70 ms 420 bytes_per_second=83.6342M/s
repeat_strings_column_times/4096/16/manual_time 0.156 ms 0.173 ms 4586 bytes_per_second=314.36M/s
repeat_strings_column_times/4096/64/manual_time 0.423 ms 0.438 ms 1636 bytes_per_second=360.101M/s
repeat_strings_column_times/4096/256/manual_time 1.67 ms 1.69 ms 413 bytes_per_second=337.363M/s
repeat_strings_column_times/16384/16/manual_time 0.157 ms 0.171 ms 4178 bytes_per_second=1.23136G/s
repeat_strings_column_times/16384/64/manual_time 0.428 ms 0.444 ms 1628 bytes_per_second=1.39523G/s
repeat_strings_column_times/16384/256/manual_time 1.70 ms 1.72 ms 411 bytes_per_second=1.29319G/s
repeat_strings_column_times/65536/16/manual_time 0.420 ms 0.438 ms 1577 bytes_per_second=1.84401G/s
repeat_strings_column_times/65536/64/manual_time 2.70 ms 2.72 ms 259 bytes_per_second=905.829M/s
repeat_strings_column_times/65536/256/manual_time 33.3 ms 33.3 ms 21 bytes_per_second=270.614M/s
repeat_strings_column_times/262144/16/manual_time 4.15 ms 4.17 ms 167 bytes_per_second=764.591M/s
repeat_strings_column_times/262144/64/manual_time 46.6 ms 46.6 ms 15 bytes_per_second=209.284M/s
repeat_strings_column_times/262144/256/manual_time 329 ms 329 ms 2 bytes_per_second=109.643M/s
```

### With integrated overflow checking:
```
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
repeat_strings_scalar_times/256/16/manual_time 0.046 ms 0.064 ms 11316 bytes_per_second=42.1088M/s
repeat_strings_scalar_times/256/64/manual_time 0.061 ms 0.079 ms 10387 bytes_per_second=134.546M/s
repeat_strings_scalar_times/256/256/manual_time 0.130 ms 0.146 ms 5531 bytes_per_second=265.39M/s
repeat_strings_scalar_times/1024/16/manual_time 0.059 ms 0.078 ms 8486 bytes_per_second=143.562M/s
repeat_strings_scalar_times/1024/64/manual_time 0.064 ms 0.081 ms 8424 bytes_per_second=525.084M/s
repeat_strings_scalar_times/1024/256/manual_time 0.152 ms 0.171 ms 4078 bytes_per_second=896.389M/s
repeat_strings_scalar_times/4096/16/manual_time 0.060 ms 0.078 ms 10393 bytes_per_second=554.57M/s
repeat_strings_scalar_times/4096/64/manual_time 0.118 ms 0.133 ms 5616 bytes_per_second=1.12776G/s
repeat_strings_scalar_times/4096/256/manual_time 0.496 ms 0.514 ms 1369 bytes_per_second=1104.09M/s
repeat_strings_scalar_times/16384/16/manual_time 0.073 ms 0.089 ms 8857 bytes_per_second=1.80611G/s
repeat_strings_scalar_times/16384/64/manual_time 0.265 ms 0.286 ms 2676 bytes_per_second=2.02063G/s
repeat_strings_scalar_times/16384/256/manual_time 8.38 ms 8.40 ms 70 bytes_per_second=261.41M/s
repeat_strings_scalar_times/65536/16/manual_time 0.137 ms 0.159 ms 4912 bytes_per_second=3.86253G/s
repeat_strings_scalar_times/65536/64/manual_time 0.827 ms 0.845 ms 842 bytes_per_second=2.59047G/s
repeat_strings_scalar_times/65536/256/manual_time 36.5 ms 36.5 ms 19 bytes_per_second=240.245M/s
repeat_strings_scalar_times/262144/16/manual_time 0.329 ms 0.348 ms 2093 bytes_per_second=6.44757G/s
repeat_strings_scalar_times/262144/64/manual_time 3.12 ms 3.14 ms 225 bytes_per_second=2.74115G/s
repeat_strings_scalar_times/262144/256/manual_time 143 ms 143 ms 5 bytes_per_second=245.422M/s
repeat_strings_column_times/256/16/manual_time 0.076 ms 0.094 ms 8701 bytes_per_second=38.2733M/s
repeat_strings_column_times/256/64/manual_time 0.210 ms 0.228 ms 3299 bytes_per_second=43.8627M/s
repeat_strings_column_times/256/256/manual_time 0.831 ms 0.847 ms 834 bytes_per_second=42.5716M/s
repeat_strings_column_times/1024/16/manual_time 0.129 ms 0.147 ms 5279 bytes_per_second=96.2961M/s
repeat_strings_column_times/1024/64/manual_time 0.388 ms 0.405 ms 1785 bytes_per_second=97.1464M/s
repeat_strings_column_times/1024/256/manual_time 1.65 ms 1.67 ms 421 bytes_per_second=85.0178M/s
repeat_strings_column_times/4096/16/manual_time 0.132 ms 0.149 ms 5124 bytes_per_second=369.813M/s
repeat_strings_column_times/4096/64/manual_time 0.408 ms 0.423 ms 1711 bytes_per_second=373.09M/s
repeat_strings_column_times/4096/256/manual_time 1.66 ms 1.68 ms 416 bytes_per_second=338.6M/s
repeat_strings_column_times/16384/16/manual_time 0.143 ms 0.158 ms 4756 bytes_per_second=1.34724G/s
repeat_strings_column_times/16384/64/manual_time 0.413 ms 0.430 ms 1676 bytes_per_second=1.44544G/s
repeat_strings_column_times/16384/256/manual_time 1.68 ms 1.70 ms 412 bytes_per_second=1.30622G/s
repeat_strings_column_times/65536/16/manual_time 0.414 ms 0.432 ms 1683 bytes_per_second=1.86912G/s
repeat_strings_column_times/65536/64/manual_time 2.54 ms 2.56 ms 274 bytes_per_second=959.96M/s
repeat_strings_column_times/65536/256/manual_time 32.9 ms 32.9 ms 21 bytes_per_second=273.805M/s
repeat_strings_column_times/262144/16/manual_time 3.67 ms 3.69 ms 190 bytes_per_second=864.283M/s
repeat_strings_column_times/262144/64/manual_time 46.0 ms 46.0 ms 15 bytes_per_second=212.259M/s
repeat_strings_column_times/262144/256/manual_time 329 ms 329 ms 2 bytes_per_second=109.717M/s
```

### Performance graph:
![image](https://user-images.githubusercontent.com/7416935/123003352-51f8f480-d370-11eb-8608-743bcc08e8d3.png)
![image](https://user-images.githubusercontent.com/7416935/123004367-bec0be80-d371-11eb-8c0e-72437e6d4e45.png)

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Conor Hoekstra (https://github.com/codereport)
- Robert Maynard (https://github.com/robertmaynard)

URL: #8589
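The "separate" and "integrated" checking strategies named in the benchmark description can be illustrated with a small host-side sketch. This is not the code from the PR: the names `size_type`, `fits_separate`, and `build_offsets_checked` are hypothetical, the sketch assumes sizes and offsets are 32-bit signed integers, and it assumes a non-positive repeat count contributes zero bytes. The real implementation runs on the GPU and may differ substantially.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Assumption for this sketch: sizes and offsets are 32-bit signed integers.
using size_type = std::int32_t;

constexpr std::int64_t max_size = std::numeric_limits<size_type>::max();

// Separate checking: a dedicated pass over the string lengths and repeat
// counts that runs before any repeating work is done.
// Both vectors are assumed to have the same number of elements.
bool fits_separate(std::vector<size_type> const& lengths,
                   std::vector<size_type> const& times)
{
  std::int64_t total = 0;  // 64-bit accumulator so the sum itself cannot wrap
  for (std::size_t i = 0; i < lengths.size(); ++i) {
    if (times[i] > 0) { total += std::int64_t{lengths[i]} * times[i]; }
    if (total > max_size) { return false; }
  }
  return true;
}

// Integrated checking: the check happens while the output offsets are being
// built, so no extra pass over the input is needed.
// Returns false (leaving `offsets` partially filled) if the output is too large.
bool build_offsets_checked(std::vector<size_type> const& lengths,
                           std::vector<size_type> const& times,
                           std::vector<size_type>& offsets)
{
  offsets.assign(lengths.size() + 1, 0);
  std::int64_t running = 0;
  for (std::size_t i = 0; i < lengths.size(); ++i) {
    if (times[i] > 0) { running += std::int64_t{lengths[i]} * times[i]; }
    if (running > max_size) { return false; }
    offsets[i + 1] = static_cast<size_type>(running);
  }
  return true;
}
```

In this sketch the separate variant reads the inputs twice (once to check, once to do the work), while the integrated variant folds the check into the offsets computation that has to happen anyway. That is the trade-off the three benchmark tables compare.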
This work was requested by the Spark team. It is also a follow-up to #8423, so that cudf's `strings::repeat_strings` fully supports the `StringRepeat` SQL expression in Apache Spark.

Note that this API needs an explicit overflow check on the size of the output strings column, because that check is not trivial and can't be performed outside of cudf.
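As a rough illustration of the semantics described above, here is a minimal host-side model in plain C++. It is not cudf code: `repeat_each` and `output_bytes` are hypothetical names, the inputs are ordinary `std::vector`s rather than cudf columns, nulls are ignored, and the sketch assumes a non-positive repeat count produces an empty string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a per-row repeat: row i of the output is input[i]
// concatenated times[i] times. Both vectors are assumed to have equal length.
// Assumption: a non-positive count yields an empty string.
std::vector<std::string> repeat_each(std::vector<std::string> const& input,
                                     std::vector<std::int32_t> const& times)
{
  std::vector<std::string> output;
  output.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    std::string repeated;
    for (std::int32_t r = 0; r < times[i]; ++r) { repeated += input[i]; }
    output.push_back(std::move(repeated));
  }
  return output;
}

// Number of bytes the output would occupy, computed in 64 bits and without
// materializing any strings. Comparing this against a 32-bit limit is the
// kind of check a caller cannot do cheaply without access to per-row sizes.
std::int64_t output_bytes(std::vector<std::string> const& input,
                          std::vector<std::int32_t> const& times)
{
  std::int64_t total = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (times[i] > 0) { total += static_cast<std::int64_t>(input[i].size()) * times[i]; }
  }
  return total;
}
```

For example, a single 1000-byte string repeated 3,000,000 times needs 3,000,000,000 bytes, which does not fit in a signed 32-bit size even though both inputs are individually small.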
This PR also rewrites some existing code, including renaming variables and changes in doxygen.
Follow-up work depending on this PR:
- `strings::repeat_strings` APIs #8589
- `strings:repeat_strings` that repeats each string separately by different numbers of times #8572