Refactor libcudf strings::replace to use make_strings_children utility (#7384)

Reference #7370

This PR simplifies the current `cudf::strings::replace` (non-regex) functions by refactoring to use the more efficient `make_strings_children` utility. This refactoring improves performance by about 2x on these APIs as measured by the gbenchmark PR #7369.

<details>
<summary>Baseline gbenchmark for replace-scalar</summary>

```
---------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time         CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------
StringReplaceScalar/replace_scalar/4096/32/manual_time     0.308 ms   0.316 ms  2345 bytes_per_second=224.631M/s
StringReplaceScalar/replace_scalar/4096/128/manual_time     1.01 ms    1.03 ms   684 bytes_per_second=269.171M/s
StringReplaceScalar/replace_scalar/4096/512/manual_time     7.35 ms    7.38 ms    95 bytes_per_second=149.028M/s
StringReplaceScalar/replace_scalar/4096/2048/manual_time    74.1 ms    74.2 ms     9 bytes_per_second=58.9153M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time    1170 ms    1170 ms     1 bytes_per_second=14.8457M/s
StringReplaceScalar/replace_scalar/32768/32/manual_time    0.314 ms   0.333 ms  2225 bytes_per_second=1.7147G/s
StringReplaceScalar/replace_scalar/32768/128/manual_time    1.16 ms    1.18 ms   604 bytes_per_second=1.83688G/s
StringReplaceScalar/replace_scalar/32768/512/manual_time    7.56 ms    7.58 ms    92 bytes_per_second=1.12604G/s
StringReplaceScalar/replace_scalar/32768/2048/manual_time   80.8 ms    80.9 ms     9 bytes_per_second=432.314M/s
StringReplaceScalar/replace_scalar/32768/8192/manual_time   1526 ms    1521 ms     1 bytes_per_second=91.3563M/s
StringReplaceScalar/replace_scalar/262144/32/manual_time   0.430 ms   0.449 ms  1622 bytes_per_second=10.0357G/s
StringReplaceScalar/replace_scalar/262144/128/manual_time   1.94 ms    1.96 ms   361 bytes_per_second=8.80298G/s
StringReplaceScalar/replace_scalar/262144/512/manual_time   18.1 ms    18.0 ms    39 bytes_per_second=3.77253G/s
StringReplaceScalar/replace_scalar/262144/2048/manual_time   227 ms     227 ms     3 bytes_per_second=1.20334G/s
StringReplaceScalar/replace_scalar/2097152/32/manual_time   2.48 ms    2.50 ms   282 bytes_per_second=13.9373G/s
StringReplaceScalar/replace_scalar/2097152/128/manual_time  11.8 ms    11.9 ms    59 bytes_per_second=11.5245G/s
StringReplaceScalar/replace_scalar/2097152/512/manual_time   101 ms     101 ms     7 bytes_per_second=5.42976G/s
StringReplaceScalar/replace_scalar/16777216/32/manual_time  22.2 ms    22.2 ms    31 bytes_per_second=12.4258G/s
```

</details>

<details>
<summary>gbenchmark results for refactored replace-scalar</summary>

```
---------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time         CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------
StringReplaceScalar/replace_scalar/4096/32/manual_time     0.144 ms   0.162 ms  4871 bytes_per_second=481.559M/s
StringReplaceScalar/replace_scalar/4096/128/manual_time    0.428 ms   0.446 ms  1633 bytes_per_second=634.055M/s
StringReplaceScalar/replace_scalar/4096/512/manual_time     2.65 ms    2.67 ms   263 bytes_per_second=413.561M/s
StringReplaceScalar/replace_scalar/4096/2048/manual_time    28.8 ms    28.8 ms    24 bytes_per_second=151.733M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time     479 ms     479 ms     2 bytes_per_second=36.2387M/s
StringReplaceScalar/replace_scalar/32768/32/manual_time    0.161 ms   0.178 ms  4347 bytes_per_second=3.35237G/s
StringReplaceScalar/replace_scalar/32768/128/manual_time   0.466 ms   0.484 ms  1502 bytes_per_second=4.57268G/s
StringReplaceScalar/replace_scalar/32768/512/manual_time    2.94 ms    2.96 ms   238 bytes_per_second=2.89405G/s
StringReplaceScalar/replace_scalar/32768/2048/manual_time   37.4 ms    37.4 ms    19 bytes_per_second=933.899M/s
StringReplaceScalar/replace_scalar/32768/8192/manual_time    567 ms     565 ms     1 bytes_per_second=245.929M/s
StringReplaceScalar/replace_scalar/262144/32/manual_time   0.316 ms   0.334 ms  2198 bytes_per_second=13.6601G/s
StringReplaceScalar/replace_scalar/262144/128/manual_time   1.39 ms    1.41 ms   498 bytes_per_second=12.237G/s
StringReplaceScalar/replace_scalar/262144/512/manual_time   12.8 ms    12.9 ms    54 bytes_per_second=5.30963G/s
StringReplaceScalar/replace_scalar/262144/2048/manual_time   157 ms     157 ms     4 bytes_per_second=1.73861G/s
StringReplaceScalar/replace_scalar/2097152/32/manual_time   1.84 ms    1.86 ms   379 bytes_per_second=18.7409G/s
StringReplaceScalar/replace_scalar/2097152/128/manual_time  9.50 ms    9.52 ms    74 bytes_per_second=14.3717G/s
StringReplaceScalar/replace_scalar/2097152/512/manual_time  84.7 ms    84.7 ms     8 bytes_per_second=6.44185G/s
StringReplaceScalar/replace_scalar/16777216/32/manual_time  14.0 ms    14.0 ms    50 bytes_per_second=19.6828G/s
```

</details>

Improvements for #7370 should base off of these changes.

Authors:
  - David (@davidwendt)

Approvers:
  - Jason Lowe (@jlowe)
  - @nvdbaranec
  - Mark Harris (@harrism)

URL: #7384
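
For readers unfamiliar with the pattern, the following is a rough host-only C++ sketch of a size-then-fill approach of the kind a utility like `make_strings_children` is understood to be built around: a first pass computes each output row's byte count, a scan turns those sizes into offsets, and a second pass writes the bytes into a single buffer. It is not libcudf code. `StringsChildren`, `replace_row`, and `replace_all` are hypothetical names chosen for illustration, and the real utility runs on the GPU with its own signature, which is not shown in this commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for the two children of a strings column.
struct StringsChildren {
  std::vector<std::size_t> offsets;  // rows + 1 entries; row i spans [offsets[i], offsets[i+1])
  std::string chars;                 // all row bytes, concatenated
};

// Per-row functor used by both passes (hypothetical name).
// With out == nullptr it only counts the output bytes;
// otherwise it also writes the replaced row starting at out.
std::size_t replace_row(const std::string& in, const std::string& target,
                        const std::string& repl, char* out)
{
  std::size_t bytes = 0;
  std::size_t pos   = 0;
  while (pos < in.size()) {
    if (in.compare(pos, target.size(), target) == 0) {
      if (out) repl.copy(out + bytes, repl.size());
      bytes += repl.size();
      pos += target.size();
    } else {
      if (out) out[bytes] = in[pos];
      ++bytes;
      ++pos;
    }
  }
  return bytes;
}

StringsChildren replace_all(const std::vector<std::string>& rows,
                            const std::string& target, const std::string& repl)
{
  assert(!target.empty());  // an empty target would never advance pos
  StringsChildren result;
  result.offsets.assign(rows.size() + 1, 0);

  // Pass 1: compute output sizes only; nothing is written yet.
  for (std::size_t i = 0; i < rows.size(); ++i)
    result.offsets[i + 1] = replace_row(rows[i], target, repl, nullptr);

  // Turn sizes into offsets with a running sum (offsets[0] stays 0).
  std::partial_sum(result.offsets.begin(), result.offsets.end(), result.offsets.begin());

  // Pass 2: allocate the chars buffer once, then fill each row at its offset.
  result.chars.resize(result.offsets.back());
  for (std::size_t i = 0; i < rows.size(); ++i)
    replace_row(rows[i], target, repl, &result.chars[0] + result.offsets[i]);

  return result;
}
```

Running the same functor twice costs extra computation, but it lets the output be allocated exactly once. Whether that trade-off explains the speedups reported above is not stated in the commit message.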