Abstracting block reduce and block scan from cuIO kernels with cub apis
#7278
Please always label graphs with units. Is higher better or is lower better?
Lower is better.
And if you are not seeing axes, please change your GitHub theme from black to white just for this PR, my apologies.
Are these numbers from the benchmarks?
"Change in perf" is percentage? Multiple? Number of seconds?
In ms.
OK, an absolute difference is not very helpful. Relative performance (speedup/slowdown) would be helpful.
If the value is negative, we have a speedup; if it is positive, a slowdown.
Yes, but an absolute change in ms is not helpful unless you know the total time. It's easier to make decisions based on old_time / new_time than on old_time - new_time.
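A worked illustration of that last point, using made-up timings (hypothetical, not taken from this PR's benchmarks): the same absolute difference of 2 ms corresponds to very different relative changes depending on the total time.

$$
\text{speedup} = \frac{t_\text{old}}{t_\text{new}}, \qquad \frac{4\ \text{ms}}{2\ \text{ms}} = 2.0, \qquad \frac{100\ \text{ms}}{98\ \text{ms}} \approx 1.02
$$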
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7278 +/- ##
==============================================
Coverage ? 82.20%
==============================================
Files ? 100
Lines ? 16966
Branches ? 0
==============================================
Hits ? 13947
Misses ? 3019
Partials ? 0
Good job untangling some of the reductions, looks so much better now!
Got a bunch of small questions and suggestions.
As an initialization kernel, it shouldn't have been a big part of the overall execution anyway, and you confirmed it. That's great. So no big slowdowns as a result of this PR.
No issues that I can think of. @vuule's review is quite thorough.
Absolutely in favor of the latest changes. Now the code does exactly what it needs to do 👍
@gpucibot merge
Closes #6238
This PR replaces existing usages of warp_reduce or warp_scans, which were used for block reduction/scan, with cub::BlockReduce/cub::BlockScan.
The changes have a positive effect mostly on numerical data processing, but seem to be a little slower for string types.
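For readers unfamiliar with the CUB block-level collectives, here is a minimal sketch of the general pattern. It is not code from this PR: the kernel name, block size, and all-ones input are hypothetical, and the input length is assumed to be an exact multiple of the block size so that no bounds check is needed.

```cuda
#include <cub/cub.cuh>

#include <cstdio>
#include <vector>

// Hypothetical block size; it must match the launch configuration below.
constexpr int block_size = 128;

// Each block computes the sum of its tile and the inclusive prefix sums within the tile.
__global__ void reduce_and_scan(int const* in, int* block_sums, int* prefix)
{
  using block_reduce = cub::BlockReduce<int, block_size>;
  using block_scan   = cub::BlockScan<int, block_size>;

  // The two collectives run one after the other, so their scratch space can share storage.
  __shared__ union {
    block_reduce::TempStorage reduce;
    block_scan::TempStorage scan;
  } temp;

  int const idx = blockIdx.x * block_size + threadIdx.x;
  int const val = in[idx];

  // The reduction result is only valid in thread 0 of the block.
  int const sum = block_reduce(temp.reduce).Sum(val);
  if (threadIdx.x == 0) { block_sums[blockIdx.x] = sum; }

  // Required before the shared temp storage is reused by the scan.
  __syncthreads();

  int scanned = 0;
  block_scan(temp.scan).InclusiveSum(val, scanned);
  prefix[idx] = scanned;
}

int main()
{
  constexpr int num_blocks = 4;
  constexpr int n          = num_blocks * block_size;
  std::vector<int> h_in(n, 1);  // all ones, so expected results are easy to state

  int* d_in     = nullptr;
  int* d_sums   = nullptr;
  int* d_prefix = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_sums, num_blocks * sizeof(int));
  cudaMalloc(&d_prefix, n * sizeof(int));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

  reduce_and_scan<<<num_blocks, block_size>>>(d_in, d_sums, d_prefix);

  std::vector<int> h_sums(num_blocks);
  std::vector<int> h_prefix(n);
  cudaMemcpy(h_sums.data(), d_sums, num_blocks * sizeof(int), cudaMemcpyDeviceToHost);
  cudaMemcpy(h_prefix.data(), d_prefix, n * sizeof(int), cudaMemcpyDeviceToHost);

  // With all-ones input, every block sum equals block_size and the
  // inclusive prefix sum at position i within a block is i + 1.
  bool ok = true;
  for (int b = 0; b < num_blocks; ++b) { ok = ok && (h_sums[b] == block_size); }
  for (int i = 0; i < n; ++i) { ok = ok && (h_prefix[i] == (i % block_size) + 1); }
  std::printf(ok ? "ok\n" : "mismatch\n");

  cudaFree(d_in);
  cudaFree(d_sums);
  cudaFree(d_prefix);
  return ok ? 0 : 1;
}
```

Compared with hand-rolled warp-level helpers, the collective hides the cross-warp combination step behind a single call, at the cost of a fixed block size known at compile time.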
all files.zip
Update: Graphs have been updated after fixing a bug which also resolved several other performance issues.
Perf plots
Benchmark Performance
The y-axis is in ms, and there are three sets of plots. The first compares mean performance side by side, with error bars showing the standard deviation calculated over five sets of benchmark runs. The second shows the difference in performance between the cub::BlockReduce/cub::BlockScan version and the generic approach. The last shows the percentage change in performance compared to branch-0.18. If the value is positive, the test takes less time; otherwise it takes more time than on the main branch.
CSV READER
ORC READER
ORC WRITER
PARQUET CHUNKED WRITER
PARQUET READER
PARQUET WRITER