Back port optimization to broadcast_axis to MXNet1.x #18773
Conversation
Hey @access2rohit, thanks for submitting the PR.
CI supported jobs: [sanity, unix-cpu, windows-cpu, centos-gpu, clang, website, miscellaneous, centos-cpu, windows-gpu, unix-gpu, edge]
@leezu @sandeep-krishnamurthy can you help merge? These are cherry-picked from master (already merged there).
Looks good since it's a cherry-pick.
@ChaiBapchya
That PR was against master [2.0]. This PR is for the 1.x branch [which forked off from master a while ago]. Running perf before this PR and after this PR, and getting the numbers, would ensure we do our due diligence before merging this PR.
* adding separate int32_t kernel for GPU in broadcast_axis/to/like operators
* using structure instead of temp workspace to pass stride and shape
* replacing hardcoded int32_t with generic index_t
* combining CPU and GPU kernels to leverage cached stride calculation and fast access shape data in both

Co-authored-by: Rohit Kumar Srivastava <[email protected]>
* adding comments explaining code optimizations
* fixing broadcast_axis kernel to int32
* fixing slice_axis kernel to int32
* combining CPU and GPU implementation method signatures and cleaned up code
* adding new broadcast_axis to np_matmul

Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Why is it master? Shouldn't it be v1.x?
1.x is master for all future 1.x releases.
Thanks.
Description
Back port of the broadcast_axis optimizations for CPU (#17882) and GPU (#18168) to MXNet 1.7.x.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.