enable wider load/store for multi_tensor_apply kernels #763

Merged: 2 commits, Apr 30, 2020

FDecaYed
Contributor

No description provided.

@ptrblck ptrblck merged commit 17ee854 into NVIDIA:master Apr 30, 2020
lcskrishna added a commit to ROCm/apex that referenced this pull request May 7, 2020
* fix dropout scaling from p to 1/(1-p) (NVIDIA#816)

Co-authored-by: Sukru Eryilmaz <[email protected]>

* Improvements to apex.mlp (NVIDIA#804)

* update fused bias relu backward kernel

* add support for not requiring first-layer dgrad

* fix bug: wrong layer in requires grad

* add infrastructure for optional bias and activation; currently only no bias and no relu are supported

* make bias and relu optional separately

* add sigmoid activation option

* enable wider load/store for multi_tensor_apply kernels (NVIDIA#763)

* modify MTA axpby for wider load/store

* Make scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads

* Changes to make xentropysoftmax load/store vectorized when possible: (NVIDIA#725)

* Changes to make xentropysoftmax load/store vectorized when possible:
Increase the default ILP so that each thread handles 16 bytes of data in one step
Make each thread load/store the longest vector possible
Make the unrolled case handle adjacent data instead of strided data, so the access order matches the vector case

* Add a shift for the unaligned case. Remove accesses aligned to less than 16 bytes

Co-authored-by: Burc Eryilmaz <[email protected]>
Co-authored-by: Sukru Eryilmaz <[email protected]>
Co-authored-by: Deyu Fu <[email protected]>
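
The "wider load/store" idea in the commits above can be illustrated with a minimal host-side C++ sketch. This is not the code from the pull request. The names `ILP`, `is_aligned`, and `scale_wide_or_scalar` are hypothetical, and `ILP = 4` is an assumed value chosen so that four `float` elements make up one 16-byte access. The sketch only shows the general pattern: check that both pointers sit on a wide-access boundary, move `ILP` elements per step when they do, and fall back to element-by-element access otherwise. In a GPU kernel the wide move would typically be a single vector load or store per thread; here `std::memcpy` stands in for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed value: elements handled per step (4 floats = 16 bytes).
constexpr int ILP = 4;

// True when p lies on a boundary suitable for one ILP-wide access.
template <typename T>
bool is_aligned(const T* p) {
  return reinterpret_cast<std::uintptr_t>(p) % (ILP * sizeof(T)) == 0;
}

// Hypothetical helper: out[i] = in[i] * factor for i in [0, n).
// Returns true if the wide path was taken, false for the scalar fallback.
template <typename T>
bool scale_wide_or_scalar(const T* in, T* out, std::size_t n, T factor) {
  assert(in != nullptr && out != nullptr);
  const bool wide = (n % ILP == 0) && is_aligned(in) && is_aligned(out);
  if (wide) {
    for (std::size_t i = 0; i < n; i += ILP) {
      T r[ILP];
      std::memcpy(r, in + i, sizeof(r));   // stands in for one wide load
      for (int k = 0; k < ILP; ++k) r[k] *= factor;
      std::memcpy(out + i, r, sizeof(r));  // stands in for one wide store
    }
  } else {
    // Scalar fallback: one element per access.
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * factor;
  }
  return wide;
}
```

Both paths produce the same results; the alignment check only decides how many elements move per access. A buffer declared with `alignas(16)` and a length that is a multiple of `ILP` takes the wide path, while the same buffer offset by one element takes the fallback.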

3 participants