fix dropout scaling from p to 1/(1-p) in multihead attention #816

Merged
merged 1 commit into NVIDIA:master on Apr 30, 2020

Conversation

seryilmaz (Contributor)

Fixes the dropout scaling in the backward pass of multihead attention: the dropout mask should be applied with a scale of 1/(1 - p) rather than p.
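To make the fix concrete, here is a minimal plain-PyTorch sketch of inverted dropout showing why the mask must be scaled by 1/(1 - p) and why the backward pass has to reuse the same scale; the helper names are illustrative and this is not the fused apex attention kernel touched by this PR.

```python
import torch

def dropout_forward(x, p):
    # Inverted dropout: drop each element with probability p and scale the
    # survivors by 1/(1 - p), so the expected activation matches eval mode.
    keep_mask = (torch.rand_like(x) > p).to(x.dtype)
    scale = 1.0 / (1.0 - p)
    return x * keep_mask * scale, keep_mask

def dropout_backward(grad_output, keep_mask, p):
    # The backward pass must apply the same mask and the same 1/(1 - p)
    # scale; scaling by p instead silently shrinks the gradients.
    return grad_output * keep_mask * (1.0 / (1.0 - p))
```

In training mode the forward sketch behaves like torch.nn.functional.dropout(x, p, training=True), which uses the same 1/(1 - p) convention.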

kevinstephano (Contributor) left a comment


Looks good. I knew about this change and talked about it with Burc.

ptrblck (Contributor) commented Apr 30, 2020

Thanks! :)

ptrblck merged commit aad9300 into NVIDIA:master on Apr 30, 2020
lcskrishna added a commit to ROCm/apex that referenced this pull request May 7, 2020
* fix dropout scaling from p to 1/(1-p) (NVIDIA#816)

Co-authored-by: Sukru Eryilmaz <[email protected]>

* Improvements to apex.mlp (NVIDIA#804)

* update fused bias relu backward kernel

* add support for not requiring first-layer dgrad

* fix bug: wrong layer in requires-grad check

* add infrastructure for optional bias and activation; currently only supports no bias and no relu (a plain-PyTorch sketch of this option handling follows the commit list)

* make bias and relu optional separately

* add sigmoid activation option

* enable wider load/store for multi_tensor_apply kernels (NVIDIA#763)

* modify MTA axpby for wider load/store

* Make scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads

* Changes to make xentropysoftmax load/store vectorized when possible: (NVIDIA#725)

* Changes to make xentropysoftmax load/store vectorized when possible:
Increase the default ILP so that each thread handles 16 bytes of data in one step
Make each thread load/store the longest vector possible
Make the unroll case handle adjacent data instead of strided data, so the ordering matches the vectorized case

* Add a shift for the non-aligned case. Remove accesses aligned to less than 16 bytes

Co-authored-by: Burc Eryilmaz <[email protected]>
Co-authored-by: Sukru Eryilmaz <[email protected]>
Co-authored-by: Deyu Fu <[email protected]>
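As a rough illustration of the "optional bias and activation" items in the apex.mlp commit above, here is a plain-PyTorch sketch in which bias and activation are chosen independently; the function name and signature are hypothetical and do not correspond to apex's fused MLP API.

```python
import torch
import torch.nn.functional as F

def mlp_forward(x, weights, biases=None, activation="relu"):
    # Hypothetical helper: runs a stack of linear layers where the bias and
    # the activation (relu, sigmoid, or none) are each optional.
    for i, w in enumerate(weights):
        x = x.matmul(w.t())          # linear layer without bias
        if biases is not None:
            x = x + biases[i]        # optional bias
        if activation == "relu":
            x = F.relu(x)
        elif activation == "sigmoid":
            x = torch.sigmoid(x)
        # activation == "none": leave the layer output linear
    return x
```

A fused implementation folds these per-layer options into a single kernel; the sketch only spells out the branching that the options imply.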