
[Upstream] IFU 05072020 #4

Merged · 5 commits · May 7, 2020
Conversation

lcskrishna

Integrate changes from upstream

seryilmaz and others added 5 commits April 30, 2020 12:13
* update fused bias relu backward kernel

* add support for not requiring first-layer dgrad

* fix bug: wrong layer used in requires-grad check

* add infrastructure for optional bias and activation; currently only supports no bias and no ReLU

* make bias and relu optional separately

* add sigmoid activation option
* modify MTA axpby for wider load/store

* Make scale/axpby/l2/adam/lamb multi_tensor ops use wider loads (NVIDIA#725)

* Changes to make xentropy softmax loads/stores vectorized when possible:
Increase the default ILP so each thread handles 16 bytes of data in one step
Make each thread load/store the longest vector possible
Make the unroll case handle adjacent data instead of strided data, so element order matches the vectorized case

* Add a shift for the unaligned case; remove accesses aligned to fewer than 16 bytes
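The vectorization strategy in the commits above can be summarized as: budget 16 bytes per thread per step, pick the widest vector load the pointer's alignment allows, and fall back to narrower accesses otherwise. A minimal sketch of that width selection, assuming a hypothetical helper (this is not apex's actual code, just the alignment arithmetic the commit messages describe):

```python
def max_vector_width(ptr: int, itemsize: int, max_bytes: int = 16) -> int:
    """Largest number of elements one thread can move in a single
    vectorized access: start from the 16-byte-per-step budget and
    halve the width until the address is aligned to the vector size."""
    width = max_bytes // itemsize
    while width > 1 and ptr % (width * itemsize) != 0:
        width //= 2
    return width

# A 16-byte-aligned float32 pointer permits a float4-style load;
# an 8-byte-aligned one only a float2; a 4-byte-aligned one falls
# back to scalar access.
print(max_vector_width(0, 4))   # 4 elements (float4)
print(max_vector_width(8, 4))   # 2 elements (float2)
print(max_vector_width(4, 4))   # 1 element (scalar)
```

The "shift for not aligned case" commit handles buffers whose start is not 16-byte aligned: a short scalar prologue consumes elements until the address reaches an aligned boundary, after which the wide loads take over.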

@jeffdaily jeffdaily left a comment


LGTM. Straightforward IFU.

@lcskrishna lcskrishna merged commit e85a1d4 into ROCm:master May 7, 2020
jeffdaily pushed a commit that referenced this pull request Jan 18, 2021
Accept custom (layer type:param name) to include in sparse_parameter …
hubertlu-tw pushed a commit that referenced this pull request Oct 15, 2021
* Added support for fused ReLU and dropout into transducer joint

* Reorganized code selection path in transducer joint fwd
* Added support for fused ReLU+dropout into transducer joint

* Vectorize transducer loss backward with fused softmax (#3)

* Nanz/transducer loss (#4)

* Vectorize transducer loss backward with fused softmax

* Added a predicate to avoid potential IMA

* Nanz/transducer loss (#5)

* Vectorize transducer loss backward with fused softmax

* Added a predicate to avoid potential IMA

* Added more predicates to avoid IMAs

* Updated documentation for newly added features.

* Fixed an error in transducer.py
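The "predicate to avoid IMA" commits refer to the standard CUDA pattern of guarding each lane's memory access with a bounds check, so that out-of-range threads never touch memory (an IMA is an illegal memory access). A hypothetical NumPy analogue of that per-lane predicate, with illustrative names:

```python
import numpy as np

def guarded_load(buf: np.ndarray, idx: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Load buf[idx] only where the index is in range, mimicking the
    per-thread predicate a CUDA kernel evaluates before dereferencing:
    out-of-bounds lanes receive a fill value instead of faulting."""
    in_range = (idx >= 0) & (idx < buf.size)
    out = np.full(idx.shape, fill, dtype=buf.dtype)
    out[in_range] = buf[idx[in_range]]
    return out

buf = np.arange(4.0)
print(guarded_load(buf, np.array([0, 3, 4, -1])))  # indices 4 and -1 masked
```

Vectorized loads make such predicates more important, not less: a 16-byte access near the end of a buffer can run past the last valid element even when the scalar index itself is in range, which is why the later commits added more predicates.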

4 participants