stage_1_and_2.py: do gradient scale only for fp16 #3166

guoyejun · 2023-04-09T10:58:54Z

No description provided.

guoyejun · 2023-04-09T10:59:19Z

For bf16, gradient scaling is not needed, since bf16 has the same exponent range as fp32.
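To illustrate the comment above: gradient (loss) scaling exists in fp16 training because very small gradient values underflow to zero in half precision, whereas bf16 keeps the 8-bit exponent of fp32 and can represent them directly. The sketch below uses only the Python standard library. The helpers `to_fp16` and `to_bf16`, the sample gradient, and the loss scale of 65536 are illustrative assumptions, not DeepSpeed code, and the bf16 emulation truncates instead of rounding.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision and convert back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    This truncates the mantissa (no rounding), which is enough to show the range.
    """
    top_two_bytes = struct.pack(">f", x)[:2]
    return struct.unpack(">f", top_two_bytes + b"\x00\x00")[0]


tiny_grad = 1e-8        # hypothetical small gradient value
loss_scale = 65536.0    # hypothetical loss scale (2**16)

# fp16 without scaling: the value is below the smallest fp16 subnormal and becomes 0.0.
fp16_unscaled = to_fp16(tiny_grad)

# fp16 with scaling: scale up before the cast, then unscale in higher precision.
fp16_scaled = to_fp16(tiny_grad * loss_scale) / loss_scale

# bf16 without scaling: the fp32-sized exponent keeps the value non-zero.
bf16_unscaled = to_bf16(tiny_grad)

print("fp16, no scaling:  ", fp16_unscaled)
print("fp16, with scaling:", fp16_scaled)
print("bf16, no scaling:  ", bf16_unscaled)
```

Note that this only motivates dropping the scale factor for bf16. It says nothing about gradient clipping, which is independent of the dtype.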

* zero++ tutorial PR (#3783)
* [Fix] _conv_flops_compute when padding is a str and stride=1 (#3169)
* fix conv_flops_compute when padding is a str when stride=1
* fix error
* change type of paddings to tuple
* fix padding calculation
* apply formatting check
---------
Co-authored-by: Cheng Li
Co-authored-by: Olatunji Ruwase
* fix interpolate flops compute (#3782)
* use `Flops Profiler` to test `model.generate()` (#2515)
* Update profiler.py
* pre-commit run --all-files
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
---------
Co-authored-by: Jeff Rasley
Co-authored-by: Cheng Li
* revert PR #3166, it disabled grad clip for bf16
* ensure no loss scaling for non-fp16 dtypes
* revert PR #3611 (#3786)
* bump to 0.9.6
* ZeRO++ chinese blog (#3793)
* zeropp chinese blog
* try better quality images
* make title larger
* even larger...
* various fix
* center captions
* more fixes
* fix format
* remove staging trigger (#3792)
* DeepSpeed-Triton for Inference (#3748)
Co-authored-by: Stephen Youn
Co-authored-by: Arash Bakhtiari
Co-authored-by: Cheng Li
Co-authored-by: Ethan Doe
Co-authored-by: yidoe
Co-authored-by: Jeff Rasley
* ZeRO++ (#3784)
Co-authored-by: HeyangQin
Co-authored-by: GuanhuaWang
Co-authored-by: cmikeh2
Co-authored-by: Ammar Ahmad Awan
Co-authored-by: Jeff Rasley
Co-authored-by: Michael Wyatt
Co-authored-by: Olatunji Ruwase
Co-authored-by: Reza Yazdani
* adding zero++ to navigation panel of deepspeed.ai (#3796)
* Add ZeRO++ Japanese blog (#3797)
* zeropp chinese blog
* try better quality images
* make title larger
* even larger...
* various fix
* center captions
* more fixes
* fix format
* add ZeRO++ Japanese blog
* add links
---------
Co-authored-by: HeyangQin
Co-authored-by: Conglong Li
* Bug Fixes for autotuner and flops profiler (#1880)
* fix autotuner when backward is not called
* fix format
---------
Co-authored-by: Olatunji Ruwase
* Missing strided copy for gated MLP (#3788)
Co-authored-by: Ammar Ahmad Awan
Co-authored-by: Jeff Rasley
Co-authored-by: Logan Adams
* Requires grad checking. (#3789)
Co-authored-by: Jeff Rasley
* bump to 0.10.0
* Fix Bug in transform.cu (#3534)
* Bug fix
* Fixed formatting error
---------
Co-authored-by: Logan Adams
* bug fix: triton importing error (#3799)
Co-authored-by: Stephen Youn
Co-authored-by: Jeff Rasley
---------
Co-authored-by: Heyang Qin
Co-authored-by: Bill Luo
Co-authored-by: Cheng Li
Co-authored-by: Olatunji Ruwase
Co-authored-by: Guorun
Co-authored-by: stephen youn
Co-authored-by: Stephen Youn
Co-authored-by: Arash Bakhtiari
Co-authored-by: Ethan Doe
Co-authored-by: yidoe
Co-authored-by: GuanhuaWang
Co-authored-by: cmikeh2
Co-authored-by: Ammar Ahmad Awan
Co-authored-by: Michael Wyatt
Co-authored-by: Reza Yazdani
Co-authored-by: Masahiro Tanaka
Co-authored-by: Conglong Li
Co-authored-by: Logan Adams
Co-authored-by: Joe Mayer
Co-authored-by: Ramya Ramineni


stage_1_and_2.py: do gradient scale only for fp16

4070c4d

guoyejun requested review from jeffra, tjruwase, samyam and mrwyattii as code owners April 9, 2023 10:58

tjruwase approved these changes Apr 10, 2023


guoyejun and others added 6 commits April 11, 2023 09:53

Merge branch 'master' into fp16_gscale

72b83ec

Merge branch 'master' into fp16_gscale

6fa91aa

Merge branch 'master' into fp16_gscale

996381c

Merge branch 'master' into fp16_gscale

d081c97

Merge branch 'master' into fp16_gscale

9cc24d9

Merge branch 'master' into fp16_gscale

7d0d90e

tjruwase enabled auto-merge (squash) April 25, 2023 11:48

tjruwase added 3 commits April 25, 2023 13:23

Merge branch 'master' into fp16_gscale

dbf1ab4

Merge branch 'master' into fp16_gscale

acb5578

Merge branch 'master' into fp16_gscale

cf56ae6

tjruwase merged commit 0e35766 into microsoft:master Apr 26, 2023

jeffra added a commit that referenced this pull request Jun 22, 2023

revert PR #3166, it disabled grad clip for bf16

9bd7b24

jeffra mentioned this pull request Jun 22, 2023

[zero] revert PR #3166, it disabled grad clip for bf16 #3790

Merged
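The PR diff is not shown in this conversation, so the following is a hypothetical reconstruction of the failure mode named in the revert title, not the actual DeepSpeed code. It assumes that unscaling and clipping are applied as one combined division, so skipping that whole step for non-fp16 dtypes also skips clipping, whereas the revert's stated intent ("ensure no loss scaling for non-fp16 dtypes") corresponds to setting the loss scale to 1 and still running the step. The names `unscale_and_clip`, `process_grads`, and `guard_whole_step` are invented for illustration.

```python
import math


def unscale_and_clip(grads, loss_scale=1.0, clip_grad=0.0):
    """Divide grads by the loss scale and, if clip_grad > 0, clip the global L2 norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))  # norm of the still-scaled grads
    combined_scale = loss_scale
    if clip_grad > 0.0:
        # Ratio of the true (unscaled) norm to the allowed norm.
        clip = (total_norm / loss_scale + 1e-6) / clip_grad
        if clip > 1.0:
            combined_scale = clip * loss_scale
    return [g / combined_scale for g in grads]


def process_grads(grads, dtype, loss_scale, clip_grad, guard_whole_step):
    """Hypothetical optimizer step fragment.

    guard_whole_step=True models the regression: the combined step runs only for fp16.
    guard_whole_step=False models the intended behavior: only the scale is dtype-dependent.
    """
    if guard_whole_step and dtype != "fp16":
        return list(grads)  # clipping is silently skipped along with unscaling
    effective_scale = loss_scale if dtype == "fp16" else 1.0
    return unscale_and_clip(grads, effective_scale, clip_grad)


bf16_grads = [3.0, 4.0]  # global norm 5.0, not loss-scaled
regressed = process_grads(bf16_grads, "bf16", 1024.0, 1.0, guard_whole_step=True)
intended = process_grads(bf16_grads, "bf16", 1024.0, 1.0, guard_whole_step=False)

fp16_grads = [3.0 * 1024.0, 4.0 * 1024.0]  # same grads, loss-scaled by 1024
fp16_result = process_grads(fp16_grads, "fp16", 1024.0, 1.0, guard_whole_step=True)

print("bf16, whole step guarded:", regressed)
print("bf16, scale-only guard:  ", intended)
print("fp16:                    ", fp16_result)
```

With the whole step guarded, the bf16 gradients keep their norm of 5 even though a clip value of 1 was requested. With only the scale made dtype-dependent, bf16 and fp16 end up with the same clipped gradients.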
