Feature Request: Support MS-AMP #2143
Might be good to have this as an alternative choice. From their docs: "MS-AMP has the following benefits compared with Transformer Engine."
Will work on this next week :)
+++ would love to see MS-AMP supported. Currently, H100s are on par with A100s cost-wise even with the current FP8 implementation, but if MS-AMP FP8 can be implemented, it would likely yield anywhere from a 50-100% boost in training speed. We still need Flash Attention with FP8, but MS-AMP is a great first step towards faster training.
@muellerzr is this branch in a state to be tested? https://github.com/huggingface/accelerate/tree/ms-amp thanks!
@winglian not quite yet! But I'll let you know when it's ready for you to test :) (should be by end of this week!)
@winglian go ahead and try the branch out :) Note that it only works on a single GPU for now (will look at DeepSpeed tomorrow), and I don't think you'll see a time decrease yet. What you should see, though, is a memory decrease for NLP-based models. For example, I ran bert-base-cased (the NLP example) and saw:
But time increased by almost 2x 😱
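For anyone wanting to try this out, here is a minimal sketch of how the MS-AMP backend would be exercised through Accelerate's FP8 support. The `FP8RecipeKwargs` handler, the `"MSAMP"` backend name, and the `opt_level` values are assumptions based on the API this work introduced; verify against the branch before relying on them:

```python
# Minimal single-GPU sketch: FP8 training with the MS-AMP backend via Accelerate.
# FP8RecipeKwargs, backend="MSAMP", and opt_level are assumed from this work's
# API and should be checked against the branch.
import torch
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[FP8RecipeKwargs(backend="MSAMP", opt_level="O2")],
)

model = torch.nn.Linear(64, 64)
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = accelerator.prepare(model, optimizer)

# Training proceeds as usual; the FP8 casting happens inside the wrapped objects.
inputs = torch.randn(8, 64, device=accelerator.device)
loss = model(inputs).sum()
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```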
Shouldn't the FLOPs increase, thereby reducing training time? The effect may not show up on small models, but if you take a 30B model, I would be surprised if you didn't see a difference.
Correct. I only tested on a tiny model just to get the API stable 😉
Now that it's a bit more stable, I saw both memory decreases and speed increases when combining MS-AMP and TransformerEngine. More details are in the PR (so overall, purely positive).
@muellerzr Accelerate FP8 with the MS-AMP backend doesn't seem to work with DeepSpeed. However, MS-AMP itself supports DeepSpeed (ZeRO): https://azure.github.io/MS-AMP/docs/user-tutorial/usage/#usage-in-deepspeed
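For reference, a hedged sketch of what MS-AMP's native DeepSpeed integration looks like based on the linked tutorial: MS-AMP is switched on through an `"msamp"` section of the DeepSpeed config. The exact keys are taken from their docs and worth double-checking:

```python
# Sketch of MS-AMP's native DeepSpeed (ZeRO) integration, per the linked
# tutorial: an "msamp" section in the DeepSpeed config enables it.
# Config keys follow their docs; verify before use.
import torch
import deepspeed

model = torch.nn.Linear(64, 64)

ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},
    "msamp": {
        "enabled": True,
        "opt_level": "O3",  # O3 is MS-AMP's ZeRO-aware optimization level
    },
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```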
Correct, I'm looking into that this week.
MS-AMP would also allow us to store the weights in FP8, enabling larger models to be trained on smaller hardware; right now the weights are still stored on device as fp16/bf16.
The implementation example they provide seems similar to accelerate.prepare(...). A rough illustration of that similarity follows below.
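As a sketch of the point above, here is MS-AMP's documented entry point next to Accelerate's. `msamp.initialize` is taken from MS-AMP's tutorial; the analogy to `Accelerator.prepare` is illustrative, not an exact equivalence:

```python
# Contrast of the two entry points. msamp.initialize is from MS-AMP's
# tutorial; the Accelerate analogy is the observation made above.
import torch
import msamp

model = torch.nn.Linear(64, 64).cuda()
optimizer = torch.optim.AdamW(model.parameters())

# MS-AMP style: one call wraps model + optimizer for FP8; at "O2" the
# master weights and optimizer state are also kept in low precision.
model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")

# Accelerate style would be the analogous one-call wrapping:
#   model, optimizer = accelerator.prepare(model, optimizer)
```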