
Feature Request: Support MS-AMP #2143

Closed
winglian opened this issue Nov 10, 2023 · 10 comments · Fixed by #3093
Labels: enhancement (New feature or request), feature request (Request for a new feature to be added to Accelerate)

Comments

@winglian

Docs

MS-AMP would also allow us to store the weights in FP8, allowing larger models to be trained on smaller hardware; right now the weights are still stored on device as fp16/bf16.

The implementation example they provide seems similar to accelerate.prepare(...):

model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")
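
For context, a minimal sketch of how that call could slot into a plain PyTorch training loop (assumption: only the msamp.initialize line comes from their docs; the toy model, optimizer, and loss here are purely illustrative):

```python
import torch
import msamp

# Toy model and optimizer; MS-AMP wraps both so that weights and optimizer
# state can be kept in FP8 at opt_level="O2".
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")

for step in range(10):
    inputs = torch.randn(8, 1024, device="cuda")
    loss = model(inputs).float().pow(2).mean()  # dummy loss for the sketch
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```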
@muellerzr
Collaborator

muellerzr commented Nov 10, 2023

Might be good to have this as an alternative choice, from their docs:

MS-AMP has the following benefits compared with Transformer Engine:

Speed up memory-limited operations by accessing one byte compared to half or single precision.
Reduce memory requirements for training models, enabling larger models.
Speed up communication for distributed models by transmitting lower-precision gradients.
Reduce training time for large language models with larger minibatches.

Will work on this next week :)

@muellerzr muellerzr self-assigned this Nov 10, 2023
@muellerzr muellerzr added the enhancement and feature request labels Nov 10, 2023
@casper-hansen

+++ would love to see MS-AMP supported. Currently, H100s are only on par with A100s cost-wise even with the current FP8 implementation, but if MS-AMP FP8 can be implemented, it would likely give anywhere between a 50-100% boost in training speed. We still need Flash Attention with FP8, but MS-AMP is a great first step towards faster training.

@winglian
Author

@muellerzr is this branch in a state to be tested? https://github.com/huggingface/accelerate/tree/ms-amp thanks!

@muellerzr
Collaborator

muellerzr commented Nov 29, 2023

@winglian not quite yet! But I'll let you know when it's ready for you to test :) (should be by end of this week!)

@muellerzr
Collaborator

muellerzr commented Nov 29, 2023

@winglian go ahead and try the branch out :) Note that it only works on single GPU for now (will look at deepspeed tomorrow), and I don't think you should see a time decrease yet. What you should see, though, is a memory decrease for NLP-based models.

For example, I ran bert-base-cased (NLP example) and saw:

FP8:
Before: 610.92 MB
After: 2.14 GB
BF16:
Before: 413.69 MB
After: 2.72 GB

But training time increased by almost 2x 😱
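
(For anyone trying to reproduce this kind of comparison: the exact script isn't shown here, but a rough before/after reading can be taken with plain torch.cuda counters, e.g.:)

```python
import torch

def report(label):
    # Currently allocated vs. peak allocated memory on the default CUDA device.
    torch.cuda.synchronize()
    allocated = torch.cuda.memory_allocated() / 2**20
    peak = torch.cuda.max_memory_allocated() / 2**20
    print(f"{label}: allocated={allocated:.2f} MB, peak={peak:.2f} MB")

report("Before")  # e.g. right after the model and optimizer are prepared
# ... run a few training steps here ...
report("After")
```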

@casper-hansen

Shouldn’t the FLOPs increase and thereby reduce training time? The effect may not show up on small models, but if you take a 30B model, I would be surprised if you didn’t see a difference.

@muellerzr
Collaborator

Correct. I only tested on a tiny model just to get the API stable 😉

@muellerzr muellerzr linked a pull request Dec 6, 2023 that will close this issue
@muellerzr
Collaborator

muellerzr commented Dec 7, 2023

Now that it’s a bit more stable, I saw both memory decreases and speed increases when combining MS-AMP and TransformerEngine. More details are in the PR (so overall purely positive).
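
For readers arriving later: the support that shipped is configured through Accelerate's FP8 kwargs handler. A sketch based on the docs at the time (argument names such as opt_level may differ between versions, so check the linked PR and the current docs):

```python
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# backend="msamp" selects MS-AMP; opt_level mirrors msamp.initialize ("O1"/"O2").
fp8_kwargs = FP8RecipeKwargs(backend="msamp", opt_level="O2")
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_kwargs])

# Model, optimizer, and dataloader are then wrapped as usual:
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```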

@LSC527

LSC527 commented Jul 25, 2024

@muellerzr accelerate fp8 with the ms-amp backend does not seem to work with deepspeed. However, ms-amp itself supports working with deepspeed (ZeRO): https://azure.github.io/MS-AMP/docs/user-tutorial/usage/#usage-in-deepspeed
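
(Per that tutorial, the DeepSpeed path is enabled through an extra section in the DeepSpeed config rather than through msamp.initialize. Paraphrased below as a Python config dict; the key names should be double-checked against the linked page:)

```python
# Sketch of a DeepSpeed config enabling MS-AMP, paraphrased from the linked
# MS-AMP tutorial; "msamp"/"enabled"/"opt_level" come from that page and the
# surrounding values are illustrative.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},
    "msamp": {
        "enabled": True,
        "opt_level": "O3",  # the tutorial lists O1/O2/O3 here
    },
}
```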

@muellerzr
Collaborator

Correct, I'm looking into that this week
