Set FSDP transformer_layer_cls_to_wrap to model._no_split_modules? #24568
Comments
cc @pacman100
Any thoughts about this? Maybe also cc @stas00?
Unfortunately I don't have enough experience with FSDP to contribute to this discussion.
@pacman100 Friendly ping
Hello @apoorvkh, the code part you highlighted is only used with FSDP+XLA. For general (non-XLA) FSDP, everything is handled internally by Accelerate. That happens here: transformers/src/transformers/training_args.py, lines 1533 to 1556 at 66954ea.
PRs huggingface/accelerate#1753 and #24980 should add this capability, wherein the Trainer will try the model's _no_split_modules when fsdp_transformer_layer_cls_to_wrap isn't specified.
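Roughly, the hand-off works by exporting the FSDP settings as environment variables that Accelerate's FSDP plugin reads when it is constructed. A simplified sketch (not the actual code at commit 66954ea; the variable names follow Accelerate's FSDP convention as I understand it and should be treated as assumptions):

```python
import os

# Simplified sketch of the Trainer -> Accelerate hand-off (not the real code).
# The Trainer exports fsdp_config values as environment variables, which
# Accelerate's FSDP plugin later picks up. Names below are assumptions.
def export_fsdp_config_to_env(fsdp_config: dict) -> None:
    os.environ["ACCELERATE_USE_FSDP"] = "true"
    layer_cls = fsdp_config.get("fsdp_transformer_layer_cls_to_wrap")
    if layer_cls is not None:
        os.environ["FSDP_AUTO_WRAP_POLICY"] = "TRANSFORMER_BASED_WRAP"
        if isinstance(layer_cls, (list, tuple)):
            layer_cls = ",".join(layer_cls)
        os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = layer_cls
```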
Very cool, thanks a ton! I will try it out and let you know. |
Just circling back, works on my end -- thanks again! |
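For anyone finding this later, a minimal usage sketch of what that looks like (model name and argument values are illustrative; assumes a transformers/accelerate version that includes the PRs above, and a distributed launch via torchrun or accelerate launch):

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# OPT's _no_split_modules is ["OPTDecoderLayer"], so with the fallback in
# place the layer class no longer has to be spelled out by hand.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Before: the transformer layer class had to be given explicitly.
args_explicit = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={"fsdp_transformer_layer_cls_to_wrap": "OPTDecoderLayer"},
)

# After the linked PRs: omit it and let the Trainer fall back to
# model._no_split_modules.
args_fallback = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
)
```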
@pacman100 I want to better understand the mechanism of FSDP's wrapping. Do you know why fsdp_transformer_layer_cls_to_wrap can simply be set to the model's _no_split_modules? My understanding of the latter comes from this post, which I take to mean that the module should not be split across devices during the forward pass. However, I am not sure I see the connection with FSDP's wrapping. Is there a connection between those two variables, or is it simply a quick way to find the names of the transformer layers (since they follow a consistent naming convention)?
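For context, my understanding of the mechanism (a sketch built on PyTorch's FSDP wrapping API; OPTDecoderLayer is just an example class): the classes named by fsdp_transformer_layer_cls_to_wrap become the transformer_layer_cls of torch's transformer_auto_wrap_policy, so FSDP shards and gathers parameters at the granularity of whole decoder blocks, which is the same granularity that _no_split_modules promises can stay together on one device.

```python
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.opt.modeling_opt import OPTDecoderLayer

# Every module of a listed class becomes its own FSDP unit: its parameters are
# gathered for that block's forward/backward and resharded afterwards. This is
# what the class names from fsdp_transformer_layer_cls_to_wrap (or, with the
# fallback, from _no_split_modules) are ultimately used for.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={OPTDecoderLayer},
)

# Inside an initialized process group one would then wrap the model:
# sharded_model = FSDP(model, auto_wrap_policy=auto_wrap_policy)
```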
Feature request
Currently, when training with FSDP, the Trainer expects to receive an fsdp_config argument specifying fsdp_transformer_layer_cls_to_wrap (transformers/src/transformers/trainer.py, lines 1394 to 1406 at 66954ea).
I am wondering if we can set this automatically when the model has a _no_split_modules attribute, e.g. transformers/src/transformers/models/opt/modeling_opt.py, line 401 at 66954ea.
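To make the proposal concrete, the fallback could look something like the sketch below (the helper name and its placement are hypothetical; the real change would live in the Trainer / TrainingArguments plumbing):

```python
from typing import List, Optional

def resolve_layer_cls_to_wrap(fsdp_config: dict, model) -> Optional[List[str]]:
    """Hypothetical helper: prefer an explicit fsdp_transformer_layer_cls_to_wrap,
    otherwise fall back to the model's _no_split_modules
    (e.g. ["OPTDecoderLayer"] for OPT)."""
    explicit = fsdp_config.get("fsdp_transformer_layer_cls_to_wrap")
    if explicit is not None:
        return explicit if isinstance(explicit, list) else [explicit]
    return getattr(model, "_no_split_modules", None)
```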
Motivation
It would be a convenient feature to set this automatically. This argument is model-specific, but it might be nice to define training arguments independently of a specific model type.
Your contribution
Happy to help make a PR. Would be great if you could confirm whether this would be desirable or if I am misunderstanding something. Thanks!