Add changes to support FSDP #598
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I left a few comments. For the Transformers-related changes, it seems you based them on a newer version (main?). Let's just stick to v4.34.1 for now as newer releases are not supported 🙂
Can you also format the code as follows?
pip install --upgrade ruff
make style
@@ -46,7 +47,7 @@
 from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments
 from optimum.habana.utils import set_seed
+from optimum.habana.peft.layer import GaudiLoraLayerLinearforward
Should we do that every time we use FSDP and LoRA? Or is it necessary for this example only?
If it's always necessary, it would be better to do it in optimum.habana.transformers.modeling_utils.py.
We need to do this every time we use LoRA with torch_compile enabled (see the detailed explanation I added in response to one of your other comments).
I can move it to optimum.habana.transformers.modeling_utils.py; it's OK to do it unconditionally, right?
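The unconditional change discussed here boils down to replacing a layer class's `forward` method once at import time, so that every instance (including ones created later) picks up the patched version. A minimal self-contained sketch of that pattern, with hypothetical stand-in names (`LoraLinear`, `patched_forward`) rather than the real peft/optimum-habana classes:

```python
# Stand-in for peft's LoRA Linear layer (hypothetical, for illustration only).
class LoraLinear:
    def forward(self, x):
        return x + 1  # original behavior


# A replacement forward, e.g. a torch.compile-friendly reimplementation.
def patched_forward(self, x):
    return x + 2  # patched behavior


# Apply the patch unconditionally at module import time, as proposed for
# optimum.habana.transformers.modeling_utils.py. Because Python resolves
# methods on the class at call time, all instances use the new forward.
LoraLinear.forward = patched_forward

layer = LoraLinear()
print(layer.forward(1))  # prints 3, i.e. the patched behavior
```

Since the patch is applied at the class level, it also affects code paths that never opt in explicitly, which is why the question above about impacting LoRA inference in the text-generation example is worth checking.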
I think it's okay yes. Maybe LoRA inference in the text-generation example could be impacted, I'll check that.
@regisss My understanding was that LoRA is used only for fine-tuning. But please check and resolve this conversation when you get a chance. Thanks.
@regisss Any suggestions on where in the code I should add a warning that FSDP is an experimental feature not yet ready for use? Also, how can I add test_fsdp_examples.py to the list of tests you run as part of your regular testing?
Merging now since @libinta told me that 1.14 is getting released.
I'll add the warning and the test in another PR.
@vivekgoe When running the FSDP test on Gaudi2 with 1.13, I get errors like
And on Gaudi1 with 1.14 I get
I launched the test with
Is there something I'm doing wrong?
What does this PR do?
Add changes to support FSDP. BERT-Base enabled with FSDP as a toy example.
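For context, FSDP in the Hugging Face Trainer stack is typically enabled through training arguments. A hedged sketch of how the BERT-Base toy example might be launched; the script name and flags below are assumptions following the upstream Transformers `TrainingArguments` convention, not commands taken from this PR:

```shell
# Hypothetical launch command; --fsdp values follow the upstream
# Transformers convention ("full_shard auto_wrap"), which GaudiTrainingArguments
# is assumed to inherit.
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --task_name mrpc \
  --do_train \
  --fsdp "full_shard auto_wrap" \
  --output_dir ./fsdp_output
```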
Fixes # (issue)
Before submitting