Deepspeed initialization AttributeError: 'EncoderDecoderConfig' object has no attribute 'hidden_size' #22176
Comments
Hi @ksopyla Thanks for raising this issue and for giving all the script and environment details. Could you share the full traceback of the error encountered? Although I'm not immediately sure where the error is being raised, it is expected that the error occurs if the config doesn't define a `hidden_size` attribute, which is the case for `EncoderDecoderConfig`.
Hi @amyeroberts, I have updated the issue and added the traceback. I hope it helps. I would add that the encoder and decoder could have different sizes in terms of the number of layers and `hidden_size`.
Thank you for the full traceback, @ksopyla. Now it's easy to support you. Please try again with the latest version of transformers. This situation has been dealt with on Feb 10th, so this assert shouldn't happen again, as the code now carefully checks the different scenarios: transformers/src/transformers/deepspeed.py, lines 179 to 213 in 1c4a9ac.
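For readers who can't follow the permalink, here is a minimal sketch of the kind of guarded lookup being described. It is an illustration only, not the code at the cited lines, and the formulas for the `auto` values are assumptions on my part:

```python
def fill_auto_zero_entries(ds_config, model_config):
    # Only derive the hidden_size-based "auto" values when the model config
    # actually exposes a hidden size, instead of asserting that it exists.
    if hasattr(model_config, "hidden_size"):
        hidden_size = model_config.hidden_size
    elif hasattr(model_config, "hidden_sizes"):
        hidden_size = max(model_config.hidden_sizes)
    else:
        # Composite configs such as EncoderDecoderConfig only carry nested
        # encoder/decoder sub-configs, so there is nothing to derive here;
        # the entries are left to the user or to DeepSpeed's defaults.
        return ds_config

    zero = ds_config.setdefault("zero_optimization", {})
    if zero.get("reduce_bucket_size") == "auto":
        zero["reduce_bucket_size"] = hidden_size * hidden_size
    if zero.get("stage3_prefetch_bucket_size") == "auto":
        zero["stage3_prefetch_bucket_size"] = int(0.9 * hidden_size * hidden_size)
    if zero.get("stage3_param_persistence_threshold") == "auto":
        zero["stage3_param_persistence_threshold"] = 10 * hidden_size
    return ds_config
```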
However, if your config doesn't expose `hidden_size`, those `auto` entries can't be derived for you. This is just an automatic optimization: you can remove these entries completely and DeepSpeed will use its defaults, or you can study what those values should be and set them yourself, as explained in the Trainer DeepSpeed documentation.
Sure, I will check and let you know. I infer you're talking about these parameters, which should be set if I use ZeRO-3. Correct me if I am wrong. Or should I also set those for ZeRO-2?
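For context, the entries being discussed are presumably the hidden_size-derived `auto` values in the ZeRO section of the config; the exact block quoted in the original comment was not captured here. Shown as a Python dict for illustration:

```python
# Illustrative only: the ZeRO-3 entries whose "auto" values the HF Trainer
# derives from the model's hidden_size. Under ZeRO-2 only reduce_bucket_size
# is relevant; the stage3_* keys are ignored when stage != 3.
zero3_auto_entries = {
    "zero_optimization": {
        "stage": 3,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
    }
}
```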
Ah, ok, thank you for clarifying the situation - that's even simpler then. Just upgrade transformers, change nothing in your setup, and it should just work. The original code just accessed `config.hidden_size` directly, without checking that the attribute exists.
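A quick way to see why that unguarded access fails for this model class; the model choices below are arbitrary examples, not the poster's setup:

```python
from transformers import BertConfig, EncoderDecoderConfig, GPT2Config

# EncoderDecoderConfig nests two sub-configs instead of exposing a single
# top-level hidden_size, which is why the unguarded attribute access raised.
config = EncoderDecoderConfig.from_encoder_decoder_configs(BertConfig(), GPT2Config())
print(hasattr(config, "hidden_size"))  # False -> AttributeError in the old code
print(config.encoder.hidden_size)      # 768 for the default BertConfig
print(config.decoder.n_embd)           # 768; GPT-2 stores its width as n_embd
```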
I have updated transformers to 4.27 and PyTorch to 2.0, and it works :)
Best to discuss a new question in a new issue, but to wrap it up quickly: it's absolutely normal that the speed progressively drops as you enable stages 1, 2 and 3, as each stage adds overhead. If you can fit everything into a single GPU, do not use DeepSpeed; it's a scalability solution for when one can't fit the training or inference components into a single GPU. If you can, always use straight DDP.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
- System: Ubuntu 22.04
- transformers version: 4.26.1
Who can help?
HF Trainer: @stas00, Accelerate: @pacman100
Information

Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
deepspeed_zero2.json >>
The training script
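The original JSON config and training script were not captured in this snapshot. Purely as an illustration of the kind of setup that exercises the DeepSpeed initialization path, here is a hypothetical minimal sketch; the model names, hyperparameters, ZeRO-2 entries, and the dataset placeholder are all assumptions:

```python
from transformers import (
    BertTokenizerFast,
    EncoderDecoderModel,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Hypothetical ZeRO-2 config with "auto" values (not the poster's actual file).
ds_zero2 = {
    "zero_optimization": {"stage": 2, "reduce_bucket_size": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# An encoder-decoder model whose config has no top-level hidden_size.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

args = Seq2SeqTrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    fp16=True,
    deepspeed=ds_zero2,  # DeepSpeed integration is initialized from this config
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_data,  # placeholder: any tokenized seq2seq dataset
)
trainer.train()  # the reported AttributeError surfaced here on 4.26.1
```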
Expected behavior
Start training without error.
Traceback