Issue using num_beams parameter for T5 / DeepSpeed #10149
Comments
It's `--eval_beams` in `finetune_trainer.py`, not `--num_beams`. This script is going to be retired soon and replaced by `run_seq2seq.py`.
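For reference, here is a rough paraphrase of how `finetune_trainer.py` declares that flag (a sketch following the script's dataclass-argument pattern, not the exact source):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataTrainingArguments:
    # exposed on the command line as --eval_beams, not --num_beams
    eval_beams: Optional[int] = field(
        default=None,
        metadata={"help": "Number of beams to use for evaluation."},
    )
```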
Thanks -- I was doing it the complex way, looking through `Seq2SeqTrainer` to verify that `num_beams` was being passed, when really I should have started with `finetune_trainer.py` to verify the argument name was the same. :) That did get rid of the argument error, but I am now seeing different errors.
You probably need to start transitioning to `run_seq2seq.py`. I haven't fully figured out how to do it, as not everything was ported, but I'm updating notes here: #10036, as I learn new nuances. One of the main changes is that datasets are now handled in a completely different way (see the sketch below).
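To illustrate that change, here is a minimal sketch of the `datasets`-library loading style the new script builds on (the file names and json format are placeholders, not anything `run_seq2seq.py` specifically requires):

```python
# minimal sketch: data now comes in via the `datasets` library
# rather than the old plain-text source/target files
from datasets import load_dataset

raw_datasets = load_dataset(
    "json",
    data_files={"train": "train.json", "validation": "val.json"},
)
print(raw_datasets["train"][0])
```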
Yes, I remember I had encountered that too - I went back to the original scripts that I knew worked (#9996), started comparing the changes I had made, and discovered which of my differences led to more GPU usage. Also note that since the merge of #10114, the DeepSpeed process is completely contained in the training stage. Before this PR was merged, if you were to train and then eval, eval would also avail itself of the smaller (DeepSpeed-partitioned) model. Not yet sure how best to proceed - surely if one can train a model, they should be able to eval it too. edit: looking closer,
Thanks -- I migrated to `run_seq2seq.py`. Since this is unrelated to the `--num_beams` argument, I put it in a new issue (#10161), and we can probably close this one.
Using a fine-tuned seq2seq model, I'd like to generate a number of different candidate generations for a given input. One typical way of doing this is beam search.
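For concreteness, here is a minimal sketch of what I mean, using `t5-small` as a stand-in for the real T5-11B/DeepSpeed setup (model name and prompt are placeholders):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
outputs = model.generate(
    inputs.input_ids,
    num_beams=4,             # beam width
    num_return_sequences=4,  # one candidate per beam
    early_stopping=True,
)
for candidate in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(candidate)
```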
Using @stas00's amazing DeepSpeed additions so that T5-11B will fit on my GPUs, I'm calling the trainer (`finetune_trainer.py`) with only `--do_predict` (no train/eval) and, critically, the `--num_beams` parameter, but this throws an error.
I think the issue is likely one of the following:
- That this is an unexpected bug/error.
- That this is normal/expected: beam search isn't supported for trainer prediction, but is normally accomplished using `run_distributed_eval.py` (as described in https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md). But if I remember correctly, I don't think `run_distributed_eval.py` currently works with DeepSpeed (though I could be wrong?).
I am using a pull from around Feb 4th, so if things have changed in the past week, it's possible that's my issue, too.
Run Script
Error