[run_summarization.py] wrong dataset leads to CUDA errors #11344
This fails too:
I'm not sure it's a dataset thing. I think something is wrong inside the Pegasus model; there have been multiple issues with it not working with Trainer.
Hmm, after updating
Hi, do you know how to use the GPU when running run_summarization.py? I have 2 GPUs on my computer, but it didn't use them. Thank you very much!
@liubest, please kindly use https://discuss.huggingface.co/ if you run into trouble after reading the README.md, which should cover most questions about this example's usage.
Thank you for your reply. I have one more question, and it isn't answered on the forum. When using run_summarization.py, how can I run transformer models like t5-small or facebook/bart-large-cnn without loading pre-trained weights? I only want to train the original model architecture, not the pre-trained model. Thank you very much!
You will probably find dozens of tutorials if you google "huggingface train model from scratch". Please let's not derail this issue with unrelated questions. If you still have a problem, please start a new Issue. Thank you!
I'm also interested in solving this problem. @stas00, let me know if I should look into it |
Yes, please, @patrickvonplaten - thank you! |
@stas00, I checked and the problem simply seems to be that

```shell
python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path google/pegasus-xsum --do_train --do_eval \
    --dataset_name cnn_dailymail --dataset_config "3.0.0" \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size=1 --per_device_eval_batch_size=1 \
    --overwrite_output_dir --predict_with_generate --max_source_length 512
```
By the way, errors like those
Thank you for investigating this, @patrickvonplaten - could we programmatically defend against this mismatch? |
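One shape such a programmatic defense could take, sketched below with a hypothetical `check_input_length` helper (the helper name and its wiring are assumptions, not existing Trainer code): fail fast on CPU, with a readable message, instead of letting an out-of-range position index crash later as an opaque CUDA error.

```python
from types import SimpleNamespace

def check_input_length(seq_len: int, config) -> None:
    """Hypothetical pre-flight check: compare the tokenized sequence length
    against the checkpoint's position-embedding capacity before the batch
    ever reaches the GPU."""
    max_positions = getattr(config, "max_position_embeddings", None)
    if max_positions is not None and seq_len > max_positions:
        raise ValueError(
            f"Tokenized sequence length ({seq_len}) exceeds the model's "
            f"max_position_embeddings ({max_positions}); reduce "
            f"--max_source_length or pick a checkpoint trained on longer inputs."
        )

# SimpleNamespace stands in for model.config; per this thread, the
# pegasus-xsum checkpoint supports 512 positions, so a 1024-token
# cnn_dailymail article would raise here instead of crashing on GPU.
cfg = SimpleNamespace(max_position_embeddings=512)
check_input_length(512, cfg)  # within bounds: no error
```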
Yes! So we should document this at https://huggingface.co/transformers/troubleshooting.html Also
@stas00, @patrickvonplaten, Pegasus actually uses SinusoidalPositionalEmbedding, so there is no seq-length limit. We should resize the embedding if the current length is greater than the default length. That's what we do in FSMT and M2M100.
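For context, a minimal numpy sketch of the standard sinusoidal table (the textbook formula, not the exact FSMT/M2M100 implementation, which may lay out the sin/cos halves differently): because every row is computed from a formula rather than learned, the table can be regrown to any length, and the existing rows stay identical.

```python
import math
import numpy as np

def sinusoidal_weights(num_positions: int, dim: int) -> np.ndarray:
    """Standard sinusoidal position table: sin on even columns, cos on odd
    columns, with frequencies decaying geometrically across the dimension.
    Assumes an even dim for simplicity."""
    position = np.arange(num_positions, dtype=np.float64)[:, None]  # (N, 1)
    div_term = np.exp(
        np.arange(0, dim, 2, dtype=np.float64) * (-math.log(10000.0) / dim)
    )  # (dim // 2,)
    weights = np.zeros((num_positions, dim))
    weights[:, 0::2] = np.sin(position * div_term)
    weights[:, 1::2] = np.cos(position * div_term)
    return weights

# Growing from 512 to 1024 positions just recomputes the formula for a
# larger range; the first 512 rows are unchanged, so nothing the model
# relies on is disturbed.
table = sinusoidal_weights(1024, 16)
```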
On the other hand, Pegasus has only been trained on a max length of 512, so I'm not sure whether it's a good idea to "silently" extend the input to a length of 1024, since the model will probably produce garbage. Or have you had different experiences, @stas00 @patil-suraj? I think I'd prefer to reduce the max length automatically to model.config.max_position_embeddings and throw a warning.
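A minimal sketch of the clamp-and-warn behavior proposed here (`clamp_max_length` is a hypothetical helper name, not part of the examples):

```python
import logging

logger = logging.getLogger("summarization_example")

def clamp_max_length(requested: int, max_position_embeddings: int) -> int:
    """Cap the requested source length at what the checkpoint supports and
    emit a warning, instead of letting the overflow crash on the GPU."""
    if requested > max_position_embeddings:
        logger.warning(
            "Requested max length %d exceeds the model's "
            "max_position_embeddings (%d); reducing to %d.",
            requested, max_position_embeddings, max_position_embeddings,
        )
        return max_position_embeddings
    return requested
```

With a checkpoint limited to 512 positions, a request for 1024 would be quietly reduced to 512 with one warning line, which is exactly the discoverability concern raised in the next comment.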
That makes sense, but even though Pegasus is pre-trained with 512, the checkpoints use different max lengths, for example the xsum model differs from cnn_dm and pubmed, where it is 1024.
This is very likely to go unnoticed. We overuse warnings: they are OK when you have 5 lines of output, but when you have hundreds of them, the chance the user will see one is close to 0, especially when things seem to work, albeit with setting changes behind the scenes. I feel that @patil-suraj's suggestion of granting the user's wish is the better one: if they get garbage, then it's loud and clear that they did something wrong. There, a warning about asking for a longer value than preset will work, as they are likely to search for the culprit. And in situations where we know that what the user is asking for surely won't work, we should assert.
Ok, good arguments! IMO we should only allow this resizing for models that use sinusoidal position embeddings, a.k.a. position embeddings that have set. In terms of implementation, I'd suggest adding a general
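As a toy illustration of the gating idea (all names here are hypothetical and not the actual transformers implementation): resizing succeeds only when the position embeddings are formula-based, since learned embeddings have no trained values for the new slots.

```python
class ToySinusoidalModel:
    """Toy stand-in for a model with a position-embedding table."""

    def __init__(self, max_positions: int, sinusoidal: bool):
        self.max_positions = max_positions
        self.sinusoidal = sinusoidal

    def resize_position_embeddings(self, new_num_positions: int) -> None:
        if not self.sinusoidal:
            raise NotImplementedError(
                "Learned position embeddings cannot be safely extended."
            )
        # For sinusoidal tables, extending just means recomputing the
        # formula over the larger range; no trained parameters are touched.
        self.max_positions = new_num_positions
```

In this sketch a Pegasus-like (sinusoidal) model could be regrown from 512 to 1024 positions, while a BERT-like (learned) model would refuse with an explicit error.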
We should also overwrite the
@patrickvonplaten, do you have some spare cycles to come back to this so that we can complete this issue? It looks like it fell through the cracks. Thank you.
Ok so the plan is to:
=> Happy to open a PR for this one, but it would be great to first hear @LysandreJik's and @sgugger's opinions on it as well
Works for me! |
@sgugger, can you share your working code?
No, I meant that the plan suggested by @patrickvonplaten in the message above works for me.
Feeding `--dataset_name cnn_dailymail` to `--model_name_or_path google/pegasus-xsum` leads to lots of errors from pytorch. Perhaps there is a way to detect that the dataset is inappropriate and give a nice, relevant assert instead?

You'd think that `--dataset_name cnn_dailymail` and `--dataset_name xsum` should be interchangeable...

If I run it on one gpu I get:
Thanks.
@sgugger, @patil-suraj