run error due to dataset #44
Comments
This error means your batch size is too small.
Thanks for your guidance.
But when I run the same command, it gives an error. After changing -encoder from classifier to bert or baseline, it gives another error. I deleted the args "-dropout 0.1 -world_size 1 -decay_method noam" from your command line and got gpu_rank 0. Can you please help me find the solution?
It seems that a certain
It seems that the issue was with the downloaded data. I downloaded it again, and that issue was solved. Now the code gives another error. I am using this command:
Downgrade to PyTorch
I want to test new datasets with BertSum, for example a list of paper abstracts in a CSV file. Can you please let me know how I can pre-process such a dataset and create a .pt file that BertSum accepts as input?
Just create .story files.
Thank you! Do you know any tool for making a .story file?
E.g. in Python:

text = "abc"
with open("text.story", "w") as f:
    f.write(text)
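A minimal sketch of doing the same for a whole CSV of abstracts, writing one .story file per row. The column name "abstract" and the output folder raw_stories are assumptions for illustration, not details from this thread:

import csv
import os

def csv_to_stories(csv_path, out_dir, column="abstract"):
    # Write one .story file per CSV row so BertSum's preprocessing can pick them up.
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            text = (row.get(column) or "").strip()
            if not text:
                continue  # skip rows without an abstract
            out_path = os.path.join(out_dir, "abstract_%d.story" % i)
            with open(out_path, "w", encoding="utf-8") as out:
                out.write(text)

csv_to_stories("abstracts.csv", "raw_stories")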
Thank you!
I came across this error as well. I wondered why the batch size would be "too" small, since I didn't see a constraint like this. Is this constraint mentioned anywhere? And how can I determine whether a batch size is too small?
#33
@nimahassanpour It depends on what you want to do with your samples. If you want to train the model, you need reference summaries (sections marked with @highlight). If you only want to predict, then it should be fine without them.
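For illustration, a .story file meant for training usually looks like the sketch below: the article text first, then each reference-summary sentence after an @highlight marker (the sentences here are invented):

This is the body of the article. It can span several sentences and paragraphs.

@highlight

First reference summary sentence

@highlight

Second reference summary sentence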
@RafaelWO I am sorry for asking so many questions, and thank you for your replies! I am having another strange problem. When I follow Option 2 for pre-processing data, I can tokenize the .story files successfully and generate .story.json files. But when I run step 5, I always get three empty square brackets:

[nhassanp@uc1f-bioinfocloud-assembly-base src]$ /data/conda_envs/20200204/miniconda3/bin/python preprocess.py -mode format_to_bert -raw_path ../merged_stories_tokenized/ -save_path ../bert_cnn/ -oracle_mode greedy -log_file ../logs/preprocess.log

(Since I don't have URLs for my data, I skip step 4.)
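One way to narrow this down is to check whether the tokenized files actually contain sentences before running format_to_bert. This is only a diagnostic sketch; the directory name is taken from the command above, and the assumption that each file is Stanford CoreNLP JSON output with a top-level "sentences" list is mine, not confirmed in the thread:

import glob
import json

paths = glob.glob("../merged_stories_tokenized/*.json")
empty = 0
for path in paths:
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    if not doc.get("sentences"):  # no tokenized sentences in this file
        empty += 1
        print("empty:", path)
print("checked", len(paths), "files,", empty, "empty")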
Traceback (most recent call last):
File "train.py", line 340, in
train(args, device_id)
File "train.py", line 272, in train
trainer.train(train_iter_fct, args.train_steps)
File "/home/wsy/xry/BertSum-master/src/models/trainer.py", line 142, in train
for i, batch in enumerate(train_iter):
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 131, in iter
for batch in self.cur_iter:
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 235, in iter
batch = Batch(minibatch, self.device, self.is_test)
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 27, in init
src = torch.tensor(self._pad(pre_src, 0))
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 14, in _pad
width = max(len(d) for d in data)
ValueError: max() arg is an empty sequence
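The last frame shows the direct cause: _pad calls max() over an empty generator, which happens when a minibatch arrives with no examples, hence the advice above that the batch size is too small. A minimal sketch of the failure and of one possible guard; the guard is only an illustration, not a patch from this repository:

def pad(data, pad_id):
    # max() over an empty input raises ValueError, so supply a default width of 0.
    width = max((len(d) for d in data), default=0)
    return [d + [pad_id] * (width - len(d)) for d in data]

print(pad([], 0))              # [] instead of ValueError: max() arg is an empty sequence
print(pad([[1], [2, 3]], 0))   # [[1, 0], [2, 3]]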