
run error due to dataset #44

Closed
angeluau opened this issue Jun 12, 2019 · 16 comments


@angeluau

Traceback (most recent call last):
File "train.py", line 340, in
train(args, device_id)
File "train.py", line 272, in train
trainer.train(train_iter_fct, args.train_steps)
File "/home/wsy/xry/BertSum-master/src/models/trainer.py", line 142, in train
for i, batch in enumerate(train_iter):
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 131, in iter
for batch in self.cur_iter:
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 235, in iter
batch = Batch(minibatch, self.device, self.is_test)
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 27, in init
src = torch.tensor(self._pad(pre_src, 0))
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 14, in _pad
width = max(len(d) for d in data)
ValueError: max() arg is an empty sequence

@nlpyang
Owner

nlpyang commented Jun 12, 2019

This error means your batch size is too small.
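
For context, the batching in data_loader.py follows the usual token-budget pattern: examples are collected until the running size estimate exceeds batch_size, and the overflowing example is carried over to the next minibatch. If batch_size is smaller than a single document, the yielded minibatch can be empty, and _pad then calls max() on an empty list, which is exactly the ValueError above. A rough sketch of the mechanism (not the repository's exact code):

def simple_batch(examples, batch_size):
    # Token-budget batching: yield a minibatch once the budget is exceeded.
    minibatch = []
    for ex in examples:
        minibatch.append(ex)
        size_so_far = max(len(e) for e in minibatch) * len(minibatch)
        if size_so_far > batch_size:
            # The overflowing example is carried over; if it alone exceeds
            # the budget, minibatch[:-1] is empty.
            yield minibatch[:-1]
            minibatch = [ex]
    if minibatch:
        yield minibatch

# With a budget smaller than one ~900-token document, the first yielded
# batch is [], and max() over it raises "max() arg is an empty sequence".
for batch in simple_batch([list(range(900))], batch_size=800):
    width = max(len(d) for d in batch)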

@angeluau
Author

Thanks for your guidance.

@w5688414

python train.py -mode train -encoder classifier -dropout 0.1 -bert_data_path ../bert_data/cnndm -model_path ../models/bert_classifier -lr 2e-3 -visible_gpus 0 -gpu_ranks 0 -world_size 1 -report_every 50 -save_checkpoint_steps 1000 -batch_size 800 -decay_method noam -train_steps 50000 -accum_count 2 -log_file ../logs/bert_classifier -use_interval true -warmup_steps 10000
I use this configuration and it works.

gpu_rank 0
[2019-12-24 13:37:56,642 INFO] * number of parameters: 109483009
[2019-12-24 13:37:56,643 INFO] Start training...
[2019-12-24 13:37:56,801 INFO] Loading train dataset from ../bert_data/cnndm.train.123.bert.pt, number of examples: 2001
[2019-12-24 13:38:20,795 INFO] Step 50/50000; xent: 7.57; lr: 0.0000001; 8 docs/s; 24 sec
[2019-12-24 13:38:45,591 INFO] Step 100/50000; xent: 6.42; lr: 0.0000002; 8 docs/s; 49 sec
[2019-12-24 13:39:10,421 INFO] Step 150/50000; xent: 5.29; lr: 0.0000003; 8 docs/s; 74 sec
[2019-12-24 13:39:34,860 INFO] Step 200/50000; xent: 4.07; lr: 0.0000004; 8 docs/s; 98 sec
[2019-12-24 13:40:00,014 INFO] Step 250/50000; xent: 3.43; lr: 0.0000005; 8 docs/s; 123 sec
[2019-12-24 13:40:25,245 INFO] Step 300/50000; xent: 3.33; lr: 0.0000006; 8 docs/s; 148 sec
[2019-12-24 13:40:49,972 INFO] Step 350/50000; xent: 3.52; lr: 0.0000007; 8 docs/s; 173 sec
[2019-12-24 13:41:14,567 INFO] Step 400/50000; xent: 3.31; lr: 0.0000008; 8 docs/s; 198 sec
[2019-12-24 13:41:38,936 INFO] Step 450/50000; xent: 3.27; lr: 0.0000009; 9 docs/s; 222 sec
[2019-12-24 13:42:02,810 INFO] Step 500/50000; xent: 3.38; lr: 0.0000010; 9 docs/s; 246 sec
[2019-12-24 13:42:26,544 INFO] Step 550/50000; xent: 3.25; lr: 0.0000011; 9 docs/s; 270 sec
[2019-12-24 13:42:50,622 INFO] Step 600/50000; xent: 3.35; lr: 0.0000012; 9 docs/s; 294 sec
[2019-12-24 13:43:14,930 INFO] Step 650/50000; xent: 3.29; lr: 0.0000013; 8 docs/s; 318 sec
[2019-12-24 13:43:39,289 INFO] Step 700/50000; xent: 3.20; lr: 0.0000014; 8 docs/s; 342 sec
[2019-12-24 13:44:03,805 INFO] Step 750/50000; xent: 3.41; lr: 0.0000015; 8 docs/s; 367 sec
[2019-12-24 13:44:28,189 INFO] Step 800/50000; xent: 3.36; lr: 0.0000016; 8 docs/s; 391 sec
[2019-12-24 13:44:52,998 INFO] Step 850/50000; xent: 3.29; lr: 0.0000017; 8 docs/s; 416 sec
[2019-12-24 13:45:19,066 INFO] Step 900/50000; xent: 3.30; lr: 0.0000018; 8 docs/s; 442 sec
[2019-12-24 13:45:44,456 INFO] Step 950/50000; xent: 3.15; lr: 0.0000019; 8 docs/s; 468 sec
[2019-12-24 13:45:55,944 INFO] Loading train dataset from ../bert_data/cnndm.train.91.bert.pt, number of examples: 1998
[2019-12-24 13:46:09,317 INFO] Step 1000/50000; xent: 3.39; lr: 0.0000020; 8 docs/s; 493 sec
[2019-12-24 13:46:09,320 INFO] Saving checkpoint ../models/bert_classifier/model_step_1000.pt
[2019-12-24 13:46:35,466 INFO] Step 1050/50000; xent: 3.21; lr: 0.0000021; 8 docs/s; 519 sec

@nimahassanpour

But when I run the same command it gives this error:
train.py: error: argument -encoder: invalid choice: 'classifier' (choose from 'bert', 'baseline')

After changing -encoder from classifier to bert or baseline, it gives this error:
train.py: error: unrecognized arguments: -dropout 0.1 -world_size 1 -decay_method noam

I deleted these args "-dropout 0.1 -world_size 1 -decay_method noam" from your command line and got this error:

gpu_rank 0
[2020-02-06 22:17:01,325 INFO] * number of parameters: 35456513
[2020-02-06 22:17:01,326 INFO] Start training...
[2020-02-06 22:17:01,438 INFO] Loading train dataset from ../bert_data/cnndm.train.123.bert.pt, number of examples: 2001
Traceback (most recent call last):
File "train.py", line 146, in
train_ext(args, device_id)
File "/data/examples/nhassanp/PreSumm-master/src/train_extractive.py", line 203, in train_ext
train_single_ext(args, device_id)
File "/data/examples/nhassanp/PreSumm-master/src/train_extractive.py", line 245, in train_single_ext
trainer.train(train_iter_fct, args.train_steps)
File "/data/examples/nhassanp/PreSumm-master/src/models/trainer_ext.py", line 137, in train
for i, batch in enumerate(train_iter):
File "/data/examples/nhassanp/PreSumm-master/src/models/data_loader.py", line 142, in iter
for batch in self.cur_iter:
File "/data/examples/nhassanp/PreSumm-master/src/models/data_loader.py", line 278, in iter
for idx, minibatch in enumerate(self.batches):
File "/data/examples/nhassanp/PreSumm-master/src/models/data_loader.py", line 256, in create_batches
for buffer in self.batch_buffer(data, self.batch_size * 300):
File "/data/examples/nhassanp/PreSumm-master/src/models/data_loader.py", line 224, in batch_buffer
ex = self.preprocess(ex, self.is_test)
File "/data/examples/nhassanp/PreSumm-master/src/models/data_loader.py", line 195, in preprocess
tgt = ex['tgt'][:self.args.max_tgt_len][:-1]+[2]
KeyError: 'tgt'

Can you please help me find the solution?

@RafaelWO

RafaelWO commented Feb 7, 2020

It seems that a certain target cannot be found. Did you run all suggested preprocessing steps?

@nimahassanpour

It seems that the issue was with the downloaded data. I downloaded it again, and that issue was solved. Now the code gives this error:
RuntimeError: Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or logical_not() operator instead.

I am using this command:
python train.py -mode train -encoder bert -bert_data_path ../bert_data/cnndm -model_path ../models/bert_classifier -lr 2e-3 -visible_gpus 0 -gpu_ranks 0 -report_every 50 -save_checkpoint_steps 1000 -batch_size 800 -train_steps 50000 -accum_count 2 -log_file ../logs/bert_classifier -use_interval true -warmup_steps 10000 -ext_dropout 0.1

@RafaelWO

RafaelWO commented Feb 7, 2020

Downgrade to PyTorch 1.1.0, see also #73
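
If downgrading is not possible, the same error can usually be fixed in place: on PyTorch >= 1.2, arithmetic on bool mask tensors is rejected, and the mask has to be inverted with ~ or logical_not() instead, as the message itself suggests. An illustrative snippet (not a patch for a specific file in this repo):

import torch

mask = torch.tensor([True, False, True])

# Raises the RuntimeError above on PyTorch >= 1.2:
# inverted = 1 - mask

# Supported replacements:
inverted = ~mask                # tensor([False,  True, False])
inverted = mask.logical_not()   # equivalent
as_float = (~mask).float()      # if the surrounding code expects 0/1 floats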

@nimahassanpour

I want to test new datasets with BertSum, e.g. a list of paper abstracts in a CSV file. Can you please let me know how I can pre-process such a dataset and create a .pt file which BertSum accepts as input?
Thank you!

@RafaelWO

Just create .story files which are structured the same way as the original ones from the dataset.
Then you can use the preprocessing in the same way as with the original data.

@nimahassanpour

Thank you! Do you know any tool for making .story files?

@RafaelWO

E.g. in Python:

# Write the article text to a .story file.
text = "abc"
with open("text.story", "w") as f:
    f.write(text)
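
For a whole CSV of abstracts, a minimal sketch along the same lines (the file name and column names here are assumptions, adjust them to your data):

import csv
import os

os.makedirs("raw_stories", exist_ok=True)

# Assumed input: abstracts.csv with columns "id" and "abstract".
with open("abstracts.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        out_path = os.path.join("raw_stories", row["id"] + ".story")
        with open(out_path, "w", encoding="utf-8") as out:
            # Article text first; reference summaries, if you have them,
            # would follow as "@highlight" sections (see below).
            out.write(row["abstract"])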

@nimahassanpour

nimahassanpour commented Feb 12, 2020

Thank you!
Just a quick question: do I need to have the @highlight part for my samples? Because my samples do not have a summary part.

@tcqiuyu

tcqiuyu commented Feb 18, 2020

This error means your batch size is too small.

I came across this error as well. I wondered why the batch size would be "too" small, since I didn't see such a constraint documented. Is this constraint mentioned anywhere? And how do I determine whether a batch size is too small?

@tcqiuyu

tcqiuyu commented Feb 18, 2020

This error means your batch size is too small.

#33
I've seen your explanation there. Thanks, I will dig into it.

@RafaelWO

Thank you!
Just a quick question: do I need to have the @highlight part for my samples? Because my samples do not have a summary part.

@nimahassanpour It depends on what you want to do with your samples. If you want to train the model, you need a reference summary (the @highlight section). If you only want to predict, then it should be fine without them.
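
For reference, a CNN/DailyMail-style .story file is plain text: the article body first, then each reference-summary sentence in its own @highlight block, e.g.:

First sentence of the article.
Second sentence of the article.

@highlight

First summary sentence.

@highlight

Second summary sentence.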

@nimahassanpour

nimahassanpour commented Feb 19, 2020

@RafaelWO I am sorry for asking so many questions, and thank you for your replies! I am having another strange problem. When I follow Option 2 for pre-processing the data, I can tokenize the .story files successfully and generate the .story.json files. But when I run step 5, I always get three empty square brackets:

[nhassanp@uc1f-bioinfocloud-assembly-base src]$ /data/conda_envs/20200204/miniconda3/bin/python preprocess.py -mode format_to_bert -raw_path ../merged_stories_tokenized/ -save_path ../bert_cnn/ -oracle_mode greedy -log_file ../logs/preprocess.log
[]
[]
[]

(Since I don't have URLs for my data, I skip step 4.)
