data_utils.py List index out of range #8

Open
shubhamagarwal92 opened this issue Feb 22, 2019 · 12 comments

Comments

@shubhamagarwal92

While creating train-roto-ptrs.txt using ptrs mode, I am getting this index error:

Traceback (most recent call last):
  File "data_utils.py", line 859, in <module>
    make_pointerfi(args.output_fi, inp_file=args.input_path, content_plan_inp=args.train_content_plan)
  File "data_utils.py", line 593, in make_pointerfi
    content_plan_entry = [content_plan_record for content_plan_record in content_plan[i]]
IndexError: list index out of range

Any quick suggestion?

@ratishsp
Owner

The error will occur if the sizes of the content plan and training data do not match.
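
One way to confirm the mismatch before running the ptrs mode is to count both inputs. The following is a minimal sketch, not part of data_utils.py: it assumes the JSON file holds a top-level list with one entry per training example and that the content plan file has one plan per line. The function names are made up for illustration.

```python
import json


def count_json_records(json_path):
    """Return the number of top-level entries in a JSON array file."""
    with open(json_path, encoding="utf-8") as f:
        return len(json.load(f))


def count_lines(text_path):
    """Return the number of lines in a text file (assumed: one content plan per line)."""
    with open(text_path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def compare_sizes(json_path, plan_path):
    """Return (n_records, n_plans, sizes_match) for the two input files."""
    n_records = count_json_records(json_path)
    n_plans = count_lines(plan_path)
    return n_records, n_plans, n_records == n_plans
```

Calling `compare_sizes` on the paths passed as `-input_path` and `-train_content_plan` should show whether the two counts agree; if they do not, the IndexError above is the expected outcome.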

ratishsp closed this as completed Mar 2, 2019

@tuyaao

tuyaao commented May 26, 2019

I used the provided train.json and inter/train_content_plan.txt (I did not generate them myself), and I get this error:

Traceback (most recent call last):
  File "data_utils.py", line 887, in <module>
    make_pointerfi(args.output_fi, inp_file=args.input_path, content_plan_inp=args.train_content_plan)
  File "data_utils.py", line 614, in make_pointerfi
    content_plan_entry = [content_plan_record for content_plan_record in content_plan[i]]
IndexError: list index out of range

What should I do to fix this?

@ghtaro

ghtaro commented May 6, 2021

Hi @ratishsp ,

Thank you very much for sharing the code and for answering so many questions from people trying to replicate your results; that has been very helpful to me as well.

I am sorry, but I would like to reopen this issue because I hit the same error as above and could not resolve it.

I run the following command:
python data_utils.py -mode ptrs -input_path $BASE/rotowire/train.json -train_content_plan $BASE/rotowire/inter/train_content_plan.txt -output_fi $BASE/rotowire/train-roto-ptrs.txt

with:

The error occurred in the same place:

  File "data_utils.py", line 614, in make_pointerfi
    content_plan_entry = [content_plan_record for content_plan_record in content_plan[i]]
IndexError: list index out of range

Following your comment:

> The error will occur if the sizes of the content plan and training data do not match.

I checked the data sizes and found a mismatch between train.json (3398 examples, as in https://github.com/harvardnlp/boxscore-data) and train_content_plan.txt (3371 lines).
The problem is that both input files are provided as-is, so I do not see what I could change on my side.
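
To illustrate why these two counts lead to the traceback above, here is a minimal sketch. It assumes the script walks the training examples in order and indexes the content plans with the same `i`, which is what the `content_plan[i]` line suggests but is not confirmed here. Both function names are hypothetical and do not exist in data_utils.py.

```python
def first_out_of_range(records, plans):
    """Return the first index i where plans[i] fails, or None if every record has a plan."""
    for i in range(len(records)):
        try:
            plans[i]
        except IndexError:
            return i
    return None


def pair_with_plans(records, plans):
    """Pair each record with its plan, failing early with a descriptive message."""
    if len(records) != len(plans):
        raise ValueError(
            "size mismatch: {} records vs {} content plans".format(
                len(records), len(plans)
            )
        )
    return list(zip(records, plans))


# Stand-in lists with the sizes reported in this thread.
records = list(range(3398))
plans = list(range(3371))
print(first_out_of_range(records, plans))  # index of the first record without a plan
```

Under that assumption, the first 3371 examples are processed and the lookup fails on the next one, since 27 examples have no matching line. A guard like `pair_with_plans` would report the mismatch up front instead of a bare IndexError.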

To be honest, I am not very familiar with OpenNMT etc., so I apologize if this is obvious.
Could you please tell me if I am missing anything?

My environment is below (I could not reproduce the environment specified in requirement.txt, but I do not think that matters for this error):

future                       0.18.2
nltk                         3.4.5
six                          1.14.0
torch                        1.8.1
torchtext                    0.9.1
tqdm                         4.42.1

Thank you very much for your help.

@ratishsp
Owner

ratishsp commented May 9, 2021

Hi @ghtaro,
Nice to know that you found the code useful.
I am not sure about the root cause of the issue you are facing. However, as mentioned in #26 (comment), I have realized that the pointer network supervision is not strictly required, so you can comment out any code that uses the pointer supervision.

ratishsp reopened this May 9, 2021

@ghtaro

ghtaro commented May 10, 2021

Hi @ratishsp ,

Thank you very much for your prompt reply.

> I am not sure about the root cause of the issue you are facing.

I have now set up a very similar computational environment, but I still get the same error message.

Anyway, understood; I will follow the instructions in #26.
Please leave the issue open until I can run the Python script without any errors.

@happycjksh

I have the same problem. Did you solve it?

@happycjksh

Hi @ghtaro, I have the same problem and I've been stuck on it for many days. Could you tell me how you solved it?

@ratishsp
Owner

ratishsp commented Jul 1, 2021

Hi @happycjksh,
I think I now understand the root cause of the issue. It is indeed related to #34.
The lengths of train.json and train_content_plan.txt differ because no content plan could be found for some training examples. Those examples were excluded during training, so I worked with a subset of 3371 examples for which content plans could be extracted. I have shared a train.json that matches the length of train_content_plan.txt at https://drive.google.com/file/d/1uuRckc6D2WIvrpoadNj-lbilw5XCjR12/view?usp=sharing. Please try this train.json file and let me know if it works.
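
The exclusion described above can be sketched as follows. This is only an illustration of how dropping examples without a content plan keeps the two lists aligned. `extract_plan` is a hypothetical callable standing in for whatever produces a content plan; it is not a function from this repository.

```python
def keep_examples_with_plans(examples, extract_plan):
    """Return (kept_examples, plans) of equal length, skipping examples with no plan."""
    kept, plans = [], []
    for example in examples:
        plan = extract_plan(example)
        if plan:  # None or an empty plan means no content plan was found
            kept.append(example)
            plans.append(plan)
    return kept, plans
```

Because an example and its plan are appended together, the two returned lists always have the same length, which is the property the ptrs mode relies on.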

@happycjksh

Thanks for your answer. I'll download the new train.json right away and report back as soon as possible.

@happycjksh

happycjksh commented Jul 2, 2021

Hi @ratishsp, I'm sorry to report that although data_utils.py now runs, the resulting train-roto-ptrs.txt is an empty file. I tried to fix this myself, without success. I hope you can help me solve the problem. Thank you.

@ratishsp
Owner

ratishsp commented Jul 2, 2021

Oh. I am not sure why the file is empty.
You can use the train-roto-ptrs.txt file from https://drive.google.com/drive/folders/1R_82ifGiybHKuXnVnC8JhBTW8BAkdwek

@happycjksh

I'll try to solve it. If I find the cause, I will report it here.
