Understanding train-roto-ptrs.txt file #26
Analyzing the content of a file
Looking into the content of the original train-roto-ptrs.txt file did not help me understand how the file was created. From my perspective, its entries are not related to any information in the other files.
@ratishsp, could you please help me figure out how the train-roto-ptrs.txt file was created?
Hi Ruslan,
data2text-plan-py/data_utils.py
Line 574 in 4b74535
The core idea is to provide supervision while training the copy mechanism.
The entries in train_roto_ptrs.txt map each summary token to its matching token in the content plan. For example, the last entry 245,39 in train_roto_ptrs[1] indicates that the 245th token in the summary matches the 39th content plan entry. Having said that, in later experiments I realized that such supervision is not strictly required. The model learns an accurate value of
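To make the mapping concrete, here is a minimal sketch of how such pointer entries could be built for another dataset. It is not the function from data_utils.py. It assumes exact string matching between summary tokens and content plan values, 0-based indices, and space-separated `summary_index,plan_index` pairs on one line per example. The helper names `build_pointers`, `format_pointers`, and `parse_pointers` are hypothetical, and the indexing convention should be checked against the original file before training.

```python
from typing import Dict, List, Tuple


def build_pointers(summary_tokens: List[str],
                   plan_values: List[str]) -> List[Tuple[int, int]]:
    """Pair each summary token with the first plan entry whose value matches it.

    Assumption: matching is exact string equality and indices are 0-based.
    """
    first_index: Dict[str, int] = {}
    for plan_idx, value in enumerate(plan_values):
        # Keep only the first plan position for a repeated value.
        first_index.setdefault(value, plan_idx)
    return [(tok_idx, first_index[tok])
            for tok_idx, tok in enumerate(summary_tokens)
            if tok in first_index]


def format_pointers(pointers: List[Tuple[int, int]]) -> str:
    """Serialize pairs as 'summary_idx,plan_idx' separated by spaces (assumed layout)."""
    return " ".join(f"{s},{p}" for s, p in pointers)


def parse_pointers(line: str) -> List[Tuple[int, int]]:
    """Read one line of pointer pairs back into (summary_idx, plan_idx) tuples."""
    pairs = []
    for pair in line.split():
        summary_idx, plan_idx = pair.split(",")
        pairs.append((int(summary_idx), int(plan_idx)))
    return pairs


# Toy example: the plan values are the record values copied into the summary.
summary = "The Hawks scored 110 points".split()
plan_values = ["Hawks", "110"]
pointers = build_pointers(summary, plan_values)
line = format_pointers(pointers)
```

With this toy input, the token at summary position 1 points to plan entry 0 and the token at position 3 points to plan entry 1. Exact matching is the simplest choice; a real dataset may need number normalization or multi-word values handled separately.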
Hi Ratish, thanks a lot for your prompt answer! Your comment was so useful that I was able to create the same file for another dataset!
So far I have not tried commenting out the code that uses the pointer supervision, since I want to keep things just the way you did. Later I plan to run such an experiment and see whether this supervision is required.
Great!
When I run data_utils.py, the terminal reports an "index out of range" error at line 593. Have you encountered this problem?
Hi Ratish,
Thanks a lot for the insightful research paper and for making the codebase publicly available! I was able to train the model on boxscore-data and then use it for inference. Now I am interested in training your model on my own dataset and then using it for text generation.
Unfortunately, I encountered a problem with the following step (from the README page):
Since my dataset is different from boxscore-data, I have to transform it manually into a format suitable for model training.
I have successfully prepared my dataset in the same format as the files in boxscore-data/rotowire/, namely: src_train.txt, train_content_plan.txt, tgt_train.txt, inter/train_content_plan.txt, src_valid.txt, tgt_valid.txt, valid_content_plan.txt, inter/valid_content_plan.txt, test/src_test.txt, and test/tgt_test.txt. The structure of these files is the same as that of the files obtained from boxscore-data.
The function for creating this file is:
data2text-plan-py/data_utils.py
Line 574 in 4b74535
Because I have a different dataset, I cannot use the above-mentioned function, so I need to create the train-roto-ptrs.txt file myself. Unfortunately, even after going through the function implementation multiple times and analyzing the content of the file (please see comments), I could not figure out how to create such a file from my dataset.
Could you please elaborate on the purpose of the train-roto-ptrs.txt file and briefly describe the steps by which it was created?
This is my current bottleneck and I would highly appreciate your help with this issue!
Many thanks in advance,
Ruslan