This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

coco-train-words.p #168

Closed
DesaleF opened this issue Nov 10, 2021 · 2 comments

Comments

DesaleF commented Nov 10, 2021

I am trying to fine-tune Oscar (VinVL) on my own dataset using VinVL features, which I extracted with the scene_graph_benchmark repo. My custom data is prepared in the COCO caption format. Does anyone know how to prepare the coco-train-words.p file for a custom dataset?

jontooy commented Nov 11, 2021

Hi DesaleF,

I am in the same situation but haven't had time to look into coco-train-words.p yet. I did find this issue on the CIDEr GitHub repository, which may hint at how to prepare your own coco-train-words.p. Let me know how it goes!

My current understanding is that the pickle (.p) file contains the dataset's document frequencies (for each n-gram, the number of training images whose reference captions contain it), which are used when calculating CIDEr scores. This pickle file should be passed into the function pycocoevalcap.eval when evaluating your captions. Someone please correct me if I'm wrong.
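To show how those document frequencies typically enter the score, here is a minimal sketch of the tf-idf weighting used by CIDEr-style scorers. It is modeled on the open-source CIDEr/CIDEr-D scorer code, not taken from Oscar itself. Each n-gram in a caption gets the weight tf × (log(ref_len) - log(df)), with df clipped to at least 1. The function names are hypothetical. The sketch also assumes `ref_len` stores the raw number of reference images, as the value 113287 shown later in this thread suggests, so the log is taken at scoring time.

```python
import math
from collections import Counter


def ngram_counts(caption, n_max=4):
    """Term frequencies of all 1..n_max-grams in a whitespace-tokenized caption."""
    words = caption.split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def idf_weight(ngram, document_frequency, ref_len):
    """log(#images) - log(#images whose references contain the n-gram).

    Assumption: ref_len is the raw image count, so its log is taken here.
    Unseen n-grams get df clipped to 1, i.e. the maximum weight log(ref_len).
    """
    df = max(1.0, document_frequency.get(ngram, 0.0))
    return math.log(float(ref_len)) - math.log(df)


def tfidf_vector(caption, document_frequency, ref_len, n_max=4):
    """One {ngram: tf * idf} dict per n-gram order (index 0 holds 1-grams)."""
    vec = [dict() for _ in range(n_max)]
    for ngram, tf in ngram_counts(caption, n_max).items():
        vec[len(ngram) - 1][ngram] = tf * idf_weight(ngram, document_frequency, ref_len)
    return vec
```

With the values from the pickle inspected below, `idf_weight(("knife",), {("knife",): 855.0}, 113287)` gives log(113287 / 855), about 4.89. An n-gram seen in only one image gets about 11.64, so rare phrases dominate the score. This is why the frequencies should come from your own training captions rather than COCO's.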

DesaleF commented Nov 11, 2021

Hello @jontooy! Thank you for the information. I took a quick look at the file, and below are the highlights of what "coco-train-words.p" contains. What I don't understand is how to create the document_frequency entry in the pickle file.

```python
>>> import pickle
>>> with open("datasets/coco_caption/coco-train-words.p", "rb") as f:
...     words_p = pickle.load(f)
...
>>> words_p.keys()
dict_keys(['document_frequency', 'ref_len'])
>>> words_p["ref_len"]
113287
>>> len(list(words_p["document_frequency"]))
3636892
>>> type(words_p["document_frequency"])
<class 'collections.defaultdict'>
>>> len(list(words_p["document_frequency"].keys()))
3636892
>>> list(words_p["document_frequency"].items())[:5]
[(('knife',), 855.0), (('her', 'head', 'cutting'), 1.0), (('cutting', 'a'), 502.0), (('a', 'chefs', 'knife'), 7.0), (('on', 'her', 'head'), 54.0)]
```
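The sketch below shows one way to build a file with this structure for a custom dataset. It follows the idea behind `compute_doc_freq` in the CIDEr scorer: each 1- to 4-gram is counted at most once per image, taken across all of that image's reference captions. This is consistent with the float counts stored in a `defaultdict` seen above. Several details are assumptions:

- The input is a COCO-caption-style JSON file with an `annotations` list of `{image_id, caption}` entries.
- `tokenize` is only a crude stand-in for whatever tokenizer produced the training captions, such as the PTB tokenizer shipped with coco-caption. The original file's exact tokenization, including any end-of-sentence token, is not visible here.
- `ref_len` is stored as the raw number of images. 113287 matches the commonly used Karpathy COCO training split.

All function names and paths are hypothetical.

```python
import json
import re
from collections import defaultdict


def tokenize(caption):
    """Crude stand-in tokenizer (assumption): lowercase, drop punctuation, split."""
    return re.sub(r"[^a-z0-9\s]", "", caption.lower()).split()


def ngrams(words, n=4):
    """Set of all 1..n-grams (as tuples) in a list of tokens."""
    return {tuple(words[i:i + k])
            for k in range(1, n + 1)
            for i in range(len(words) - k + 1)}


def build_words_dict(captions_per_image, n=4):
    """captions_per_image: one list of reference caption strings per image."""
    document_frequency = defaultdict(float)
    for refs in captions_per_image:
        # Union over all references of this image, so each n-gram counts once per image.
        seen = set()
        for ref in refs:
            seen |= ngrams(tokenize(ref), n)
        for ngram in seen:
            document_frequency[ngram] += 1.0
    # Same two keys as coco-train-words.p; ref_len is the number of images.
    return {"document_frequency": document_frequency,
            "ref_len": len(captions_per_image)}


def load_coco_captions(path):
    """Group captions by image_id from a COCO-caption-style JSON file."""
    with open(path) as f:
        annotations = json.load(f)["annotations"]
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann["caption"])
    return list(grouped.values())
```

To use it, call `build_words_dict(load_coco_captions(...))` on your training split only, then write the result with `pickle.dump` to a file opened in `"wb"` mode. Loading that file the same way as above should show the two keys `document_frequency` and `ref_len`.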

I will take a look at the GitHub issue you pointed me to and report back here with what I find.

DesaleF closed this as completed Jun 9, 2022