
Reproducing MeaCap-TF results on MS COCO dataset #9

Open

thechargedneutron opened this issue Oct 31, 2024 · 3 comments

Comments

@thechargedneutron

Hi,

Thanks for the great work. I am trying to reproduce the numbers reported in the paper (Table 1, MeaCap-TF). The paper reports a CIDEr score of 42.5 for the training-free variant. I use the command python inference.py --use_prompt --memory_id cc3m --img_path ./image_example --lm_model_path ./checkpoints/CBART_one_billion to generate the MS-COCO captions and the pycocoeval package to compute the language metrics. Here are the numbers I got:

SPICE: 0.094
Bleu_4: 0.045
METEOR: 0.141
ROUGE_L: 0.264
CIDEr: 0.260

which is well below the numbers in the paper. Can you point me to the evaluation code in the codebase? I am using pycocoeval and am not sure whether that is the reason for the lower performance. Or let me know if I am missing something.
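
For reference, here is a minimal sketch of the evaluation I am running. It follows the standard pycocoevalcap COCOEvalCap recipe; the annotation and results file paths below are placeholders, not files from this repo:

```python
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Placeholder paths: reference captions, plus generated captions saved in the
# standard COCO results format ([{"image_id": ..., "caption": ...}, ...]).
annotation_file = "annotations/captions_val2014.json"
results_file = "meacap_tf_coco_captions.json"

coco = COCO(annotation_file)
coco_result = coco.loadRes(results_file)

coco_eval = COCOEvalCap(coco, coco_result)
# Score only the images that actually have generated captions.
coco_eval.params["image_id"] = coco_result.getImgIds()
coco_eval.evaluate()

# Prints BLEU, METEOR, ROUGE_L, CIDEr, and SPICE.
for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.3f}")
```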

Thanks

thechargedneutron changed the title from "Reproducing the MeaCap-TF on MS COCO dataset" to "Reproducing MeaCap-TF results on MS COCO dataset" on Oct 31, 2024
@joeyz0z
Owner

joeyz0z commented Nov 6, 2024

The training-free version is sensitive to prompts. You can use --prompt_ensembling.
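
For example (a sketch of the invocation, assuming the flag is simply added to the command from the original post; check inference.py for the exact argument placement): python inference.py --use_prompt --prompt_ensembling --memory_id cc3m --img_path ./image_example --lm_model_path ./checkpoints/CBART_one_billion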

@thechargedneutron
Author

Thanks, I will try that. Do you have evaluation code to check the performance of the generated captions? I do not see any eval code in the repo.

@thechargedneutron
Author

I tried --prompt_ensembling and get performance similar to what I reported above. Could you share the generated captions for this training-free variant and, if possible, the evaluation code? Thanks!
