Lower CIDEr score than the paper reported #9
Hi. I tried a lot to replicate the results, but it's very hard to reach the CIDEr score reported in the paper, and I eventually stopped spending time on it. You may try this PyTorch implementation, which includes the adaptive attention model, and see if you can replicate the reported 1.085. But make sure to fine-tune the encoder, as it's important, as mentioned here: ruotianluo/self-critical.pytorch#13
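For what it's worth, a minimal sketch of what fine-tuning the encoder might look like in PyTorch (the `encoder`/`decoder` objects, the `resnet` attribute, and the learning rates are assumptions for illustration, not this repo's exact code):

```python
import torch.optim as optim

# Hypothetical setup: unfreeze only the deeper blocks of the pretrained
# CNN encoder and train them with a smaller learning rate than the decoder.
for p in encoder.parameters():
    p.requires_grad = False
for p in encoder.resnet[-2:].parameters():  # assumed: resnet is an nn.Sequential
    p.requires_grad = True

optimizer = optim.Adam([
    {'params': [p for p in encoder.parameters() if p.requires_grad],
     'lr': 1e-5},                              # small LR for the pretrained CNN
    {'params': decoder.parameters(), 'lr': 4e-4},
])
```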
Thanks so much for your reply. In fact, I think your code implementation is very reasonable and close to the original paper. But I'm still a little confused about the activation function you chose for the sentinel and the hidden affine:

`hidden_affine = F.tanh(self.h_affine(decoder_out))  # (batch_size, hidden_size)`

Are there any special considerations behind the choice of ReLU and tanh? I referred to the paper, and it simply defines the sentinel as s_t = g_t ⊙ tanh(m_t) without explaining the choice.
For the sentinel vector, I used ReLU in the corresponding affine layer.
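To make the two activations under discussion concrete, here is a minimal, self-contained sketch (the layer names `s_affine`/`h_affine` and shapes are assumptions modeled on the snippet quoted above, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveBlock(nn.Module):
    """Toy illustration of the sentinel vs. hidden-state activations."""
    def __init__(self, hidden_size):
        super().__init__()
        self.s_affine = nn.Linear(hidden_size, hidden_size)  # sentinel projection
        self.h_affine = nn.Linear(hidden_size, hidden_size)  # hidden-state projection

    def forward(self, sentinel, decoder_out):
        # ReLU for the sentinel affine, as described above
        sentinel_affine = F.relu(self.s_affine(sentinel))       # (batch_size, hidden_size)
        # tanh for the hidden affine, matching the quoted line
        hidden_affine = torch.tanh(self.h_affine(decoder_out))  # (batch_size, hidden_size)
        return sentinel_affine, hidden_affine
```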
OK, thanks again for your patience and reply. I'll see if I can boost the score.
One thing you should note is that the results in the paper are evaluated on the test split, not the validation split, while in this repo they are evaluated on the validation split. So change the validation loader to load the test data for a fair comparison with the paper. I also suggest you try the self-critical repo; as reported in the issue linked above, the model can reach a CIDEr of 1.03 without fine-tuning the encoder. I am planning to come back to it once I have time. Keep me updated if you get it to work.
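A sketch of the kind of loader change that means (the `CaptionDataset` class and the `'TEST'` split name are assumptions modeled on common Karpathy-split loaders, not necessarily this repo's exact API):

```python
from torch.utils.data import DataLoader

# Hypothetical: evaluate on the Karpathy test split instead of the
# validation split so the numbers are comparable to the paper.
test_loader = DataLoader(
    CaptionDataset(data_folder, data_name, split='TEST'),  # was 'VAL'
    batch_size=batch_size, shuffle=False, num_workers=1, pin_memory=True)
```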
Feel free to open this issue again if you have any questions.
Have you tried setting 'attention_dim' to 49? This parameter is set to 512 in train_eval.py, but I think it should be 49, which equals the number of spatial locations in the input feature map (7 × 7 = 49). I got this detail from Section 2.2 of the original paper, and I found that higher dimensions can result in lower scores in some other experiments.
@LONGRYUU I didn't notice this. Thanks for letting me know. You can try it and see if there is an improvement.
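To show where that dimension enters, here is a minimal sketch of a typical additive attention layer of this kind (the layer names and structure are assumptions based on common show-attend-tell style implementations, not this repo's exact code):

```python
import torch.nn as nn

attention_dim = 49  # 7 * 7 spatial locations of the encoder feature map

class Attention(nn.Module):
    def __init__(self, encoder_dim, decoder_dim, attention_dim):
        super().__init__()
        # project image features and decoder state into a shared
        # attention space of size attention_dim before scoring
        self.encoder_att = nn.Linear(encoder_dim, attention_dim)
        self.decoder_att = nn.Linear(decoder_dim, attention_dim)
        self.full_att = nn.Linear(attention_dim, 1)  # one score per location
```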
Thanks for the nice work. I trained the model from scratch and got a final CIDEr score of about 93.2 after 17 epochs. Although I haven't implemented the beam search part, I don't think it would boost my model to 108.5, which is what the original paper reports. I also checked the result you gave out, and its CIDEr is about 98.7, so do you have any idea what explains the margin?
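Since beam search came up: a minimal sketch of beam-search decoding for a step-wise caption decoder (the `decoder.init_state`/`decoder.step` interface, token ids, and beam size are assumptions for illustration, not an API from this repo):

```python
import torch

def beam_search(decoder, features, start_id, end_id, beam_size=3, max_len=20):
    """Keep the beam_size highest-scoring partial captions at each step."""
    device = features.device
    # Each beam: (cumulative log-prob, token list, decoder state)
    state = decoder.init_state(features)            # assumed helper
    beams = [(0.0, [start_id], state)]
    completed = []
    for _ in range(max_len):
        candidates = []
        for logp, tokens, state in beams:
            if tokens[-1] == end_id:                # caption finished
                completed.append((logp, tokens))
                continue
            last = torch.tensor([tokens[-1]], device=device)
            logits, new_state = decoder.step(last, state, features)  # assumed API
            log_probs = torch.log_softmax(logits.squeeze(0), dim=-1)
            top_lp, top_ids = log_probs.topk(beam_size)
            for lp, idx in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((logp + lp, tokens + [idx], new_state))
        if not candidates:                          # all beams finished
            break
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_size]
    completed.extend((lp, t) for lp, t, _ in beams)
    return max(completed, key=lambda c: c[0])[1]    # best-scoring caption
```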