
Lower CIDEr score than the paper reported #9

Closed
zyj0021200 opened this issue May 15, 2020 · 8 comments

@zyj0021200

Thanks for the nice work. I trained the model from scratch and got a final CIDEr score of about 93.2 after 17 epochs. Although I haven't implemented the beam search part, I don't think it would boost my model to the 108.5 reported in the original paper. I also checked the results you provided, and the CIDEr there is about 98.7. Do you have any idea what causes the gap?

@fawazsammani
Owner

fawazsammani commented May 15, 2020

Hi. I tried hard to replicate the results, but it's very difficult to reach the CIDEr score reported in the paper, and I eventually stopped spending time on it. You may try this PyTorch implementation, which includes the adaptive attention model, and see if you can replicate the 1.085 result. Make sure to fine-tune the encoder, as it is important, as mentioned here: ruotianluo/self-critical.pytorch#13
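
For reference, a minimal sketch of what fine-tuning the encoder can look like in PyTorch. The layer choices and learning rate are illustrative, not the actual setup of either repo:

```python
import torch
import torchvision

# Load a pretrained CNN (ResNet-101 here) and drop its pooling/classifier
# head so it outputs a spatial feature grid for attention.
resnet = torchvision.models.resnet101(pretrained=True)
encoder = torch.nn.Sequential(*list(resnet.children())[:-2])

# Freeze everything first, then unfreeze only the last residual blocks.
for param in encoder.parameters():
    param.requires_grad = False
for block in list(encoder.children())[-2:]:  # e.g. layer3 and layer4
    for param in block.parameters():
        param.requires_grad = True

# Give the encoder its own (typically smaller) learning rate.
encoder_optimizer = torch.optim.Adam(
    (p for p in encoder.parameters() if p.requires_grad), lr=1e-4)
```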

@zyj0021200
Author

Thanks so much for your reply. In fact, I think your implementation is very reasonable and close to the original paper, but I'm still a little confused about the activation functions you chose for the sentinel and hidden affine layers.

```python
num_pixels = spatial_image.shape[1]
visual_attn = self.v_att(spatial_image)             # (batch_size, num_pixels, att_dim)
sentinel_affine = F.relu(self.sen_affine(st))       # (batch_size, hidden_size)
sentinel_attn = self.sen_att(sentinel_affine)       # (batch_size, att_dim)

hidden_affine = F.tanh(self.h_affine(decoder_out))  # (batch_size, hidden_size)
hidden_attn = self.h_att(hidden_affine)             # (batch_size, att_dim)
```

Are there any special considerations behind the choice of ReLU and Tanh? I referred to the paper, and it simply says:

> We use a single layer neural network to transform the visual sentinel vector st and LSTM output vector ht into new vectors that have the dimension d.

@fawazsammani
Owner

For the sentinel vector, I used ReLU in `sentinel_affine = F.relu(self.sen_affine(st))` because it will be concatenated with the spatial image, as in `concat_features = torch.cat([spatial_image, sentinel_affine.unsqueeze(1)], dim = 1)`, and since the spatial image is activated by ReLU, it is reasonable to activate the sentinel with ReLU as well. As for the tanh in the attention, that is the activation function the authors use in the paper (refer to equations 6 and 12).
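
To make the connection to the paper concrete, here is a condensed sketch of how tanh enters the attention scores in equations 6 and 12. Layer names and shapes are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    def __init__(self, hidden_size, att_dim):
        super().__init__()
        self.W_v = nn.Linear(hidden_size, att_dim)  # projects spatial features V
        self.W_g = nn.Linear(hidden_size, att_dim)  # projects hidden state h_t
        self.W_s = nn.Linear(hidden_size, att_dim)  # projects sentinel s_t
        self.w_h = nn.Linear(att_dim, 1)            # maps scores to scalars

    def forward(self, V, h_t, s_t):
        # eq. 6: z_t = w_h^T tanh(W_v V + (W_g h_t) 1^T)
        content = self.W_v(V) + self.W_g(h_t).unsqueeze(1)   # (B, k, att_dim)
        z_t = self.w_h(torch.tanh(content)).squeeze(-1)      # (B, k)
        # eq. 12: append the sentinel score, softmax over k + 1 slots
        sent = self.w_h(torch.tanh(self.W_s(s_t) + self.W_g(h_t)))  # (B, 1)
        alpha_hat = torch.softmax(torch.cat([z_t, sent], dim=1), dim=1)
        return alpha_hat  # last element acts as beta_t, the sentinel gate
```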

@zyj0021200
Author

OK, thanks again for your patience and your reply. I'll see if I can boost the score.

@fawazsammani
Owner

fawazsammani commented May 18, 2020

One thing you should note is that the results in the paper are evaluated on the test split, not the validation split, while in this repo they are evaluated on the validation split. So change the validation loader to load the test data for a fair comparison with the paper. I also suggest you try the self-critical repo; as reported in the issue linked above, the model can achieve 1.03 CIDEr without fine-tuning the encoder. I am planning to come back to it once I have time. Keep me updated if you get it to work.
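
The change amounts to something like the following. `CaptionDataset` and its arguments are hypothetical stand-ins for whatever dataset class the repo actually uses; only the split name matters here:

```python
from torch.utils.data import DataLoader

# Evaluate on the test split instead of the validation split.
test_loader = DataLoader(
    CaptionDataset(data_folder, data_name, split='TEST'),  # was split='VAL'
    batch_size=batch_size, shuffle=False, num_workers=1, pin_memory=True)

# Pass test_loader wherever the validation loader was used to compute CIDEr.
```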

@fawazsammani
Owner

Feel free to open this issue again if you have any questions.

@LONGRYUU

Have you tried setting `attention_dim` to 49? This parameter is set to 512 in train_eval.py, but I think it should be 49, which equals the number of input spatial features. I got this detail from Section 2.2 of the original paper, and I have found that higher dimensions can lower the scores in some other experiments.
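
Assuming the hyperparameter block in train_eval.py looks roughly like this (variable names are illustrative), the suggested change is a one-liner:

```python
# Hypothetical hyperparameter block; only the attention dimension changes.
emb_dim = 512      # word embedding size
hidden_size = 512  # LSTM hidden state size
att_dim = 49       # was 512; 49 matches the 7 x 7 spatial grid from the CNN
```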

@fawazsammani
Owner

@LONGRYUU I didn't notice this, thanks for letting me know. You can try it and see if there is an improvement.
