Lower CIDEr score than the paper reported #9
Hi. I tried a lot to replicate the results, but it's very hard to reach the CIDEr score reported in the paper, and I eventually stopped spending time on it. You may try this PyTorch implementation, which includes the adaptive attention model, and see if you can replicate the reported 1.085. But make sure to fine-tune the encoder, as it's important, as mentioned here: ruotianluo/self-critical.pytorch#13
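For what it's worth, a minimal sketch of what fine-tuning the encoder might look like in PyTorch (the `encoder`/`decoder` objects, the `resnet` attribute, and the learning rates are assumptions for illustration, not this repo's exact code):

```python
import torch.optim as optim

# Hypothetical setup: unfreeze only the deeper blocks of the pretrained
# CNN encoder and train them with a smaller learning rate than the decoder.
for p in encoder.parameters():
    p.requires_grad = False
for p in encoder.resnet[-2:].parameters():  # assumed: resnet is an nn.Sequential
    p.requires_grad = True

optimizer = optim.Adam([
    {'params': [p for p in encoder.parameters() if p.requires_grad],
     'lr': 1e-5},                              # small LR for the pretrained CNN
    {'params': decoder.parameters(), 'lr': 4e-4},
])
```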
Thanks so much for your reply. In fact, I think your code implementation is very reasonable and close to the original paper. But I'm still a little confused about the activation function you chose for the sentinel and the hidden affine:

`hidden_affine = F.tanh(self.h_affine(decoder_out))  # (batch_size, hidden_size)`

Are there any special considerations behind the choice of ReLU and tanh? I referred to the paper, and it simply defines the sentinel as s_t = g_t ⊙ tanh(m_t) without explaining the choice.
For the sentinel vector, I used ReLU in the corresponding affine layer.
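To make the two activations under discussion concrete, here is a minimal, self-contained sketch (the layer names `s_affine`/`h_affine` and shapes are assumptions modeled on the snippet quoted above, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveBlock(nn.Module):
    """Toy illustration of the sentinel vs. hidden-state activations."""
    def __init__(self, hidden_size):
        super().__init__()
        self.s_affine = nn.Linear(hidden_size, hidden_size)  # sentinel projection
        self.h_affine = nn.Linear(hidden_size, hidden_size)  # hidden-state projection

    def forward(self, sentinel, decoder_out):
        # ReLU for the sentinel affine, as described above
        sentinel_affine = F.relu(self.s_affine(sentinel))       # (batch_size, hidden_size)
        # tanh for the hidden affine, matching the quoted line
        hidden_affine = torch.tanh(self.h_affine(decoder_out))  # (batch_size, hidden_size)
        return sentinel_affine, hidden_affine
```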
OK, thanks again for your patience and reply. I'll see if I can boost the score.
One thing you should note is that the results in the paper are evaluated on the test split, not the validation split, while in this repo they are evaluated on the validation split. So change the validation loader to load the test data for a fair comparison with the paper. I also suggest you try the self-critical repo; as reported in the issue linked above, the model can reach a CIDEr of 1.03 without fine-tuning the encoder. I am planning to come back to it once I have time. Keep me updated if you get it to work.
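A sketch of the kind of loader change that means (the `CaptionDataset` class and the `'TEST'` split name are assumptions modeled on common Karpathy-split loaders, not necessarily this repo's exact API):

```python
from torch.utils.data import DataLoader

# Hypothetical: evaluate on the Karpathy test split instead of the
# validation split so the numbers are comparable to the paper.
test_loader = DataLoader(
    CaptionDataset(data_folder, data_name, split='TEST'),  # was 'VAL'
    batch_size=batch_size, shuffle=False, num_workers=1, pin_memory=True)
```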
Feel free to open this issue again if you have any questions.
Have you tried setting 'attention_dim' to 49? This parameter is set to 512 in train_eval.py, but I think it should be 49, which equals the number of spatial locations in the input feature map (7 × 7 = 49). I got this detail from Section 2.2 of the original paper, and I found that higher dimensions can result in lower scores in some other experiments.
@LONGRYUU I didn't notice this. Thanks for letting me know. You can try it and see if there is an improvement.
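To show where that dimension enters, here is a minimal sketch of a typical additive attention layer of this kind (the layer names and structure are assumptions based on common show-attend-tell style implementations, not this repo's exact code):

```python
import torch.nn as nn

attention_dim = 49  # 7 * 7 spatial locations of the encoder feature map

class Attention(nn.Module):
    def __init__(self, encoder_dim, decoder_dim, attention_dim):
        super().__init__()
        # project image features and decoder state into a shared
        # attention space of size attention_dim before scoring
        self.encoder_att = nn.Linear(encoder_dim, attention_dim)
        self.decoder_att = nn.Linear(decoder_dim, attention_dim)
        self.full_att = nn.Linear(attention_dim, 1)  # one score per location
```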
Thanks for the nice work. I trained the model from scratch and got a final CIDEr score of about 93.2 after 17 epochs. Although I haven't implemented the beam search part, I don't think it would boost my model to 108.5, which is what the original paper reports. I also checked the result you gave out, and its CIDEr is about 98.7, so do you have any idea what explains the margin?
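Since beam search came up: a minimal sketch of beam-search decoding for a step-wise caption decoder (the `decoder.init_state`/`decoder.step` interface, token ids, and beam size are assumptions for illustration, not an API from this repo):

```python
import torch

def beam_search(decoder, features, start_id, end_id, beam_size=3, max_len=20):
    """Keep the beam_size highest-scoring partial captions at each step."""
    device = features.device
    # Each beam: (cumulative log-prob, token list, decoder state)
    state = decoder.init_state(features)            # assumed helper
    beams = [(0.0, [start_id], state)]
    completed = []
    for _ in range(max_len):
        candidates = []
        for logp, tokens, state in beams:
            if tokens[-1] == end_id:                # caption finished
                completed.append((logp, tokens))
                continue
            last = torch.tensor([tokens[-1]], device=device)
            logits, new_state = decoder.step(last, state, features)  # assumed API
            log_probs = torch.log_softmax(logits.squeeze(0), dim=-1)
            top_lp, top_ids = log_probs.topk(beam_size)
            for lp, idx in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((logp + lp, tokens + [idx], new_state))
        if not candidates:                          # all beams finished
            break
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_size]
    completed.extend((lp, t) for lp, t, _ in beams)
    return max(completed, key=lambda c: c[0])[1]    # best-scoring caption
```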