High loss for Text CNN in Stage 1 and COCO dataset questions #6
Hi @Kabnoory,
I will upload a second-version paper soon with more technical details.
Thanks for your response! By training accuracy, do you mean top1err_txt or top5err_txt for the text branch in Stage 1? And that would be a 0.2-0.3 error, right? I think I would have to reimplement my Text CNN, because my loss is way higher than that.
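Just to make sure we mean the same thing by those metrics, here is a minimal sketch of how I would compute top-1/top-5 error from the text-branch scores (topk_error, logits and labels are placeholder names of mine, not identifiers from your code):

```python
import numpy as np

def topk_error(logits, labels, k):
    """Fraction of samples whose true class is not among the k highest scores.

    logits: (N, num_classes) score matrix, labels: (N,) integer class ids.
    """
    topk = np.argsort(-logits, axis=1)[:, :k]    # indices of the k largest scores per row
    hit = (topk == labels[:, None]).any(axis=1)  # True where the label is in the top k
    return 1.0 - hit.mean()

# e.g. top1err = topk_error(scores, labels, 1), top5err = topk_error(scores, labels, 5)
```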
I think the issue was that I set the learning rate to 0.001, but I found here https://github.com/layumi/Image-Text-Embedding/blob/master/matlab/%2Bdagnn/%40DagNN/initParams.m and wanted to ask: what's the purpose of
Hey layumi, I am trying to replicate your results for MSCOCO in TensorFlow, and I had some questions about data processing and the loss:
At the end of Stage 1, my text CNN loss ('objective_txt') is high, around 5.5. What loss did you get at the end of Stage 1?
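To put the 5.5 in context, this is the rough sanity check I have in mind, assuming objective_txt is a standard softmax cross-entropy over C classes (the value of C below is only a placeholder, not the class count from your setup):

```python
import math

# Cross-entropy of a uniform, chance-level prediction over C classes is ln(C).
# C is a placeholder for the number of Stage-1 classes, not a value from the paper or repo.
C = 100_000
chance_loss = math.log(C)
print(f"chance-level loss for C = {C}: {chance_loss:.2f}")  # about 11.51
```

So 5.5 is clearly below chance level for any class count in that ballpark, but it could still go with a high top-1 error, which is why I am unsure whether it is too high.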
In dataset/MSCOCO-prepare/prepare_wordcnn_feature2.m you create
wordcnn = zeros(32,611765,'int16')
and then loop over all the captions in MSCOCO. But there are 616,767 captions in MSCOCO, so what is the reason for this 5,002 difference? When I implemented it in TensorFlow it threw an out-of-range error, because there are more captions than columns in the wordcnn matrix it creates.
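To make the packing step concrete, here is a minimal NumPy sketch of the 32 x N int16 layout as I understand it; the whitespace tokenization, the 1-based indices with 0 as padding, the truncation to 32 words, and the pack_captions/word_to_index names are placeholder assumptions of mine, not taken from prepare_wordcnn_feature2.m:

```python
import numpy as np

def pack_captions(captions, word_to_index, max_len=32):
    """Pack captions into a (max_len, N) int16 matrix of dictionary indices.

    Placeholder assumptions: whitespace tokenization, 1-based word indices so
    that 0 can serve as padding, out-of-vocabulary words skipped, and captions
    truncated to max_len tokens.
    """
    mat = np.zeros((max_len, len(captions)), dtype=np.int16)
    kept = 0
    for caption in captions:
        idx = [word_to_index[w] for w in caption.lower().split() if w in word_to_index]
        if not idx:
            continue  # drop captions that have no in-vocabulary words
        idx = idx[:max_len]
        mat[:len(idx), kept] = idx
        kept += 1
    return mat[:, :kept]  # kept can end up smaller than the raw caption count
```

Sizing the matrix from the captions that are actually kept (instead of preallocating a fixed 611,765 columns) avoids the out-of-range error on my side, but I would still like to know which captions the MATLAB script drops to get from 616,767 down to 611,765.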
The dimension of coco_dictionary.mat is 29,972 in your code, but my dictionary size is different. I wonder if this is the reason the loss is high, or whether it might be because TensorFlow uses a different random generator than MATLAB. If you have any suggestions on this, that would be great.
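To make this part of the question concrete too, here is a minimal sketch of the kind of dictionary-building step I mean; the lower-casing, whitespace tokenization, the min_count threshold, and the build_dictionary name are placeholder choices of mine, not taken from your code:

```python
from collections import Counter

def build_dictionary(captions, min_count=5):
    """Build a word -> index map (1-based, with 0 reserved for padding).

    Placeholder assumptions: lower-casing, whitespace tokenization and a
    min_count frequency cut-off; different choices here easily change the
    final vocabulary size.
    """
    counts = Counter(w for c in captions for w in c.lower().split())
    vocab = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i + 1 for i, w in enumerate(vocab)}
```

So if the MATLAB script strips punctuation differently or uses a different frequency threshold than I do, that alone could explain why I do not end up with 29,972 words.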
Thank you!