Build an image captioning model by combining a pre-trained VGG-16 image encoder with an LSTM-based language decoder.
Effectively, I am implementing the following paper:
O. Vinyals, A. Toshev, S. Bengio, D. Erhan. Show and Tell: A Neural Image Caption Generator. CVPR, 2015.
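
The decoder side of this encoder-decoder design can be sketched roughly as follows. This is a minimal PyTorch sketch under several assumptions, not the project's actual code: it assumes the image has already been reduced to a 4096-dimensional feature vector (for example, the output of VGG-16's second fully connected layer), and the names `CaptionDecoder` and `caption_loss`, as well as the embedding, hidden, and vocabulary sizes, are hypothetical choices for illustration. The projected image feature is fed to the LSTM as the first input step, followed by the embedded caption tokens, and the model is trained to predict each caption token from the preceding inputs.

```python
import torch
import torch.nn as nn


class CaptionDecoder(nn.Module):
    """LSTM decoder conditioned on one image feature vector (hypothetical sketch)."""

    def __init__(self, feature_dim=4096, embed_dim=256, hidden_dim=512, vocab_size=1000):
        super().__init__()
        # Project the (assumed precomputed) VGG-16 feature into the word-embedding space.
        self.img_proj = nn.Linear(feature_dim, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, captions):
        # features: (B, feature_dim) float tensor; captions: (B, T) integer token ids.
        img = self.img_proj(features).unsqueeze(1)   # (B, 1, E): image as first input step
        words = self.embed(captions[:, :-1])         # (B, T-1, E): all tokens but the last
        inputs = torch.cat([img, words], dim=1)      # (B, T, E)
        hidden, _ = self.lstm(inputs)                # (B, T, H)
        return self.out(hidden)                      # (B, T, V): logits for each caption token


def caption_loss(logits, captions):
    """Cross-entropy between predicted logits and the caption tokens (no padding mask)."""
    vocab_size = logits.size(-1)
    return nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), captions.reshape(-1)
    )
```

With this layout, position `t` of the output is trained to predict caption token `t`, so the image step predicts the first token. A real training loop would also need padding-aware masking for variable-length captions, which this sketch leaves out.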