Solution for VLSP image captioning task

Install

pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
# download the pretrained weights and other required packages
bash download_weights.sh

Prepare dataset

mkdir dataset
cd dataset
# download the training data and unzip it into the 'dataset' folder
gdown --id 1lbOTlksNA5a97_Ydqh84TE6Dm85Rsy60
# download the private test data and unzip it into the 'dataset' folder
gdown --id 1rCDniCZNgaJ7WQUzPpzEwuXW5_WNyave
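
The unzip step mentioned in the comments above can also be scripted. This is a minimal sketch, assuming the downloaded files are ordinary `.zip` archives sitting directly in the `dataset` folder; the helper name `extract_archives` is hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archives(folder: str = "dataset") -> List[str]:
    """Extract every .zip file found directly in `folder` into that folder.

    Assumption: the downloads are plain .zip archives; other formats are skipped.
    Returns the names of the archives that were extracted.
    """
    root = Path(folder)
    extracted = []
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_archives("dataset")` from the repository root after both downloads finish would unpack them in place.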

Training

python train.py --decoder_mode lstm --batch_size 16 --checkpoint checkpoint/pretrain_coco.pth.tar --fine_tune_encoder True
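
The command passes `--fine_tune_encoder True` as a string. How `train.py` parses it is not shown here; the sketch below illustrates one safe way to handle such a flag with `argparse`, since `type=bool` would treat the string `"False"` as true. The parser, its defaults, and the `str2bool` helper are illustrative assumptions, not the repository's actual code.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'True'/'False'-style strings to a real boolean."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags in the training command;
    # the defaults are illustrative and not taken from train.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("--decoder_mode", default="lstm")
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--checkpoint", default=None)
    parser.add_argument("--fine_tune_encoder", type=str2bool, default=False)
    return parser
```

With this parser, `--fine_tune_encoder False` yields `False` rather than a truthy string.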

Inference

python caption.py --decoder_mode lstm --checkpoint checkpoint/model_best.pth.tar

Results

Method	Avg BLEU score (public test)	Avg BLEU score (private test)
ResNet + LSTM (pretrained on COCO)	0.279	0.273
ResNet + LSTM (without COCO pretraining)	0.263	-
ResNet + Transformer	0.265	-
EfficientNet B7 + LSTM	0.261	-
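
The table does not state how the average BLEU score is computed. One common reading is the mean of BLEU-1 through BLEU-4; the sketch below implements that reading for a single caption, without smoothing, in plain Python. Treat it as an assumption about the metric rather than the official VLSP scorer, which may aggregate n-gram counts over the whole test set.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_n(candidate: List[str], references: List[List[str]], max_n: int) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    cand_len = len(candidate)
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - cand_len), length))
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(sum(log_precisions) / max_n)


def avg_bleu(candidate: List[str], references: List[List[str]]) -> float:
    """Mean of BLEU-1..BLEU-4 (assumed definition of 'Avg BLEU')."""
    return sum(bleu_n(candidate, references, n) for n in range(1, 5)) / 4
```

Here `candidate` is a tokenized generated caption and `references` is a list of tokenized ground-truth captions for the same image.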

Reference

Much of the code is adapted from Image-Caption; thanks to @Jianan.
