Image Captioning (Computer Vision Nanodegree Project)

The Microsoft Common Objects in COntext (MS COCO) dataset is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.

You can read more about the dataset on the website or in the research paper.

In this notebook, you will explore this dataset in preparation for the project.
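As a rough illustration of what exploring the captioning data can involve, the sketch below groups captions by image. It assumes the usual COCO captions layout (an `images` list and an `annotations` list linked by `image_id`). The file names and captions in `sample` are made up for the example, and `group_captions` is a hypothetical helper, not something taken from this repository.

```python
from collections import defaultdict


def group_captions(coco):
    """Map each image file name to the list of captions written for it.

    `coco` is a dict in the COCO captions layout (assumed here):
    {"images": [{"id", "file_name"}, ...],
     "annotations": [{"image_id", "caption"}, ...]}
    """
    captions = defaultdict(list)
    for ann in coco["annotations"]:
        # Strip stray whitespace, which raw captions sometimes carry.
        captions[ann["image_id"]].append(ann["caption"].strip())
    # Images without any caption map to an empty list.
    return {img["file_name"]: captions[img["id"]] for img in coco["images"]}


# Made-up sample standing in for a real annotation file.
sample = {
    "images": [
        {"id": 1, "file_name": "img_1.jpg"},
        {"id": 2, "file_name": "img_2.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "A skier heads downhill. "},
        {"id": 11, "image_id": 2, "caption": "A plane in the sky."},
        {"id": 12, "image_id": 1, "caption": "Person skiing on snow."},
    ],
}

by_file = group_captions(sample)
for file_name, caps in by_file.items():
    print(file_name, caps)
```

With a real annotation file, the same function would be applied to the result of `json.load` on that file; its location depends on where the dataset was downloaded.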

Demo

To see the project in action, open 3_Inference.ipynb.

Model Architecture

Encoder
Decoder
Model
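The README does not describe the layers, so the following is only a sketch of how an encoder and decoder can fit together for captioning, assuming a CNN encoder and an LSTM decoder written in PyTorch. The class names, layer sizes, and the tiny convolutional encoder are all assumptions chosen for illustration; a real project would more likely use a pretrained backbone, and the code in `model.py` may differ.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Toy encoder (assumption): image batch -> one feature vector per image."""

    def __init__(self, embed_size):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (N, 16, 1, 1)
        )
        self.embed = nn.Linear(16, embed_size)

    def forward(self, images):
        x = self.features(images).flatten(1)  # (N, 16)
        return self.embed(x)  # (N, embed_size)


class DecoderRNN(nn.Module):
    """LSTM decoder (assumption): image feature + caption tokens -> word scores."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: the image feature is the first input step,
        # followed by every caption token except the last one.
        embeddings = self.word_embed(captions[:, :-1])  # (N, T-1, E)
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)  # (N, T, E)
        hidden, _ = self.lstm(inputs)  # (N, T, H)
        return self.fc(hidden)  # (N, T, vocab_size)


# Random tensors stand in for real images and tokenized captions.
encoder = EncoderCNN(embed_size=32)
decoder = DecoderRNN(embed_size=32, hidden_size=64, vocab_size=100)
images = torch.randn(4, 3, 64, 64)
captions = torch.randint(0, 100, (4, 12))
scores = decoder(encoder(images), captions)
print(scores.shape)
```

The decoder returns one score vector per caption position, so during training the output can be compared against the caption tokens with a cross-entropy loss.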

Screenshots

  1. Some of the best predictions.

a man riding skis down a snow covered slope.

a large jetliner flying through the air.

  2. Some of the less accurate predictions.

a man is sitting on a couch with a laptop.

a fire hydrant on a sidewalk next to a building.

images
.gitignore
0_Dataset.ipynb
1_Preliminaries.ipynb
2_Training.ipynb
3_Inference.ipynb
README.md
data_loader.py
data_loader_val.py
model.py
training_log.txt
vocab.pkl
vocabulary.py

AniketARS/CVND-Image-Captioning-COCO
