Image Captioning (Computer Vision Nanodegree Project)

The Microsoft Common Objects in COntext (MS COCO) dataset is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.

You can read more about the dataset on the website or in the research paper.

In this notebook, you will explore this dataset, in preparation for the project.
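As a starting point for that exploration, here is a minimal sketch of how captions can be grouped per image. It assumes the captions annotation file follows the usual COCO layout: an `images` list with `id` and `file_name`, and an `annotations` list with `image_id` and `caption`. The `sample` dictionary and the helper `captions_by_file` are hypothetical, made up for illustration; they are not part of this repository.

```python
from collections import defaultdict

# Tiny in-memory stand-in for a COCO captions annotation file.
# The ids, file names and captions below are invented for illustration.
sample = {
    "images": [
        {"id": 1, "file_name": "skier.jpg"},
        {"id": 2, "file_name": "plane.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "A man riding skis down a slope."},
        {"id": 11, "image_id": 1, "caption": "A skier on a snowy hill."},
        {"id": 12, "image_id": 2, "caption": "A jetliner flying through the air."},
    ],
}


def captions_by_file(coco):
    """Map each image file name to the list of its captions (hypothetical helper)."""
    # Build a lookup from image id to file name.
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    # Each annotation carries one caption and points at its image via image_id.
    for ann in coco["annotations"]:
        grouped[id_to_file[ann["image_id"]]].append(ann["caption"].strip())
    return dict(grouped)


grouped = captions_by_file(sample)
for file_name, captions in grouped.items():
    print(file_name, len(captions), captions[0])
```

With a real annotation file, the same helper should work on the dictionary returned by `json.load`, provided the file uses the layout assumed above.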

Demo

To see the project in action, open 3_Inference.ipynb.

Model Architecture

Encoder
Decoder
Model

Screenshots

  1. Some of the best predictions.

a man riding skis down a snow covered slope.

a large jetliner flying through the air.

  2. Some of the less accurate predictions.

a man is sitting on a couch with a laptop.

a fire hydrant on a sidewalk next to a building.
