Image Captioning (Computer Vision Nanodegree Project)

The Microsoft Common Objects in COntext (MS COCO) dataset is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.

Sample Dog Output

You can read more about the dataset on the website or in the research paper.

In this notebook, you will explore this dataset, in preparation for the project.
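As a rough illustration of what exploring the captions can look like, the sketch below groups captions by image. It assumes the COCO captions annotation layout (a top-level `images` list and an `annotations` list whose entries carry `image_id` and `caption`). The sample ids, file names, and captions are made up, and `group_captions_by_image` is a hypothetical helper, not part of this project.

```python
from collections import defaultdict

# Tiny in-memory stand-in for a COCO captions annotation file.
# The "images" / "annotations" keys follow the COCO captions layout;
# the ids, file names, and captions are invented for illustration.
sample_annotations = {
    "images": [
        {"id": 1, "file_name": "example_0001.jpg"},
        {"id": 2, "file_name": "example_0002.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "A dog runs across a field."},
        {"id": 11, "image_id": 1, "caption": "A brown dog playing outside."},
        {"id": 12, "image_id": 2, "caption": "A plane flying in the sky."},
    ],
}


def group_captions_by_image(coco):
    """Return a dict mapping each image file name to its list of captions."""
    # Look up file names by image id so annotations can be joined to images.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[file_names[ann["image_id"]]].append(ann["caption"])
    return dict(grouped)


captions = group_captions_by_image(sample_annotations)
for name, caps in captions.items():
    print(name, len(caps), caps)
```

With a real annotation file, the same function could be applied to the dictionary returned by `json.load`, since each image typically has several reference captions.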


Demo

To see the project in action, open 3_Inference.ipynb.
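For readers unfamiliar with how a caption is produced at inference time, here is a minimal sketch of greedy decoding, assuming the decoder emits one word per step until an end token. This is an assumption about the general technique, not a description of what the notebook does: `TOY_SCORES`, `toy_step`, and `greedy_caption` are hypothetical names, and a trained decoder would condition on image features and a hidden state rather than a lookup table.

```python
# Hypothetical next-word scores standing in for a trained decoder.
TOY_SCORES = {
    "<start>": {"a": 0.9, "the": 0.1},
    "a": {"dog": 0.6, "man": 0.4},
    "dog": {"runs": 0.7, "<end>": 0.3},
    "runs": {"<end>": 0.8, "a": 0.2},
}


def toy_step(previous_word):
    """Return next-word scores for the previous word (placeholder decoder step)."""
    return TOY_SCORES.get(previous_word, {"<end>": 1.0})


def greedy_caption(step_fn, max_len=20):
    """Build a caption by always taking the highest-scoring next word."""
    words = []
    current = "<start>"
    for _ in range(max_len):
        scores = step_fn(current)
        current = max(scores, key=scores.get)  # greedy choice
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)


caption = greedy_caption(toy_step)
print(caption)
```

The `max_len` cap guards against a decoder that never emits the end token. Alternatives such as beam search keep several candidate captions per step instead of one.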


Model Architecture

  • Encoder (figure: Encoder Architecture)

  • Decoder (figure: Decoder Architecture)

  • Model (figure: Model Architecture)


Screenshots

  1. Some of the best predictions.

a man riding skis down a snow covered slope.

a large jetliner flying through the air.

  2. Some of the weaker predictions.

a man is sitting on a couch with a laptop.

a fire hydrant on a sidewalk next to a building.