AniketARS/CVND-Image-Captioning-COCO

Image Captioning (Computer Vision Nanodegree Project)

The Microsoft Common Objects in COntext (MS COCO) dataset is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.

Sample Dog Output

You can read more about the dataset on the website or in the research paper.

In this notebook, you will explore this dataset, in preparation for the project.
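As a starting point for that exploration, the sketch below groups captions by image using only the standard library. It assumes the layout of the COCO captions annotation files (an `images` list and an `annotations` list whose entries carry `image_id` and `caption`). The sample records, file names, and the `captions_by_image` helper are made up for illustration and are not taken from this repository.

```python
import json
from collections import defaultdict

# Tiny, made-up stand-in for a COCO captions annotation file.
SAMPLE_JSON = """
{
  "images": [
    {"id": 1, "file_name": "example_1.jpg"},
    {"id": 2, "file_name": "example_2.jpg"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "A dog runs on the grass."},
    {"id": 11, "image_id": 1, "caption": "A brown dog playing outside."},
    {"id": 12, "image_id": 2, "caption": "A plane flying in the sky."}
  ]
}
"""

def captions_by_image(coco):
    """Map each image file name to the list of its reference captions."""
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[id_to_name[ann["image_id"]]].append(ann["caption"])
    return dict(grouped)

coco = json.loads(SAMPLE_JSON)
grouped = captions_by_image(coco)
for name, captions in grouped.items():
    # Print the file name, the number of captions, and the first caption.
    print(name, len(captions), captions[0])
```

To try this on the real data, the `json.loads(SAMPLE_JSON)` call could be swapped for `json.load` on an opened annotation file, assuming that file follows the same layout.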


Demo

To see the project in action, open 3_Inference.ipynb.
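Inference for a captioning model typically ends by turning predicted token indices into a readable sentence. A minimal sketch of that post-processing step follows, assuming a vocabulary lookup table and `<start>`/`<end>` marker tokens. The `IDX2WORD` table and the `clean_sentence` helper are hypothetical, and the notebook's own version may differ.

```python
# Hypothetical index-to-word table; a real vocabulary would be built
# from the training captions.
IDX2WORD = {
    0: "<start>", 1: "<end>", 2: "<unk>",
    3: "a", 4: "man", 5: "riding", 6: "skis", 7: ".",
}

def clean_sentence(token_ids, idx2word=IDX2WORD):
    """Convert predicted token indices into a caption string."""
    words = []
    for idx in token_ids:
        word = idx2word.get(idx, "<unk>")  # unknown indices fall back to <unk>
        if word == "<start>":
            continue  # skip the start marker
        if word == "<end>":
            break  # ignore everything after the end marker
        words.append(word)
    # Join the words and attach the final period to the last word.
    return " ".join(words).replace(" .", ".")

caption = clean_sentence([0, 3, 4, 5, 6, 7, 1, 3])
print(caption)  # a man riding skis.
```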


Model Architecture

  • Encoder (figure: Encoder Architecture)

  • Decoder (figure: Decoder Architecture)

  • Model (figure: Model Architecture)


Screenshots

  1. Some of the best predictions.

a man riding skis down a snow covered slope.

a large jetliner flying through the air.

  2. Some of the weaker predictions.

a man is sitting on a couch with a laptop.

a fire hydrant on a sidewalk next to a building.
