Public Datasets

All datasets mentioned here are free for research and educational purposes.

Curated Lists

Description	Dataset Size	Formats
Flickr Data Yahoo dataset	100M img, 1M videos	-
Tiny Images	80M	tiny img
CIFAR-10 and CIFAR-100	60k each	32x32 color img (~3kB)
ImageNet	14M	img
VQA Visual Question Answering	10M+	-
Open Images Dataset	9M	img urls
YouTube-8M Dataset	8M	YouTube urls
Street View House Numbers	600k	.mat
Microsoft COCO	330k	.txt, img
COCO-QA Image Question answering	123k	.txt, .jpg
CAVIAR video sequences of mall and public space behavior	90k+	.mpeg, .jpg
MNIST Handwritten digits	60k	IDX binary (MSB first)
Flickr 30k	30k img, 150k captions	-
OULU dataset of eye gaze and facial expressions	~20k	-
Berkeley Segmentation Dataset	12k	.cdr
Traffic Image Sequences	8k	grayscale .gif
Sign Language Recognition	5k	img
AR Face Database	4k	.raw, .avi
Fingerprint database	<1k	grayscale .gif
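
Most formats in the table open with standard image tools, but the MNIST entry is distributed in the IDX binary layout, whose integers are stored MSB first (big-endian). The following is a minimal sketch of reading that layout with the Python standard library only. The function name `parse_idx` and the synthetic sample buffer are illustrative assumptions, not part of any dataset's official tooling, and the sketch handles only unsigned-byte data.

```python
import struct


def parse_idx(buf: bytes):
    """Parse an IDX buffer of unsigned bytes into (dims, flat_values).

    Hypothetical helper for illustration; handles type code 0x08 only.
    """
    # Magic number: two zero bytes, a type code, and the number of dimensions.
    zero, dtype, ndim = struct.unpack_from(">HBB", buf, 0)
    if zero != 0 or dtype != 0x08:
        raise ValueError("only unsigned-byte IDX data is handled in this sketch")

    # Each dimension size is a 4-byte big-endian (MSB-first) integer.
    dims = struct.unpack_from(f">{ndim}I", buf, 4)
    offset = 4 + 4 * ndim

    # The payload is one byte per element, in row-major order.
    count = 1
    for d in dims:
        count *= d
    data = buf[offset:offset + count]
    if len(data) != count:
        raise ValueError("truncated IDX data")
    return dims, data


# Synthetic buffer standing in for a real file: two 2x2 "images".
sample = struct.pack(">HBB3I", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
dims, pixels = parse_idx(sample)
```

For a real download, the bytes would typically come from something like `gzip.open(path, "rb").read()`, assuming the file is gzip-compressed, where `path` is wherever the file was saved.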

Description	Dataset Size	Formats
Question Answering Corpus	1M+ Q&A	.txt, .html
Fictional Conversations Extracted from Raw Movie Scripts	220k conversations	.txt
Microsoft's Social Media Conversation Corpus	12k tweets	.txt

Description	Dataset Size	Formats
University of South Florida Digital Mammography	2620 cases	.ics
Bone and joint CT scan images	-	.dcm