Public Datasets

All datasets mentioned here are free for research and educational purposes.

Curated Lists

Databases

| Description | DS Size | Formats |
| --- | --- | --- |
| Research Google | 2.1M | .csv, YouTube urls |
| Kaggle | 5k+ | - |
| AWS public Datasets | k+ | - |
| UCI | k+ | - |

Speech

| Description | DS Size | Formats |
| --- | --- | --- |
| Project Common Voice | 400k+ | .csv, .mp3 |
| VoxForge | 10k+ | - |
| The CMU Audio Databases | 1k | .raw |

Vision

| Description | DS Size | Formats |
| --- | --- | --- |
| Flickr Data Yahoo dataset | 100M img, 1M videos | - |
| Tiny Images | 80M | tiny img |
| CIFAR-10 and CIFAR-100 (labeled subsets of the 80M Tiny Images) | 60k each | 3kB img |
| IMAGENET | 14M | img |
| VQA Visual Question Answering | 10M+ | - |
| Open Images Dataset | 9M | img urls |
| YouTube-8M Dataset | 8M | YouTube urls |
| Street View House Numbers | 600k | .mat |
| Microsoft COCO | 330k | .txt, img |
| COCO-QA Image Question answering | 123k | .txt, .jpg |
| CAVIAR video sequences of mall and public space behavior | 90k+ | .mpeg, .jpg |
| MNIST Handwritten digits | 60k | .msb |
| Flickr 30k | 30k img, 150k captions | - |
| OULU dataset of eye gazing, face expressions | ~20k | - |
| Berkeley Segmentation DS | 12k | .cdr |
| Traffic Image Sequences | 8k | grayscale .gif |
| Sign Language Recognition | 5k | img |
| AR Face Database | 4k | .raw, .avi |
| Fingerprint database | <1k | grayscale .gif |
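
The MNIST entry above lists its format as `.msb`. Assuming this refers to the MSB-first (big-endian) IDX container that the MNIST files use, a minimal sketch of a reader could look like the following. The function name `parse_idx` and the synthetic sample are illustrative only, and the sketch assumes the file has already been decompressed (for example with Python's `gzip` module) and holds unsigned-byte data.

```python
import struct


def parse_idx(buf: bytes):
    """Parse an unsigned-byte IDX buffer; return (dims, flat payload bytes)."""
    # Header: two zero bytes, a one-byte type code, a one-byte dimension count.
    zero, dtype, ndim = struct.unpack(">HBB", buf[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension size is a 4-byte big-endian (MSB-first) integer.
    dims = struct.unpack(f">{ndim}I", buf[4:4 + 4 * ndim])
    count = 1
    for d in dims:
        count *= d
    data = buf[4 + 4 * ndim:]
    if len(data) != count:
        raise ValueError("payload size does not match header")
    return dims, data


# Synthetic stand-in for a real file: two 2x2 "images" with pixel values 0..7.
sample = struct.pack(">HBB3I", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
dims, data = parse_idx(sample)
print(dims, len(data))
```

For a real image file the returned `dims` would be `(count, rows, cols)`, and the payload can be sliced into `rows * cols`-byte chunks, one per image.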

Social Media and Communication

| Description | DS Size | Formats |
| --- | --- | --- |
| Question Answering Corpus | 1M+ Q&A | .txt, .html |
| Fictional Conversations Extracted from Raw Movie Scripts | 220k conversations | .txt |
| Microsoft's Social Media Conversation Corpus | 12k tweets | .txt |

Medicine

| Description | DS Size | Formats |
| --- | --- | --- |
| University of South Florida Digital Mammography | 2620 cases | .ics |
| Bone and joint CT scan images | - | .dcm |