All datasets mentioned here are free for research and educational purposes.

Description | DS Size | Formats |
---|---|---|
Research Google | 2.1M | .csv, YouTube URLs |
Kaggle | 5k+ | - |
AWS Public Datasets | k+ | - |
UCI | k+ | - |

Description | DS Size | Formats |
---|---|---|
Project Common Voice | 400k+ | .csv, .mp3 |
VoxForge | 10k+ | - |
The CMU Audio Databases | 1k | .raw |

Description | DS Size | Formats |
---|---|---|
Flickr Data Yahoo dataset | 100M img, 1M videos | - |
Tiny Images | 80M | tiny img |
CIFAR-10 and CIFAR-100 | 60k each | 3kB img |
IMAGENET | 14M | img |
VQA Visual Question Answering | 10M+ | - |
Open Images Dataset | 9M | img URLs |
YouTube-8M Dataset | 8M | YouTube URLs |
Street View House Numbers | 600k | .mat |
Microsoft COCO | 330k | .txt, img |
COCO-QA Image Question answering | 123k | .txt, .jpg |
CAVIAR video sequences of mall and public space behavior | 90k+ | .mpeg, .jpg |
MNIST handwritten digits | 60k train, 10k test | IDX binary (MSB-first) |
Flickr 30k | 30k img, 150k captions | - |
OULU dataset of eye gaze and facial expressions | ~20k | - |
Berkeley Segmentation Dataset | 12k | .cdr |
Traffic Image Sequences | 8k | grayscale .gif |
Sign Language Recognition | 5k | img |
AR Face Database | 4k | .raw, .avi |
Fingerprint database | <1k | grayscale .gif |

Description | DS Size | Formats |
---|---|---|
Question Answering Corpus | 1M+ Q&A | .txt, .html |
Fictional Conversations Extracted from Raw Movie Scripts | 220k conversations | .txt |
Microsoft's Social Media Conversation Corpus | 12k tweets | .txt |

Description | DS Size | Formats |
---|---|---|
University of South Florida Digital Mammography | 2620 cases | .ics |
Bone and joint CT scan images | - | .dcm |