Data Repositories
This is a list of public dataset repositories we aim to connect to in order to bring more varied datasets into OpenML. Their data formats vary widely, so we need both manual selection and automatic conversion or meta-data extraction to make the datasets easily usable.
- A collection of sources made by different users
- Machine learning dataset repositories (mostly already in OpenML)
- Time series data:
  - UCR: http://timeseriesclassification.com/
  - Older version: http://www.cs.ucr.edu/~eamonn/time_series_data/
- Causality related datasets:
- Deep learning datasets (mostly image data)
- Extreme classification:
- MLData (will merge with OpenML in 2018)
- AutoWEKA datasets:
- Kaggle public datasets
- RAMP Challenge datasets
- Wolfram data repository
- Data.world
- Figshare (needs digging, lots of Excel files)
- KDNuggets list of data sets (meta-list, lots of stuff here):
- Benchmark Data Sets for Highly Imbalanced Binary Classification
  - http://www.cs.gsu.edu/~zding/research/imbalance-data/x19data.txt
- Feature Selection Challenge Datasets
  - http://www.nipsfsc.ecs.soton.ac.uk/datasets/
  - http://featureselection.asu.edu/datasets.php
- BigML's list of 1000+ data sources
  - http://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/
- Massive list from Data Science Central.
  - http://www.datasciencecentral.com/profiles/blogs/data-sources-for-cool-data-science-projects
- R packages (also see https://github.com/openml/openml-r/issues/185)
- UTwente Activity recognition datasets:
- Vanderbilt:
- Quandl
- Microarray data:
  - http://genomics-pubs.princeton.edu/oncology/
  - http://svitsrv25.epfl.ch/R-doc/library/multtest/html/golub.html
- Medical data:
  - http://www.healthdata.gov/
  - http://homepages.inf.ed.ac.uk/rbf/IAPR/researchers/PPRPAGES/pprdat.htm
  - http://hcup-us.ahrq.gov/
  - https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier.html
  - https://nsduhweb.rti.org/respweb/homepage.cfm
  - http://orwh.od.nih.gov/resources/policyreports/womenofcolor.asp
- Nature.com Scientific data repositories list