reactual/datalibrary

DataLibrary

An API for better datasets -- https://datalibrary.com

Overview

DataLibrary was created to bring datasets from a range of subjects into a single API. Our primary goal is consistency and ease of use.

For example, take a random selection of datasets:

  • List of Metric Units
  • List of US States
  • List of English Stopwords
  • Air Pollution Measurement Data
  • List of AWS & GCP Data Center Regions
  • Public financial data from two different municipalities

Before DataLibrary, you would most likely access these datasets from different sources. Beyond the technical challenges, each provider would typically use different schema patterns, naming conventions, and formatting.

DataLibrary exists not only to bring datasets together into a single source, but also to clean and reformat data when possible. For common subjects, data can be combined from several sources to create a new, richer dataset, with fields and metadata carefully renamed for a better experience.
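As a rough illustration of the kind of cleanup described above, the sketch below merges records from two sources that use different naming conventions into one consistent shape. It is not DataLibrary's actual implementation: the provider records, the `FIELD_MAP` table, and the `normalize` function are all hypothetical names made up for this example.

```python
# Hypothetical raw records from two providers; the field names and
# values are illustrative only and do not come from DataLibrary.
PROVIDER_A = [{"StateName": "Ohio", "Abbrev": "OH"}]
PROVIDER_B = [{"state_name": " texas ", "postal-code": "tx"}]

# Hypothetical mapping from each provider's field names to one shared schema.
FIELD_MAP = {
    "StateName": "name",
    "state_name": "name",
    "Abbrev": "code",
    "postal-code": "code",
}


def normalize(record):
    """Rename fields to the shared schema and tidy string values."""
    out = {}
    for key, value in record.items():
        target = FIELD_MAP.get(key)
        if target is None:
            continue  # drop fields the shared schema does not define
        out[target] = value.strip() if isinstance(value, str) else value
    # Apply consistent casing to the fields that are present.
    if "name" in out:
        out["name"] = out["name"].title()
    if "code" in out:
        out["code"] = out["code"].upper()
    return out


# Combine both sources into a single list with uniform field names.
combined = [normalize(r) for r in PROVIDER_A + PROVIDER_B]
```

Keeping the renaming rules in a single mapping table means a new provider can be supported by adding entries rather than writing new code, which is one plausible way to keep naming consistent across sources.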

Access

The DataLibrary API will initially be available via GraphQL, with a RESTful HTTP API to follow. A frontend for searching datasets and other features will also be available.

Copyright Notes

DataLibrary's goal is to make data more accessible. We take licensing and copyrights seriously.

For datasets where copyright does not apply, DataLibrary will typically host a formatted version of the data directly. This especially applies to common or infrequently changing datasets.

DataLibrary also supports copyrighted, premium, and paid datasets when approved by the provider.

A few example strategies:

  • Maintaining our own agreement/terms with a provider.
  • Acting as a proxy where you bring your own license or token, and we do not maintain a local copy.
  • Providing an API or local library that formats raw data using a dataset template we maintain.
  • Acting as a paid data app store where we provide access to a dataset that generates revenue for a provider.
  • Providing generic utilities for cleaning & working with your own data.

A project by Reactual