awesome-synthetic-data

A curated list of resources dedicated to Synthetic Data

If you want to contribute to this list (please do), send me a pull request or contact me @AlexWatson405. Also, a listed repository should be deprecated if:

  • The repository's owner explicitly states that "this library is not maintained".
  • It has had no commits for a long time (2 to 3 years).

Contents

Research Summaries and Trends

Back to Top

Tutorials

Back to Top

Reading Content

Back to Top

Introductions and Guides to Synthetic Data

Blogs and Newsletters

Videos and Online Courses

Back to Top

Libraries

Open Source Generative Synthetic Data Models, Libraries and Frameworks | Back to Top

Text, Tabular and Time-Series

  • gretel-synthetics - Generative models for structured and unstructured text, tabular, and multi-variate time-series data featuring differentially private learning.
  • SDV - Synthetic Data Generator for tabular, relational, and time series data.
  • Synthea - Synthetic Patient Population Simulator.
  • ydata-synthetic - Synthetic structured data generators.

Images

Audio

  • Jukebox - OpenAI's generative model for music.

Simulation

  • AirSim - A simulator for drones, cars, and more, built on the Unreal and Unity engines.
  • Nvidia Dataset Synthesizer - NDDS is a UE4 plugin from NVIDIA to empower computer vision researchers to export high-quality synthetic images with metadata.
  • OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
  • Unity Perception - Perception toolkit for sim2real training and validation in Unity.

Academic Papers

Back to Top

Language Models

Generative Adversarial Networks (GANs)

Diffusion Models

  • Generative Modeling by Estimating Gradients of the Data Distribution (2021) Yang Song [pdf]
  • Diffusion Models are Autoencoders (2021) S. Dieleman [pdf]
  • Deep Unsupervised Learning using Nonequilibrium Thermodynamics (2015) J Sohl-Dickstein et al. [pdf]

Fair AI

Services

Synthetic Data as an API with higher-level functionality such as model training, fine-tuning, and generation | Back to Top

Prominent Synthetic Data Research Labs

Back to Top

Datasets

Back to Top

  • HuggingFace Datasets - Library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks.
  • Google Cloud Public Datasets - Publicly available and free machine learning and analytics datasets.
  • Kaggle Datasets - Data science and machine learning datasets.
  • /r/datasets - A place to share, find, and discuss Datasets.

License

License - CC0