A curated list of resources dedicated to synthetic data.
If you want to contribute to this list (please do), send me a pull request or contact me @AlexWatson405. Also, a listed repository should be deprecated if:
- The repository's owner explicitly states that "this library is not maintained".
- The repository has had no commits for a long time (2 to 3 years).
- Research Summaries and Trends
- Tutorials
- Libraries
- Academic Papers
- Services
- Prominent Synthetic Data Research Labs
- Datasets
Introductions and Guides to Synthetic Data
Blogs and Newsletters
Videos and Online Courses
Open Source Generative Synthetic Data Models, Libraries and Frameworks
- gretel-synthetics - Generative models for structured and unstructured text, tabular, and multi-variate time-series data featuring differentially private learning.
- SDV - Synthetic Data Generator for tabular, relational, and time series data.
- Synthea - Synthetic Patient Population Simulator.
- ydata-synthetic - Synthetic structured data generators.
- Contrastive Unpaired Translation - Contrastive unpaired image-to-image translation, with faster and lighter training than CycleGAN.
- StyleGAN 3 - Official PyTorch implementation of StyleGAN3 from NeurIPS 2021.
- Jukebox - OpenAI's generative model for music.
- AirSim - A simulator for drones, cars, and more, built on the Unreal and Unity engines.
- Nvidia Dataset Synthesizer - NDDS is a UE4 plugin from NVIDIA to empower computer vision researchers to export high-quality synthetic images with metadata.
- OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
- Unity Perception - Perception toolkit for sim2real training and validation in Unity.
- Generative Modeling by Estimating Gradients of the Data Distribution (2021) Yang Song [pdf]
- Diffusion Models are Autoencoders (2021) S. Dieleman [pdf]
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics (2015) J. Sohl-Dickstein et al. [pdf]
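To make concrete what the simplest tabular generators in this section do, here is a minimal standard-library sketch that fits one independent distribution per column and samples new rows. It is an illustration only: the function names `fit_columns` and `sample_rows` and the toy `real_rows` data are hypothetical, and the code does not use or reproduce the API of any library listed above.

```python
import random
import statistics
from collections import Counter


def fit_columns(rows):
    """Learn one independent distribution per column from a list of dict rows."""
    model = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            # Numeric column: summarize with mean and population standard deviation.
            model[name] = ("numeric", statistics.mean(values), statistics.pstdev(values))
        else:
            # Categorical column: keep the observed value frequencies.
            counts = Counter(values)
            model[name] = ("categorical", list(counts), list(counts.values()))
    return model


def sample_rows(model, n, seed=None):
    """Draw n synthetic rows, sampling every column independently."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for name, spec in model.items():
            if spec[0] == "numeric":
                _, mean, std = spec
                row[name] = rng.gauss(mean, std)
            else:
                _, categories, weights = spec
                row[name] = rng.choices(categories, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Hypothetical toy data standing in for a real table.
real_rows = [
    {"age": 34, "plan": "free"},
    {"age": 51, "plan": "pro"},
    {"age": 27, "plan": "free"},
    {"age": 45, "plan": "pro"},
]

model = fit_columns(real_rows)
synthetic_rows = sample_rows(model, n=5, seed=0)
for row in synthetic_rows:
    print(row)
```

Because each column is sampled independently, this sketch discards any correlation between columns (for example, between `age` and `plan`). Capturing those relationships, and adding privacy guarantees, is what the dedicated libraries in this section are for.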
Synthetic data as an API, with higher-level functionality such as model training, fine-tuning, and generation
- HuggingFace Datasets - Library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks.
- Google Cloud Public Datasets - Publicly available and free machine learning and analytics datasets.
- Kaggle Datasets - Data science and machine learning datasets.
- /r/datasets - A place to share, find, and discuss Datasets.
License - CC0