This repository aims to provide a comprehensive list of works on dataset distillation (DD), also known as dataset condensation (DC).
Papers sorted by year: | 2023 | 2022 | 2021 | 2020 | 2019 | 2018 |
Papers in 2023 [Back-to-top]
Author | Title | Type | Task | Dataset | Venue | Supp. Material |
---|---|---|---|---|---|---|
Ruonan Yu et al | Dataset Distillation: A Comprehensive Review | Survey | Multiple tasks | | arXiv, Jan., 2023 | |
Shiye Lei et al | A Comprehensive Survey to Dataset Distillation | Survey | Multiple tasks | | arXiv, Jan., 2023 | |
Noveen Sachdeva et al | Data Distillation: A Survey | Survey | Multiple tasks | | arXiv, Jan., 2023 | |
Yugeng Liu et al | Backdoor Attacks Against Dataset Distillation | | Security | FMNIST, CIFAR10, STL10, SVHN | NDSS, 2023 | Code |
Papers in 2022 [Back-to-top]
Author | Title | Type | Task | Dataset | Venue | Supp. Material |
---|---|---|---|---|---|---|
Guang Li et al | Compressed Gastric Image Generation Based on Soft-Label Dataset Distillation for Medical Data Sharing | Soft-Label Distillation | Application: Medical Data Sharing | Gastric X-ray | Computer Methods and Programs in Biomedicine, 2022 | |
Zijia Wang et al | Gift from nature: Potential Energy Minimization for explainable dataset distillation | Potential Energy Minimization | Image Classification | miniImageNet, CUB-200 | ACCV Workshop, 2022 | |
Michael Arbel et al | Non-Convex Bilevel Games with Critical Point Selection Maps | General Optimization | Image Classification | CIFAR-10 | NeurIPS, 2022 | |
Zhiwei Deng et al | Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks | | Image Classification | MNIST, SVHN, CIFAR10/100, TinyImageNet | NeurIPS, 2022 | Code |
Noveen Sachdeva et al | Infinite Recommendation Networks: A Data-Centric Approach | Neural Tangent Kernel | Application: Recommender System | Amazon Magazine, ML-1M, Douban, Netflix | NeurIPS, 2022 | Code |
Dingfan Chen et al | Private Set Generation with Discriminative Information | | Application: Private Data Generation | MNIST, FashionMNIST | NeurIPS, 2022 | Code, Poster |
Justin Cui et al | DC-BENCH: Dataset Condensation Benchmark | Benchmark | Image Classification | | NeurIPS, 2022 | Code, Poster |
Yongchao Zhou et al | Dataset Distillation using Neural Feature Regression | | Image Classification | CIFAR100, TinyImageNet, ImageNette, ImageWoof | NeurIPS, 2022 | Code, Slide |
Songhua Liu et al | Dataset Distillation via Factorization | | Image Classification | SVHN, CIFAR10/100 | NeurIPS, 2022 | Code, Poster |
Noel Loo et al | Efficient Dataset Distillation using Random Feature Approximation | | Image Classification | MNIST, FashionMNIST, SVHN, CIFAR-10/100 | NeurIPS, 2022 | Code, Poster |
Yihan Wu et al | Towards Robust Dataset Learning | Tri-level Optimization | Robust Image Classification | MNIST, CIFAR10, TinyImageNet | arXiv, Nov., 2022 | |
Andrey Zhmoginov et al | Decentralized Learning with Multi-Headed Distillation | Local DD | Application: FL | CIFAR-10/100 | arXiv, Nov., 2022 | |
Jiawei Du et al | Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation | Accumulated Trajectory Matching | Image Classification | | arXiv, Nov., 2022 | |
Justin Cui et al | Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory | | Image Classification | CIFAR-10/100, ImageNet-1K | arXiv, Nov., 2022 | |
Renjie Pi et al | DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics | | Application: FL | FMNIST, CIFAR10, CINIC10 | arXiv, Nov., 2022 | |
Zongwei Wang et al | Quick Graph Conversion for Robust Recommendation | Gradient Matching | Application: Recommender System | Beauty, Alibaba-iFashion, Yelp2018 | arXiv, Oct., 2022 | |
Yulan Chen et al | Learning from Designers: Fashion Compatibility Analysis Via Dataset Distillation | | Application: Fashion Analysis | | ICIP, 2022 | |
Yuna Jeong et al | Training data selection based on dataset distillation for rapid deployment in machine-learning workflows | | Application: Dataset Selection | | Multimedia Tools and Applications, 2022 | |
Yanlin Zhou et al | Communication-Efficient and Attack-Resistant Federated Edge Learning with Dataset Distillation | | Application: FL | MNIST, Landmark, IMDB, etc. | IEEE TCC, 2022 | Code |
Nicholas Carlini et al | No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" | | Application: Privacy | CIFAR-10 | arXiv, Sept., 2022 | |
Guang Li et al | Dataset Distillation for Medical Dataset Sharing | Trajectory Matching | Application: Medical Data Sharing | COVID-19 Chest X-ray | arXiv, Sept., 2022 | |
Guang Li et al | Dataset Distillation using Parameter Pruning | Parameter Pruning | Image Classification | CIFAR-10/100 | arXiv, Sept., 2022 | |
Ping Liu et al | Meta Knowledge Condensation for Federated Learning | | Application: FL | MNIST | arXiv, Sept., 2022 | |
Dmitry Medvedev et al | Learning to Generate Synthetic Training Data Using Gradient Matching and Implicit Differentiation | Gradient Matching, Implicit Differentiation | Image Classification | MNIST | CCIS, 2022 | Code |
Wei Jin et al | Condensing Graphs via One-Step Gradient Matching | Gradient Matching | Graph Classification | | KDD, 2022 | Code |
Rui Song et al | Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments | Local DD | Application: FL | MNIST, CIFAR10 | arXiv, Aug., 2022 | |
Hae Beom Lee et al | Dataset Condensation with Latent Space Knowledge Factorization and Sharing | Local DD | Image Classification | | arXiv, Aug., 2022 | |
Thi-Thu-Huong Le et al | A Review of Dataset Distillation for Deep Learning | Survey | Image Classification | | ICPTS, 2022 | |
Zixuan Jiang et al | Delving into Effective Gradient Matching for Dataset Condensation | Gradient Matching | Image Classification | MNIST, FashionMNIST, SVHN, CIFAR-10/100 | arXiv, Jul., 2022 | Code |
Yuanhao Xiong et al | FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning | | Application: FL | MNIST, CIFAR10/100 | arXiv, Jul., 2022 | |
Nikolaos Tsilivis et al | Can we achieve robustness from data alone? | KIP | Security | MNIST, CIFAR-10 | arXiv, Jul., 2022 | |
Nadiya Shvai et al | DEvS: Data Distillation Algorithm Based on Evolution Strategy | Evolution Strategy | Image Classification | CIFAR-10 | GECCO, 2022 | |
Mattia Sangermano et al | Sample Condensation in Online Continual Learning | Gradient Matching | Application: Continual Learning | SplitMNIST, SplitFashionMNIST, SplitCIFAR10 | IJCNN, 2022 | Code |
Brian Moser et al | Less is More: Proxy Datasets in NAS approaches | | Application: NAS | | CVPRW, 2022 | |
George Cazenavette et al | Wearable ImageNet: Synthesizing Tileable Textures via Dataset Distillation | | Image Classification | | CVPRW, 2022 | Code |
George Cazenavette et al | Dataset Distillation by Matching Training Trajectories | Trajectory Matching | Image Classification | CIFAR-100, Tiny ImageNet, ImageNet subsets | CVPR, 2022 | Code |
Kai Wang et al | CAFE: Learning to Condense Dataset by Aligning Features | Feature Alignment | Image Classification | MNIST, FashionMNIST, SVHN, CIFAR10/100 | CVPR, 2022 | Code |
Mengyang Liu et al | Graph Condensation via Receptive Field Distribution Matching | Receptive Field Distribution Matching | Graph Classification | Cora, PubMed, Citeseer, Ogbn-arxiv, Flickr | arXiv, Jun., 2022 | |
Saehyung Lee et al | Dataset Condensation with Contrastive Signals | Contrastive Learning | Image Classification | SVHN, CIFAR-10/100; Automobile, Terrier, Fish | ICML, 2022 | Code |
Jang-Hyun Kim et al | Dataset Condensation via Efficient Synthetic-Data Parameterization | | Image Classification | CIFAR-10, ImageNet, Speech Commands | ICML, 2022 | Code |
Tian Dong et al | Privacy for Free: How does Dataset Condensation Help Privacy? | Application: Privacy | Image Classification | | ICML, 2022 | |
Paul Vicol et al | On Implicit Bias in Overparameterized Bilevel Optimization | General Optimization | Image Classification | MNIST | ICML, 2022 | |
Wei Jin et al | Graph Condensation for Graph Neural Networks | Gradient Matching | Graph Classification | Cora, Citeseer, Ogbn-arxiv; Reddit, Flickr | ICLR, 2022 | Code |
Bo Zhao et al | Synthesizing Informative Training Samples with GAN | GAN | Image Classification | CIFAR-10/100 | arXiv, Apr., 2022 | Code |
Shengyuan Hu et al | FedSynth: Gradient Compression via Synthetic Data in Federated Learning | | Application: FL | MNIST, FEMNIST, Reddit | | |
Aminu Musa et al | Learning from Small Datasets: An Efficient Deep Learning Model for Covid-19 Detection from Chest X-ray Using Dataset Distillation Technique | | Application: Medical Imaging | Chest X-ray | NIGERCON, 2022 | |
Seong-Woong Kim et al | Stable Federated Learning with Dataset Condensation | | Application: FL | CIFAR-10 | JCSE, 2022 | |
Robin T. Schirrmeister et al | When less is more: Simplifying inputs aids neural network understanding | | Application: Understanding NN | MNIST, Fashion-MNIST, CIFAR10/100 | arXiv, Jan., 2022 | |
Isha Garg et al | TOFU: Towards Obfuscated Federated Updates by Encoding Weight Updates into Gradients from Proxy Data | | Application: FL | | arXiv, Jan., 2022 | |
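Many 2022 entries above build on trajectory matching (e.g. Cazenavette et al., CVPR 2022; Du et al., arXiv, Nov., 2022): synthetic data is optimized so that a few training steps on it reproduce a longer stretch of an expert network's training trajectory on real data. Below is a hypothetical, minimal sketch of that idea on a one-parameter linear model f(x) = w*x with squared loss; pure Python, not any paper's implementation, and the learning rates, checkpoints, and toy data are illustrative assumptions.

```python
# Toy trajectory-matching sketch (illustrative only): record an "expert"
# weight trajectory on real data, then fit ONE synthetic point (xs, ys)
# so that a single student step on it covers `span` expert steps.
LR_IN = 0.05  # inner learning rate, shared by the expert and the student

def gd_step(w, data):
    """One gradient-descent step of the mean squared loss over `data`."""
    g = sum(2.0 * x * (w * x - y) for x, y in data) / len(data)
    return w - LR_IN * g

def expert_trajectory(w0, real, steps):
    """Record the weight after each training step on the real data."""
    traj = [w0]
    for _ in range(steps):
        traj.append(gd_step(traj[-1], real))
    return traj

def distill(real, w0=0.0, span=4, checkpoints=(0, 8), outer_steps=10000, lr=0.05):
    traj = expert_trajectory(w0, real, max(checkpoints) + span)
    xs, ys = 1.0, 1.0  # synthetic point, arbitrary init
    for _ in range(outer_steps):
        dxs = dys = 0.0
        for t in checkpoints:
            w = traj[t]
            w1 = w - LR_IN * 2.0 * xs * (w * xs - ys)  # one step on (xs, ys)
            d = w1 - traj[t + span]  # gap to the expert `span` steps later
            dxs += d * (-LR_IN * 2.0) * (2.0 * w * xs - ys)  # chain rule
            dys += d * (LR_IN * 2.0 * xs)
        xs -= lr * dxs
        ys -= lr * dys
    return xs, ys, traj

real = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]  # toy real data on y = 3x
xs, ys, traj = distill(real)
```

After optimization, one student step on the learned point carries the weight from checkpoint `traj[t]` approximately to `traj[t + span]`, which is the matching objective of the method.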
Papers in 2021 [Back-to-top]
Author | Title | Type | Task | Dataset | Venue | Supp. Material |
---|---|---|---|---|---|---|
Timothy Nguyen et al | Dataset Distillation with Infinitely Wide Convolutional Networks | Kernel Ridge Regression | Image Classification | MNIST, Fashion-MNIST, CIFAR-10/100, SVHN | NeurIPS, 2021 | Code |
Bo Zhao et al | Dataset Condensation with Distribution Matching | Distribution Matching | Image Classification | MNIST, CIFAR10/100, TinyImageNet | arXiv, Oct., 2021 | Code |
Ilia Sucholutsky et al | Soft-Label Dataset Distillation and Text Dataset Distillation | Label Distillation | Image/Text Classification | MNIST, IMDB | IJCNN, 2021 | Code |
Felix Wiewel et al | Condensed Composite Memory Continual Learning | Gradient Matching | Application: Continual Learning | | IJCNN, 2021 | Code |
Bo Zhao et al | Dataset Condensation with Differentiable Siamese Augmentation | Data Augmentation | Image Classification | MNIST, FashionMNIST, SVHN, CIFAR10/100 | ICML, 2021 | Code, Video |
Timothy Nguyen et al | Dataset Meta-Learning from Kernel Ridge-Regression | Kernel Ridge Regression | Image Classification | MNIST, CIFAR-10 | ICLR, 2021 | Code |
Bo Zhao et al | Dataset Condensation with Gradient Matching | Gradient Matching | Image Classification | CIFAR-10, Fashion-MNIST, MNIST, SVHN, USPS | ICLR, 2021 | Code |
Dmitry Medvedev et al | New Properties of the Data Distillation Method When Working with Tabular Data | | Tabular Classification | Simulation | LNISA, 2021 | Code |
Yongqi Li et al | Data Distillation for Text Classification | | Text Classification | | arXiv, Apr., 2021 | Code |
Ilia Sucholutsky et al | ‘Less Than One’-Shot Learning: Learning N Classes From M<N Samples | Label Distillation | Application: Few-Shot Learning | Simulation | AAAI, 2021 | Code |
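Gradient matching (Zhao et al., ICLR 2021, above) optimizes synthetic data so that the loss gradient it induces matches the gradient induced by real data across many model weights. Below is a hypothetical toy sketch on a one-parameter linear model f(x) = w*x with squared loss; pure Python, not the authors' code, and the probe weights, learning rate, and data are illustrative assumptions.

```python
# Toy gradient-matching sketch (illustrative only): learn a single
# synthetic point (xs, ys) whose loss gradient matches the mean
# real-data gradient at a fixed set of probe weights.

def grad_w(w, x, y):
    """d/dw of the squared loss (w*x - y)^2."""
    return 2.0 * x * (w * x - y)

def matching_loss(xs, ys, real, probes):
    """Sum of squared gradient mismatches over the probe weights."""
    total = 0.0
    for w in probes:
        g_real = sum(grad_w(w, x, y) for x, y in real) / len(real)
        total += 0.5 * (grad_w(w, xs, ys) - g_real) ** 2
    return total

def distill(real, probes, steps=4000, lr=1e-3):
    xs, ys = 1.0, 0.0  # synthetic point, arbitrary init
    for _ in range(steps):
        dxs = dys = 0.0
        for w in probes:
            g_real = sum(grad_w(w, x, y) for x, y in real) / len(real)
            d = grad_w(w, xs, ys) - g_real    # gradient mismatch at this w
            dxs += d * 2.0 * (2.0 * w * xs - ys)  # chain rule through g_syn
            dys += d * (-2.0 * xs)
        xs -= lr * dxs
        ys -= lr * dys
    return xs, ys

real = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]  # toy real data on y = 3x
probes = [-2.0, -1.0, 0.0, 1.0, 2.0]
xs, ys = distill(real, probes)
```

On this toy problem a single well-placed synthetic point can reproduce the average real-data gradient at every probe weight, which is why one point suffices here; real methods match per-class gradient batches over sampled networks.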
Papers in 2020 [Back-to-top]
Author | Title | Type | Task | Dataset | Venue | Supp. Material |
---|---|---|---|---|---|---|
Yanlin Zhou et al | Distilled One-Shot Federated Learning | | Application: FL | | arXiv, Sept., 2020 | |
Ondrej Bohdal et al | Flexible Dataset Distillation: Learn Labels Instead of Images | Label Distillation | Image Classification | MNIST, CIFAR-10/100, CUB | NeurIPS workshop, 2020 | Code |
Chengeng Huang et al | Generative Dataset Distillation | Generative Adversarial Networks | Image Classification | MNIST | BigCom, 2021 | |
Jack Goetz et al | Federated Learning via Synthetic Data | Bi-level Optimization | Application: FL | arXiv, Aug., 2020 | ||
Guang Li et al | Soft-Label Anonymous Gastric X-Ray Image Distillation | Label Distillation | Application: Medical Data Sharing | X-ray Images | ICIP, 2020 | |
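Several entries in this and nearby tables perform label distillation (Sucholutsky et al., IJCNN 2021; Bohdal et al., NeurIPS workshop 2020; Li et al., ICIP 2020): the synthetic inputs are kept fixed and only their soft labels are learned. Below is a hypothetical toy sketch on a one-parameter linear model f(x) = w*x; pure Python, not any paper's code, and all constants are illustrative assumptions.

```python
# Toy label-distillation sketch (illustrative only): the synthetic INPUT
# is fixed; only its label ys is optimized so that one inner training
# step on the synthetic pair makes the model fit the real data.
ALPHA = 0.1  # inner-step learning rate
XS = 1.0     # fixed synthetic input; only the label ys is learned

def inner_step(w0, ys):
    """One gradient step of the loss (w*XS - ys)^2, starting from w0."""
    return w0 - ALPHA * 2.0 * XS * (w0 * XS - ys)

def distill_label(real, w0=0.0, steps=2000, lr=0.01):
    ys = 0.0  # learned soft label, arbitrary init
    for _ in range(steps):
        w1 = inner_step(w0, ys)
        # outer gradient of sum((w1*x - y)^2) over the real data,
        # back-propagated through the inner step via the chain rule
        g_outer = sum(2.0 * x * (w1 * x - y) for x, y in real)
        dys = g_outer * (ALPHA * 2.0 * XS)
        ys -= lr * dys
    return ys

real = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]  # toy real data on y = 3x
ys = distill_label(real)
```

Note that the learned label need not equal any real target: it is whatever value makes the inner training step land in the right place.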
Papers in 2019 [Back-to-top]
Author | Title | Type | Task | Dataset | Venue | Supp. Material |
---|---|---|---|---|---|---|
Sam Shleifer et al | Proxy Datasets for Training Convolutional Neural Networks | | Application: Proxy Dataset Generation | Imagenette, Imagewoof | arXiv, Jun., 2019 | |
Papers in 2018 [Back-to-top]
Author | Title | Type | Task | Dataset | Venue | Supp. Material |
---|---|---|---|---|---|---|
Tongzhou Wang et al | Dataset Distillation | Bi-level Optimization | Image Classification | MNIST, CIFAR-10 | arXiv, Nov., 2018 | Code |
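The original formulation (Wang et al., 2018, above) is bi-level: the inner problem trains a model on the synthetic data, and the outer problem updates the synthetic data so that the trained model fits the real data, differentiating through the inner training step. Below is a hypothetical one-step, one-parameter sketch of that structure; pure Python, not the paper's code, and the fixed initialization w0 = 0 and all constants are simplifying assumptions (the paper also optimizes over random initializations).

```python
# Toy bi-level sketch (illustrative only). Model: f(x) = w*x.
# Inner: one gradient step on the synthetic point (xs, ys) from init w0.
# Outer: make the resulting weight fit the real data, differentiating
# through the inner step.
ALPHA = 0.1  # inner-step learning rate

def inner_step(w0, xs, ys):
    """One inner GD step of the loss (w*xs - ys)^2, starting from w0."""
    return w0 - ALPHA * 2.0 * xs * (w0 * xs - ys)

def distill(real, w0=0.0, steps=2000, lr=0.01):
    xs, ys = 1.0, 0.0  # synthetic point, arbitrary init
    for _ in range(steps):
        w1 = inner_step(w0, xs, ys)
        # outer gradient of sum((w1*x - y)^2) over the real data
        g_outer = sum(2.0 * x * (w1 * x - y) for x, y in real)
        dw1_dxs = -ALPHA * 2.0 * (2.0 * w0 * xs - ys)  # chain rule terms
        dw1_dys = ALPHA * 2.0 * xs
        xs -= lr * g_outer * dw1_dxs
        ys -= lr * g_outer * dw1_dys
    return xs, ys

real = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]  # toy real data on y = 3x
xs, ys = distill(real)
```

After distillation, a single inner step on the one synthetic point takes the model from w0 = 0 to (approximately) the weight that fits all three real points, which is the compression the method aims for.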