[11/2024] Project page link released.
[10/2024] Code released.
[10/2024] Paper accepted to IEEE BigData 2024.
[09/2024] arXiv preprint released.
Federated learning (FL) has emerged as a prominent method for collaboratively training machine learning models using local data from edge devices, all while keeping data decentralized. However, accounting for the quality of data contributed by local clients remains a critical challenge in FL, as local data are often susceptible to corruption by various forms of noise and perturbations, which compromise the aggregation process and lead to a subpar global model. In this work, we focus on addressing the problem of noisy data in the input space, an under-explored area compared to the label noise. We propose a comprehensive assessment of client input in the gradient space, inspired by the distinct disparity observed between the density of gradient norm distributions of models trained on noisy and clean input data. Based on this observation, we introduce a straightforward yet effective approach to identify clients with low-quality data at the initial stage of FL. Furthermore, we propose a noise-aware FL aggregation method, namely Federated Noise-Sifting (FedNS), which can be used as a plug-in approach in conjunction with widely used FL strategies. Our extensive evaluation on diverse benchmark datasets under different federated settings demonstrates the efficacy of FedNS. Our method effortlessly integrates with existing FL strategies, enhancing the global model’s performance by up to 13.68% in IID and 15.85% in non-IID settings when learning from noisy decentralized data.
- 🔍 Noise Identification: FedNS identifies noisy clients in the first training round (one-shot).
- 🛡️ Resilient Aggregation: An aggregation strategy that minimizes the impact of noisy clients (see the sketch below this list), ensuring robust global model performance.
- 🔒 Data Confidentiality: Shares only scalar gradient norms to keep data confidential.
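As a minimal sketch of how the identification and aggregation highlights fit together (this is not the repository's exact FedNS implementation; the function names, the outlier rule, and the down-weighting factor are illustrative assumptions), each client can report the scalar L2 norm of its gradients in the first round, and the server can flag outliers and down-weight them during FedAvg-style aggregation:

```python
import torch

def gradient_norm(model):
    """Scalar L2 norm over all parameter gradients (the only value a client shares)."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

def sift_clients(client_norms, z_thresh=2.0):
    """Flag clients whose first-round gradient norm is an outlier (hypothetical rule)."""
    norms = torch.tensor(client_norms)
    z = (norms - norms.mean()) / (norms.std() + 1e-8)
    return z.abs() > z_thresh  # True = suspected noisy client

def noise_aware_aggregate(client_states, client_sizes, noisy_mask, noisy_weight=0.1):
    """FedAvg-style averaging with down-weighted contributions from flagged clients."""
    weights = torch.tensor(client_sizes, dtype=torch.float)
    weights[noisy_mask] *= noisy_weight
    weights /= weights.sum()
    return {
        key: sum(w * state[key].float() for w, state in zip(weights, client_states))
        for key in client_states[0]
    }
```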
We provide noisy dataset creation scripts for various benchmarks. Below is an example of generating a noisy CIFAR-10; a sketch of the kind of input corruption involved follows the commands:
- CIFAR10:
cd ./data/cifar10data
python create_cifar10_noisy.py
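For reference, here is a minimal sketch of how input-space noise can be injected into CIFAR-10; the bundled create_cifar10_noisy.py may use different noise types, severity levels, and output formats, and the output file name below is a hypothetical placeholder.

```python
import numpy as np
from torchvision.datasets import CIFAR10

def add_gaussian_noise(images, std=0.1, seed=0):
    """Add pixel-level Gaussian noise to uint8 images (corrupting the input space)."""
    rng = np.random.default_rng(seed)
    noisy = images.astype(np.float32) / 255.0 + rng.normal(0.0, std, images.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255.0).astype(np.uint8)

train_set = CIFAR10(root="./data/cifar10data", train=True, download=True)
noisy_images = add_gaussian_noise(train_set.data, std=0.1)  # train_set.data: (50000, 32, 32, 3) uint8
np.save("./data/cifar10data/cifar10_train_noisy.npy", noisy_images)  # hypothetical output file
```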
For benchmarks with human annotation errors, you can refer to CIFAR-10/100N. For decentralized data generation, please go to the folder ./src_fed; a sketch of a typical non-IID partitioning scheme is shown below.
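As a hedged illustration of decentralized data generation, the snippet below partitions a labeled dataset across clients using a Dirichlet distribution, a common way to simulate non-IID federated splits; the actual scripts under ./src_fed may implement this differently, and the function name and default parameters are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Split sample indices across clients with label skew controlled by alpha."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        split_points = (np.cumsum(proportions) * len(cls_idx)).astype(int)[:-1]
        for client_id, shard in enumerate(np.split(cls_idx, split_points)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices  # client_indices[i] holds the sample indices for client i
```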
To prepare your experiment, set up your configuration in main.py. The specific federated learning strategy can be configured in server.py. Then simply execute the main script to run the experiment; the results will be saved as a log file. A hypothetical example of the kind of settings involved is sketched after the commands below.
cd ./scr_fed/cifar10
python main.py
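For orientation only, the block below sketches the kind of settings one would typically adjust in main.py; every key and value here is a hypothetical placeholder and may not match the actual variable names used in this repository.

```python
# Hypothetical configuration sketch -- adapt to the options actually exposed in main.py.
CONFIG = {
    "dataset": "cifar10",
    "num_clients": 10,
    "noisy_client_fraction": 0.4,  # fraction of clients holding corrupted inputs
    "rounds": 100,
    "local_epochs": 5,
    "strategy": "fedavg",          # aggregation strategy, selected in server.py
    "use_fedns": True,             # enable noise-aware sifting/aggregation
    "log_dir": "./logs",
}
```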
If you have any questions, please contact me via email or open an issue.
The code repository for "Collaboratively Learning Federated Models from Noisy Decentralized Data" (IEEE BigData 2024) in PyTorch. If you use any content of this repo for your work, please cite the following bib entry:
@article{li2024collaboratively,
title={Collaboratively Learning Federated Models from Noisy Decentralized Data},
author={Li, Haoyuan and Funk, Mathias and G{\"u}rel, Nezihe Merve and Saeed, Aaqib},
journal={arXiv preprint arXiv:2409.02189},
year={2024}
}