This repository contains the implementation of DIPS, a data-centric method that improves pseudo-labeling under imperfect or noisy 'labeled' data, introduced in the paper "You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling".
DIPS improves a variety of state-of-the-art pseudo-labeling (semi-supervised learning) algorithms via data-centric insights.
For more details, please read our DMLR paper: You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling.
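To make the data-centric idea concrete, here is a minimal, hypothetical sketch of selecting reliably-labeled samples via training dynamics before pseudo-labeling. This is an illustration of the general technique only: the function name, threshold, and selection rule are assumptions, not the repository's actual API.

```python
import numpy as np

def select_confident_samples(probs_per_epoch, labels, conf_threshold=0.75):
    """Toy illustration of data-centric sample selection (not DIPS's exact rule).

    probs_per_epoch: array of shape (epochs, n_samples, n_classes) holding a
    classifier's predicted probabilities recorded at each training epoch.
    Samples whose average confidence in their (possibly noisy) label stays
    high across training are kept; the rest are flagged as unreliable.
    """
    probs_per_epoch = np.asarray(probs_per_epoch)
    labels = np.asarray(labels)
    # Average predicted probability of each sample's assigned label across epochs
    avg_confidence = probs_per_epoch[:, np.arange(len(labels)), labels].mean(axis=0)
    # Boolean mask: True for samples deemed reliable enough to learn from
    return avg_confidence >= conf_threshold
```

A selection step of this kind can be applied both to the initial labeled set and to pseudo-labeled samples at each iteration, which is the spirit of the data-centric insight.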
- Clone the repository
- (a) Create a new virtual environment with Python 3.10, e.g.:
virtualenv dips_env
- (b) Alternatively, create a new conda environment with Python 3.10, e.g.:
conda create -n dips_env python=3.10
- With the venv or conda env activated, run the following commands from the repository directory:
- Install the minimum requirements to run DIPS:
pip install -r requirements.txt
- Link the environment to a Jupyter kernel:
python -m ipykernel install --user --name=dips_env
Outputs from scripts can be logged to Weights & Biases (wandb). An account is required, and your WANDB_API_KEY and entity need to be set in the provided wandb.yaml file.
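The exact schema of wandb.yaml is defined by this repository; as a rough, hypothetical sketch (the key names below are illustrative assumptions, not verified against the repo), it might look like:

```yaml
# Hypothetical wandb.yaml sketch -- key names are illustrative assumptions
WANDB_API_KEY: "your-api-key-here"
entity: "your-wandb-entity"
```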
To get started with DIPS, try the tutorial.ipynb notebook in the root folder.
To run the tabular experiments, run the bash scripts found in the scripts folder; results are logged to wandb. For example:
bash run_real_tabular.sh
To run the notebook experiments, run any of the Jupyter notebooks (.ipynb) found in the notebooks folder.
Details on running DIPS for computer vision tasks (such as FixMatch) can be found in the fixmatch folder. Requirements specific to these experiments are contained therein.
If you use this code, please cite the associated paper:
@article{dips2024,
title={You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling},
author={Nabeel Seedat and Nicolas Huynh and Fergus Imrie and Mihaela van der Schaar},
journal={Journal of Data-centric Machine Learning Research},
year={2024},
}