This repo contains the code for the paper *Self-labeling Selective Sampling*. The repository is organised as follows:
- the `data` folder contains scripts for loading data and stores the datasets.
- the `plots` folder contains the images generated for the paper.
- the `results` folder contains `.npy` files with the results of the experiments.
- the `utils` folder contains various helper scripts used in the experiments.
- the main directory contains the experiment code and helper scripts for generating images, generating tables, running preliminary studies, etc.
All experiments are implemented in Python.
To run the experiments, first download the datasets by running:
`bash download_data.sh`
Next, create and activate the conda environment:
`conda env create -f environment.yml`
`conda activate active-learning`
To run the experiments on the smaller datasets with various seed sizes, run:
`bash run.sh`
To run the experiments on the larger datasets, run:
`bash run_big_datasets.sh`
Both of these scripts use the Python script `main.py`. It contains the core code of our experiments: it reads the chosen dataset, splits the data, performs initial training on the seed dataset, and then runs selective sampling on the data stream. All parameters are passed to this script on the command line; run `main.py --help` to see all available options. A rough sketch of this workflow is shown below.
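The sketch below illustrates only the general seed-then-stream workflow described above, assuming scikit-learn's wine dataset, an `SGDClassifier`, a confidence threshold of 0.7, and a budget of 30% of the stream; all of these are illustrative placeholders, and the code does not reproduce the paper's self-labeling strategy or the actual contents of `main.py`.

```python
# Minimal sketch of the seed-then-stream workflow (not the paper's self-labeling
# method and not the actual main.py code). Dataset, model, confidence threshold,
# and budget are illustrative placeholders.
from sklearn.datasets import load_wine
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
# Split off a small labeled seed set; the remaining samples act as the stream.
X_seed, X_stream, y_seed, y_stream = train_test_split(
    X, y, train_size=30, stratify=y, random_state=0
)

model = SGDClassifier(loss="log_loss", random_state=0)
model.fit(X_seed, y_seed)  # initial training on the seed dataset

budget = int(0.3 * len(X_stream))  # label at most 30% of the stream
used = 0
for x, true_label in zip(X_stream, y_stream):
    proba = model.predict_proba(x.reshape(1, -1))[0]
    # Query the label only while budget remains and the model is uncertain.
    if used < budget and proba.max() < 0.7:
        model.partial_fit(x.reshape(1, -1), [true_label])
        used += 1
print(f"labels queried: {used}/{budget}")
```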
Code from `main.py` is also used in `tune_hyperpams.py`, the script for hyperparameter tuning. During tuning we run the experiments 3 times for each random seed and evaluate multiple hyperparameter values. The best value is then selected automatically, and we run the final experiments 10 times with the selected hyperparameters, using random seeds different from the ones used for tuning (a sketch of the selection step follows below).
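As an illustration of the selection step only, the sketch below averages balanced accuracy over repeated runs and picks the best candidate value; `run_experiment` is a hypothetical helper standing in for a single tuning run and the seeds are placeholders, so this is not the actual logic of `tune_hyperpams.py`.

```python
# Sketch of the hyperparameter selection step. run_experiment is a hypothetical
# callable that trains and evaluates once for a given hyperparameter value and
# random seed, returning a balanced accuracy; the seeds here are placeholders.
import numpy as np

def select_best(candidate_values, run_experiment, seeds=(0, 1, 2)):
    """Return the candidate value with the highest mean balanced accuracy."""
    mean_scores = {
        value: np.mean([run_experiment(value, seed) for seed in seeds])
        for value in candidate_values
    }
    return max(mean_scores, key=mean_scores.get)
```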
Results are stored as NumPy files in the `results` folder. Because the experiments produce a large number of result files, results for different active learning algorithms are kept in separate subfolders. For each experiment we store two separate files: one with the balanced accuracy obtained in the experiment and one with the iteration number at which the budget ran out. The name of each file encodes the experiment it was obtained from, using the following convention: `{acc/budget_end}_{base model name}_{dataset}_seed_{seed size}_budget_{budget size}_random_seed_{random seed}.npy`. For example, the file with the accuracy obtained with the MLP model on the wine dataset, seed size 1000, budget 0.3, and random seed 4 is named `acc_mlp_wine_seed_1000_budget_0.3_random_seed_4.npy`. A small helper illustrating this convention is shown below.
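The helper below builds such a file name from its components and loads the result with NumPy; the function itself, the optional per-algorithm subfolder argument, and the existence of the specific example file are assumptions for illustration, not part of the repository.

```python
# Helper illustrating the file naming convention above. The function and the
# optional per-algorithm subfolder argument are not part of the repository,
# and the example file is assumed to exist under results/.
import numpy as np

def result_path(kind, model, dataset, seed_size, budget, random_seed, subdir=""):
    """Build a result path; kind is "acc" or "budget_end"."""
    name = (
        f"{kind}_{model}_{dataset}_seed_{seed_size}"
        f"_budget_{budget}_random_seed_{random_seed}.npy"
    )
    return f"results/{subdir}/{name}" if subdir else f"results/{name}"

acc = np.load(result_path("acc", "mlp", "wine", 1000, 0.3, 4))
print(acc)
```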
The LaTeX tables with the results of our experiments can be generated with the `tables.py` script.
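For reference, an aggregation along these lines could look like the sketch below; it is not the actual `tables.py`, and the dataset names, model, seed size, budget, and random seeds are placeholders.

```python
# Illustrative aggregation of per-run accuracy files into a LaTeX table
# (not the actual tables.py). Dataset names, model, seed size, budget, and
# random seeds are placeholders.
import numpy as np
import pandas as pd

rows = []
for dataset in ["wine", "adult"]:  # placeholder dataset names
    scores = [
        np.load(
            f"results/acc_mlp_{dataset}_seed_1000_budget_0.3_random_seed_{rs}.npy"
        ).mean()
        for rs in range(10)  # placeholder random seeds
    ]
    rows.append(
        {"dataset": dataset, "mean bal. acc.": np.mean(scores), "std": np.std(scores)}
    )

print(pd.DataFrame(rows).to_latex(index=False, float_format="%.3f"))
```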