elianap/divexplorer_SIGMOD21_experiments

Artifacts of "Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence"

This repository contains the code to run and reproduce the experiments reported in our SIGMOD 2021 paper Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence.

This is a static dump of the code, to be uploaded to the ACM website and permanently hosted. It contains the scripts to directly reproduce the results reported in the paper. Please refer to our GitHub repository for all updates.

DivExplorer is a tool for analyzing datasets and finding subgroups of data where a model behaves differently than on the overall data. If you are interested in using DivExplorer, please refer to our repository and the corresponding PyPI package.
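To make the idea of "behaving differently on a subgroup" concrete, here is a minimal sketch that measures the false positive rate on a subgroup and subtracts the false positive rate on the whole dataset. It does not use the DivExplorer package or its API: the function names, the `pattern` dictionary, and the toy records are all assumptions made for illustration.

```python
# Toy illustration of divergence: a metric on a subgroup minus the same
# metric on the whole dataset. This is NOT the DivExplorer API; every name
# and record below is made up for the example.

def false_positive_rate(rows):
    """FPR = FP / (FP + TN), computed over rows whose true label is 0."""
    negatives = [r for r in rows if r["label"] == 0]
    if not negatives:
        return 0.0
    false_positives = sum(1 for r in negatives if r["pred"] == 1)
    return false_positives / len(negatives)


def divergence(rows, pattern):
    """FPR of the rows matching `pattern` minus the FPR of all rows."""
    subgroup = [r for r in rows if all(r[k] == v for k, v in pattern.items())]
    return false_positive_rate(subgroup) - false_positive_rate(rows)


# Hypothetical records: two attributes, the true label, the model prediction.
data = [
    {"age": "<25", "sex": "M", "label": 0, "pred": 1},
    {"age": "<25", "sex": "M", "label": 0, "pred": 1},
    {"age": "<25", "sex": "F", "label": 0, "pred": 0},
    {"age": ">45", "sex": "M", "label": 0, "pred": 0},
    {"age": ">45", "sex": "F", "label": 0, "pred": 0},
    {"age": ">45", "sex": "F", "label": 1, "pred": 1},
]

print(divergence(data, {"age": "<25"}))
print(divergence(data, {"age": "<25", "sex": "M"}))
```

A positive value means the subgroup has a higher false positive rate than the dataset as a whole. An empty pattern matches every row, so its divergence is zero.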

For all the details, you can refer to our paper or our project page. For a quick overview, you can refer to this video:

DivExplorer short video

Setting the environment

DivExplorer is implemented in Python.

First, set up an environment. The instructions below use either conda or venv.

Using conda

# Create the environment
conda create -n divexplorer-exp python=3.6.10

# Activate the environment
source activate divexplorer-exp

Using venv

# Create a virtualenv folder
mkdir -p ~/venv-environments/divexplorer-exp

# Create a new virtualenv in that folder
python3 -m venv ~/venv-environments/divexplorer-exp

# Activate the virtualenv
source ~/venv-environments/divexplorer-exp/bin/activate

Once the environment is activated, install the dependencies.

Install deps

pip install -r ./requirements.txt

Project structure

├── README.md             <- The top-level README for developers using this project.
│
├── datasets              <- Raw data from third party sources used in this project.
│  ├── dataset.txt        <- File info
│  └── processed          <- Processed datasets.
│
├── divexplorer           <- The source code of the DivExplorer algorithm
│    
├── doc                   <- Documentation with reproducibility report
│
├── requirements.txt      <- The requirements file for reproducing the analysis environment
│                         
├── discretize.py         <- Utils to discretize data
├── import_datasets.py    <- Utils to import the data
├── utils_print.py        <- Utils to output and format the data
│
├── NB1_compas.ipynb      <- Notebook using the COMPAS dataset
├── NB2_adult.ipynb       <- Notebook using the adult dataset
│                 
├── prepare_conda.txt     <- Instructions to set up the environment via conda
│                         
├── run_experiments.py    <- Run ALL experiments, producing all the figures and tables of the paper
│                         
├── E01_compas.py         <- Run all the experiments for the COMPAS dataset
│                         
├── E02_adult.py          <- Run all the experiments for the adult dataset
│                         
├── E03_artificial.py     <- Run all the experiments for the artificial dataset
│                         
├── E04_redundancy.py     <- Run experiments with redundancy pruning
│                         
├── E05a_compute_performance.py   <- Compute performance results
│                         
├── E05b_plot_performance.py      <- Visualize performance results (requires running E05a first)
│                         
├── E06_stats_dataset.py  <- Output the dataset statistics (Table 4)
│                         
└── E07_survey.py         <- Output the survey results and plot

The scripts process the Adult, Heart, German, and Bank UCI datasets and the ProPublica COMPAS dataset.

Running the experiments

You can refer to the reproducibility report, available in the doc folder, for all the details.

Running all experiments

To run ALL experiments, producing all the figures and tables reported in the paper:

python run_experiments.py

The results are stored in the ./output folder. Specifically, ./output/figures contains all the figures (in PDF format) and ./output/tables contains all the tables (in CSV format) reported in the paper.
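After a run, a quick way to see what was produced is to list the contents of the two output folders. The sketch below relies only on the layout described above (./output/figures with PDF files, ./output/tables with CSV files). The helper names are hypothetical and not part of this repository.

```python
from pathlib import Path


def names_in(folder, pattern):
    """Sorted file names matching `pattern` in `folder` (empty if it is missing)."""
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob(pattern))


def list_outputs(output_dir="./output"):
    """Collect the generated figure (PDF) and table (CSV) file names."""
    root = Path(output_dir)
    return {
        "figures": names_in(root / "figures", "*.pdf"),
        "tables": names_in(root / "tables", "*.csv"),
    }


# Print a short summary of what is currently in ./output.
for kind, names in list_outputs().items():
    print(f"{kind}: {len(names)} file(s)")
    for name in names:
        print("  " + name)
```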

Running specific experiments

We can also reproduce specific results by running the corresponding script:

python E0{exp-name}.py
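When only a few experiments are needed, the individual scripts can be launched one after another from Python. The following is a sketch, not a script shipped with this repository. The script names come from the project structure, while `select_scripts` and `run_scripts` are hypothetical helpers. Because E05b requires E05a to have been run first, the sketch adds E05a automatically, which is an assumption about the desired behavior (it reruns E05a even if its results already exist).

```python
import subprocess
import sys

# Script names as listed in the project structure.
SCRIPTS = [
    "E01_compas.py",
    "E02_adult.py",
    "E03_artificial.py",
    "E04_redundancy.py",
    "E05a_compute_performance.py",
    "E05b_plot_performance.py",
    "E06_stats_dataset.py",
    "E07_survey.py",
]


def select_scripts(prefixes):
    """Return the scripts whose name starts with any prefix, in listing order."""
    chosen = [s for s in SCRIPTS if any(s.startswith(p) for p in prefixes)]
    compute, plot = "E05a_compute_performance.py", "E05b_plot_performance.py"
    # E05b plots what E05a computes, so schedule E05a right before it.
    if plot in chosen and compute not in chosen:
        chosen.insert(chosen.index(plot), compute)
    return chosen


def run_scripts(prefixes):
    """Run each selected script with the current interpreter, stopping on failure."""
    for script in select_scripts(prefixes):
        print("Running", script)
        subprocess.run([sys.executable, script], check=True)


# Example (run from the repository root):
# run_scripts(["E01", "E05b"])
```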

Running COMPAS experiments

For example, we can run all the experiments associated with the COMPAS dataset with

python E01_compas.py

The script runs all the experiments associated with the COMPAS dataset. Specifically, it generates "table_1", "table_2", "table_3", "figure_1", "figure_2", "figure_3", and "figure_5" of the paper.

Running adult experiments

To run all the experiments associated with the adult dataset:

python E02_adult.py

The script runs all the experiments associated with the adult dataset. Specifically, it generates "table_5", "table_6", "figure_8", "figure_9", and "figure_11" of the paper.
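The artifact lists for the COMPAS and adult scripts can be turned into a small completeness check. This sketch assumes that each artifact is saved under the output folder in a file whose name, without its extension, equals the quoted name (for example `table_5.csv`). That naming is an assumption and should be adapted to the actual file names. `EXPECTED` and `missing_artifacts` are hypothetical names.

```python
from pathlib import Path

# Artifacts each script is documented to generate.
EXPECTED = {
    "E01_compas.py": ["table_1", "table_2", "table_3",
                      "figure_1", "figure_2", "figure_3", "figure_5"],
    "E02_adult.py": ["table_5", "table_6",
                     "figure_8", "figure_9", "figure_11"],
}


def missing_artifacts(script, output_dir="./output"):
    """Return the expected artifact names with no matching file in output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        return list(EXPECTED[script])
    # Assumption: an artifact "table_5" is stored as table_5.<ext> somewhere
    # below output_dir, so compare against the file names without extension.
    found = {p.stem for p in root.rglob("*") if p.is_file()}
    return [name for name in EXPECTED[script] if name not in found]


# Example: report what is still missing after running E02_adult.py.
print(missing_artifacts("E02_adult.py"))
```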

Citation

If you use the DivExplorer package or this code in your work, please cite:

@inproceedings{pastor2021looking,
  title={Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence},
  author={Pastor, Eliana and de Alfaro, Luca and Baralis, Elena},
  booktitle={Proceedings of the 2021 International Conference on Management of Data},
  url = {https://doi.org/10.1145/3448016.3457284},
  pages={1400--1412},
  year={2021}, 
  numpages = {13} 
}

Contributors

Eliana Pastor, Elena Baralis and Luca de Alfaro.

For any clarifications, comments, or suggestions, please contact Eliana Pastor.
