Artifacts of "Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence"

This repository contains the code to run and reproduce the experiments reported in our SIGMOD 2021 paper Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence.

This is a static dump of the code, to be uploaded to the ACM website and permanently hosted. It contains the scripts to directly reproduce the results reported in the paper. Please refer to our GitHub repository for all updates.

DivExplorer is a tool for analyzing datasets and finding subgroups of data where a model behaves differently than on the overall data. If you are interested in using DivExplorer, please refer to our repository and the corresponding PyPI package.
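To make the idea of "behaving differently on a subgroup" concrete, here is a minimal sketch that compares a metric (the false positive rate) on a subgroup against the same metric on the whole dataset. It does not use the DivExplorer package or its API. The helper names `false_positive_rate` and `divergence`, the column names, and the toy rows are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def false_positive_rate(rows: List[Row]) -> float:
    """FPR = FP / (FP + TN), computed over the rows whose true label is 0."""
    negatives = [r for r in rows if r["y_true"] == 0]
    if not negatives:
        return float("nan")
    false_positives = sum(1 for r in negatives if r["y_pred"] == 1)
    return false_positives / len(negatives)


def divergence(
    rows: List[Row],
    pattern: Dict[str, object],
    metric: Callable[[List[Row]], float] = false_positive_rate,
) -> float:
    """Metric on the subgroup matching `pattern` minus the metric on all rows."""
    subgroup = [r for r in rows if all(r.get(k) == v for k, v in pattern.items())]
    return metric(subgroup) - metric(rows)


# Toy data (hypothetical): one discretized attribute per column,
# plus the true label and the classifier prediction.
rows = [
    {"age": "<25", "sex": "M", "y_true": 0, "y_pred": 1},
    {"age": "<25", "sex": "M", "y_true": 0, "y_pred": 1},
    {"age": "<25", "sex": "F", "y_true": 0, "y_pred": 0},
    {"age": ">45", "sex": "M", "y_true": 0, "y_pred": 0},
    {"age": ">45", "sex": "F", "y_true": 0, "y_pred": 0},
    {"age": ">45", "sex": "F", "y_true": 1, "y_pred": 1},
]

# A positive value means the subgroup has a higher FPR than the overall data.
print(divergence(rows, {"age": "<25", "sex": "M"}))
```

The sketch evaluates a single, hand-picked pattern. Refer to the paper for how the subgroups are actually explored.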

For all the details, refer to our paper or our project page. A short video provides a quick overview.

Setting the environment

DivExplorer is implemented in Python.

We first set up an environment. The following instructions use either conda or venv.

Using conda

# Create the environment

conda create -n divexplorer-exp python=3.6.10

# To activate the environment:

source activate divexplorer-exp

Using venv

# Create a virtualenv folder
mkdir -p ~/venv-environments/divexplorer-exp

# Create a new virtualenv in that folder
python3 -m venv ~/venv-environments/divexplorer-exp

# Activate the virtualenv
source ~/venv-environments/divexplorer-exp/bin/activate

Once the environment is activated, we install the dependencies.

Install deps

pip install -r ./requirements.txt

Project structure

├── README.md             <- The top-level README for developers using this project.
│
├── datasets              <- Raw data from third party sources used in this project.
│  ├── dataset.txt        <- File info
│  └── processed          <- Processed datasets.
│
├── divexplorer           <- The source code of the DivExplorer algorithm
│    
├── doc                   <- Documentation with reproducibility report
│
├── requirements.txt      <- The requirements file for reproducing the analysis environment
│                         
├── discretize.py         <- Utils to discretize data
├── import_datasets.py    <- Utils to import the data
├── utils_print.py        <- Utils to output and format the data
│
├── NB1_compas.ipynb      <- Notebook using the COMPAS dataset
├── NB2_adult.ipynb       <- Notebook using the adult dataset
│                 
├── prepare_conda.txt     <- Instructions to set up the environment via conda
│                         
├── run_experiments.py    <- Run ALL experiments, producing all the figures and tables of the paper
│                         
├── E01_compas.py         <- Run all the experiments for the COMPAS dataset
│                         
├── E02_adult.py          <- Run all the experiments for the adult dataset
│                         
├── E03_artificial.py     <- Run all the experiments for the artificial dataset
│                         
├── E04_redundancy.py     <- Run experiments with redundancy pruning
│                         
├── E05a_compute_performance.py   <- Compute performance results
│                         
├── E05b_plot_performance.py      <- Visualize performance results (requires running E05a first)
│                         
├── E06_stats_dataset.py  <- Output the dataset statistics (Table 4)
│                         
└── E07_survey.py         <- Output the survey results and plot

The scripts process the Adult, Heart, German, and Bank UCI datasets and the ProPublica COMPAS dataset.

Running the experiments

You can refer to the reproducibility report, available in the doc folder, for all the details.

Running all experiments

We run ALL experiments, producing all the figures and tables reported in the paper via:

python run_experiments.py

The results are stored in the ./output folder. Specifically, ./output/figures contains all the figures (in PDF format) and ./output/tables contains all the tables (in CSV format) reported in the paper.
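As a quick check after a run, the following sketch lists the generated files in the two output subfolders. It assumes only the layout described above (PDF figures in ./output/figures, CSV tables in ./output/tables). The helper name `list_outputs` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Dict, List


def list_outputs(output_dir: str = "./output") -> Dict[str, List[str]]:
    """Return the sorted file names of the PDF figures and CSV tables."""
    root = Path(output_dir)
    found: Dict[str, List[str]] = {}
    for subfolder, extension in (("figures", "pdf"), ("tables", "csv")):
        folder = root / subfolder
        # A missing folder simply yields an empty list.
        files = folder.glob(f"*.{extension}") if folder.is_dir() else []
        found[subfolder] = sorted(p.name for p in files)
    return found


for kind, names in list_outputs().items():
    print(f"{kind}: {len(names)} file(s)")
    for name in names:
        print(f"  {name}")
```

An empty listing most likely means that the experiments have not been run yet or were run from a different working directory.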

Running specific experiments

We can also reproduce specific results by running the corresponding script:

python E0{exp-name}.py
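Some scripts depend on others: according to the project structure, E05b_plot_performance.py requires E05a_compute_performance.py to be run first. A minimal sketch of a wrapper that runs a chosen subset of scripts in a fixed order and stops at the first failure could look like this. The helper `run_in_order` and its `dry_run` parameter are hypothetical and not part of the repository.

```python
import subprocess
import sys
from typing import List


def run_in_order(scripts: List[str], dry_run: bool = False) -> List[List[str]]:
    """Run each script with the current Python interpreter, in the given order.

    With dry_run=True the commands are only built and returned, not executed.
    """
    commands = [[sys.executable, script] for script in scripts]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a script exits with an
            # error, so later scripts do not run on missing inputs.
            subprocess.run(command, check=True)
    return commands


# Dry run: print the commands for the performance experiments without running them.
for command in run_in_order(
    ["E05a_compute_performance.py", "E05b_plot_performance.py"], dry_run=True
):
    print(" ".join(command))
```

Calling `run_in_order` without `dry_run=True` from the repository root, with the environment activated, executes the scripts one after the other.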

Running COMPAS experiments

For example, we can run all the experiments associated with the COMPAS dataset with

python E01_compas.py

The script runs all the experiments associated with the COMPAS dataset. Specifically, it generates "table_1", "table_2", "table_3", "figure_1", "figure_2", "figure_3", and "figure_5" of the paper.

Running adult experiments

To run all the experiments associated with the adult dataset:

python E02_adult.py

The script runs all the experiments associated with the adult dataset. Specifically, it generates "table_5", "table_6", "figure_8", "figure_9", and "figure_11" of the paper.

Citation

If you use the DivExplorer package or this code in your work, please cite:

@inproceedings{pastor2021looking,
  title={Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence},
  author={Pastor, Eliana and de Alfaro, Luca and Baralis, Elena},
  booktitle={Proceedings of the 2021 International Conference on Management of Data},
  url = {https://doi.org/10.1145/3448016.3457284},
  pages={1400--1412},
  year={2021}, 
  numpages = {13} 
}

Contributors

Eliana Pastor, Elena Baralis and Luca de Alfaro.

For any clarifications, comments, or suggestions, please contact Eliana Pastor.
