Benchmark proposed in the paper "Interpretability in Symbolic Regression: a benchmark of Explanatory Methods using the Feynman data set", submitted to Genetic Programming and Evolvable Machines.
In some situations, the interpretability of machine learning models plays a role as important as model accuracy. This comes from the need to trust the prediction model, to verify some of its properties, or even to enforce such properties to improve fairness. To satisfy this need, many model-agnostic explainers were proposed with the goal of working with black-box models. Most of these works focus on classification models, even though an adaptation to regression models is usually straightforward. The regression task can be tackled with techniques considered white-box (e.g., linear regression) or gray-box (e.g., symbolic regression), which can deliver interpretable results. This paper studies the use of explanation methods in the context of regression, and symbolic regression in particular, coupled with different explanation methods from the literature. Experiments were performed using a set of 100 physics equations together with different interpretable and non-interpretable regression methods and popular explanation methods, wrapped in a module and tested through an intensive benchmark. We adapted explanation quality metrics to inspect the performance of explainers for the regression task. The results showed that, for this specific problem domain, the symbolic regression models outperformed all other regression models in every quality measure. Among the tested methods, Partial Effects and SHAP presented the most stable results, while Integrated Gradients was unstable with tree-based models. As a byproduct of this work, we released a Python library for benchmarking explanation methods with regression models. This library will be maintained and expanded with more explainers and regressors.
First, clone the repository. Then, inside the repository root, execute the following commands (on Linux):
```bash
# make sure you have the build tools
sudo apt-get install build-essential

# create and activate the conda environment
conda env create -f environment.yml
conda activate iirs-env

# install Operon first, as it is the only dependency not available on PyPI.
# Use gcc-9 (or later)
export CC=gcc-9
export CXX=gcc-9

# clone Operon
cd iirsBenchmark/regressors
git clone https://github.com/heal-research/operon
cd operon

# run cmake with options
mkdir build; cd build;
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_PYBIND=ON -DUSE_OPENLIBM=ON -DUSE_SINGLE_PRECISION=ON -DCERES_TINY_SOLVER=ON

# build
make VERBOSE=1 -j pyoperon

# install the python package
make install

# go back to the repository root and run the local installation
cd ../../../..
make
```
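After `make` finishes, an optional sanity check is to import the package's main entry points from the activated environment. A minimal sketch (the class names are the ones listed in the tables below):

```python
# run inside the iirs-env environment; these imports should succeed
# if the local installation (including pyoperon) worked
from iirsBenchmark.regressors import ITEA_regressor, Operon_regressor
from iirsBenchmark.explainers import SHAP_explainer

print("iirsBenchmark is ready to use")
```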
The following regressors are available in iirsBenchmark:
Regressor | Class name | Type | Original implementation |
---|---|---|---|
XGB | XGB_regressor | Tree boosting | Scikit-learn XGB |
RF | RF_regressor | Tree bagging | Scikit-learn RF |
MLP | MLP_regressor | Neural network | Scikit-learn MLP |
SVM | SVM_regressor | Vector machine | Scikit-learn SVM |
k-NN | KNN_regressor | Instance method | Scikit-learn KNN |
SR with coefficient optimization | Operon_regressor | Symbolic method | Operon framework |
SR with IT representation | ITEA_regressor | Symbolic method | ITEA |
Linear regression | Linear_regressor | Regression analysis | Scikit-learn Linear regression |
LASSO regression | Lasso_regressor | Regression analysis | Scikit-learn Lasso |
Single decision tree | DecisionTree_regressor | Decision tree | Scikit-learn Decision tree |
The naming convention used is `<name of the regressor in PascalCase>_regressor`.
All implemented regressors provide a constructor with default values for all parameters, plus fit and predict methods. If you are familiar with scikit-learn, their usage should be straightforward.
```python
from iirsBenchmark.regressors import ITEA_regressor, Linear_regressor
from sklearn import datasets

housing_data = datasets.fetch_california_housing()
X, y = housing_data['data'], housing_data['target']

linear = Linear_regressor().fit(X, y)

# if you want to specify a parameter, pass it as a named (keyword) argument;
# there are a few exceptions in iirsBenchmark where arguments are positional.
itea = ITEA_regressor(popsize=75).fit(X, y)

print(itea.stochastic_executions)   # True
print(linear.stochastic_executions) # False

print(itea.to_str())   # will print a symbolic equation
print(linear.to_str()) # will print a linear regression equation
```
The regressors are used just like any scikit-learn regressor, but our implementations extend those classes by adding a few attributes and methods related to interpretability:
- `stochastic_executions`: attribute indicating whether the regressor has stochastic behavior;
- `interpretability_spectrum`: attribute with a string indicating whether the regressor is considered a white-box, gray-box, or black-box;
- `grid_params`: attribute with a dictionary where each key is a parameter of the regressor and the values are lists of the possible values considered in the experiments;
- `to_str()`: method that returns a string representation of a fitted regressor (if applicable).
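For example, continuing the snippet above (the printed values are illustrative; the exact strings depend on each regressor's implementation):

```python
# where each regressor sits on the interpretability spectrum
print(itea.interpretability_spectrum)    # e.g. "gray-box"
print(linear.interpretability_spectrum)  # e.g. "white-box"

# hyper-parameter values considered for this regressor in the experiments
print(linear.grid_params)
```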
Several feature attribution explanatory methods were unified in this package. The available methods are displayed below:
Explainer | Class name | Agnostic | Local | Global | Original implementation |
---|---|---|---|---|---|
Random explainer | RandomImportance_explainer | Y | Y | Y | Our implementation |
Permutation Importance | PermutationImportance_explainer | Y | N | Y | scikit.inspection |
Morris Sensitivity | MorrisSensitivity_explainer | Y | N | Y | interpretml |
SHapley Additive exPlanations (SHAP) | SHAP_explainer | Y | Y | Y | shap |
Shapley Additive Global importancE (SAGE) | SAGE_explainer | Y | N | Y | sage |
Local Interpretable Model-agnostic Explanations (LIME) | LIME_explainer | Y | Y | N | lime |
Integrated Gradients | IntegratedGradients_explainer | Y | Y | N | Our implementation |
Partial Effects (PE) | PartialEffects_explainer | N | Y | Y | Our implementation |
Explain by Local Approximation (ELA) | ELA_explainer | Y | Y | N | Our implementation |
The naming convention is the same as for the regressors, but with `<name of the explainer in PascalCase>_explainer`.
To explain a fitted regressor (not only the ones provided in this benchmark, but any regressor that implements a predict method), you need to instantiate the explainer, fit it to the same training data used to train the regressor, and then use the explain_local and explain_global methods to obtain feature importance explanations. If the explainer is not model-agnostic, fit will raise an exception for unsupported regressors; if it does not support local or global explanations, the corresponding explain method will also raise an exception when called.
```python
from iirsBenchmark.explainers import SHAP_explainer, PermutationImportance_explainer

# you must pass the regressor as a named argument to every explainer constructor
shap = SHAP_explainer(predictor=itea).fit(X, y)

# Local explanations take a matrix where each row is an observation, and
# return a matrix where each row is the feature importance for the respective input.
# Single observations should be reshaped into a 2D array with x.reshape(1, -1).
local_exps = shap.explain_local(X[5:10, :])
local_exp = shap.explain_local(X[3].reshape(1, -1))

# Global explanations take more than one sample (ideally the whole train/test data)
# and return a single global feature importance for each variable.
pe = PermutationImportance_explainer(predictor=itea).fit(X, y)

global_exp = pe.explain_global(X, y)

print(global_exp)
```
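As noted above, a non-agnostic explainer refuses unsupported regressors at fit time. A minimal sketch of that behavior (a broad `except` is used here, since the concrete exception class is not detailed in this README):

```python
from iirsBenchmark.explainers import PartialEffects_explainer
from iirsBenchmark.regressors import KNN_regressor

# Partial Effects is not model-agnostic (see the table above), so fitting it
# to a k-NN regressor is expected to raise an exception.
knn = KNN_regressor().fit(X, y)
try:
    PartialEffects_explainer(predictor=knn).fit(X, y)
except Exception as err:
    print(f"Partial Effects does not support this regressor: {err}")
```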
As mentioned, this benchmark uses the Feynman equations compiled and provided by [ref].
The Feynman equations can be used just like any regressor in the module, but they take as a required argument the name of the data set the regressor should refer to. The created instance can then be used to predict new values using the physics equation associated with that data set.
A table of all equations can be found here; the Filename column lists the possible data set name arguments.
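A minimal usage sketch, assuming the class is exposed as `Feynman_regressor` inside an `iirsBenchmark.feynman` module and that the data set name is passed as `data_set` (both names are assumptions; check the module for the exact ones). The string "I.6.20a" is just one Filename from the equations table:

```python
from iirsBenchmark.feynman import Feynman_regressor  # assumed import path

# create the ground-truth physics equation for one of the Feynman data sets
feynman = Feynman_regressor(data_set="I.6.20a")      # assumed argument name

# X_feynman must contain the input variables of that specific equation
y_true = feynman.predict(X_feynman)
```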
We strongly advise reading Section 3.1 of our paper to fully understand how these measures work, and also checking their implementation in iirsBenchmark.expl_measures. Although we did not propose any of these measures, we adapted them when implementing them in iirsBenchmark.
Three different explanation measures were implemented:
The intuition of stability is to measure the degree to which the local explanation changes for a given point compared to its neighbors.
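One common formulation (a sketch only; the exact adaptation used here is described in Section 3.1 of the paper and in `iirsBenchmark.expl_measures`), where $\varepsilon(\cdot)$ is the local explanation vector and $N_x$ is a neighborhood sampled around $x$:

$$\text{stability}(\varepsilon, x) = \frac{1}{|N_x|} \sum_{x' \in N_x} \left\lVert \varepsilon(x) - \varepsilon(x') \right\rVert_2^2$$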
The idea of infidelity is to measure the difference between two terms:
- the dot product between a significant perturbation to a given input $X$ we are trying to explain and its explanation, and
- the output observed for the perturbed point.
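Following the formulation in the original infidelity paper (listed in the references below), with $\Phi(f, x)$ the explanation of model $f$ at input $x$ and $I$ a random perturbation, this can be written roughly as:

$$\text{INFD}(\Phi, f, x) = \mathbb{E}_{I}\left[\left( I^{\top} \Phi(f, x) - \big( f(x) - f(x - I) \big) \right)^{2}\right]$$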
The Jaccard Index measures how similar two sets are by calculating the ratio of their intersection size to their union size.
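Here the two sets are typically the most relevant features (e.g., the top-$k$) selected by the two explanations being compared:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$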
To use these measures you need a fitted regressor and explainer; they only work for local explanations:
```python
from iirsBenchmark import expl_measures

# these measures require a neighborhood around the observation being evaluated
obs_to_explain = X[3].reshape(1, -1)

neighbors = expl_measures.neighborhood(
    obs_to_explain, # the observation
    X,              # training data used to estimate the multivariate normal distribution
    factor=0.001,   # spread of the neighbors
    size=30         # number of neighbors to sample
)

expl_measures.stability(
    shap,           # the explainer we want to evaluate
    obs_to_explain, # the observation to explain
    neighbors       # sampled neighbors used to evaluate the metric
)
```
The package implements everything we need to create experiments to evaluate interpretability quality and robustness in the regression context.
The experiments used in the paper are in ./experiments.
Feel free to contact the developers with suggestions, criticism, or questions. You can always raise an issue on GitHub!
This package was built upon the contributions of many researchers in the XAI field, as well as the scikit-learn and Operon frameworks used for creating and fitting regressors. We would like to recognize the importance of their work. To get to know each dependency better, we suggest reading the original works mentioned below.
- [SHAP] A Unified Approach to Interpreting Model Predictions;
- [SAGE] Understanding Global Feature Contributions With Additive Importance Measures;
- [LIME] "Why Should I Trust You?": Explaining the Predictions of Any Classifier;
- [Permutation Importance] Random Forests;
- [Morris Sensitivity] Factorial Sampling Plans for Preliminary Computational Experiments;
- [Integrated Gradients] Axiomatic Attribution for Deep Networks;
- [ELA] Explaining Symbolic Regression Predictions;
- [Partial Effects (for symbolic regression)] Measuring feature importance of symbolic regression models using partial effects;
- [Scikit-learn module] Scikit-learn: Machine Learning in Python;
- [ITEA] Interaction–Transformation Evolutionary Algorithm for Symbolic Regression;
- [Operon] Operon C++: an efficient genetic programming framework for symbolic regression;
- [Stability] Regularizing Black-box Models for Improved Interpretability;
- [Infidelity] On the (In)fidelity and Sensitivity of Explanations;
- [Jaccard Index] S-LIME: Stabilized-LIME for Model Explanation.
The development of this research is still active. We plan to extend our study by including more symbolic regression methods. As for the GitHub repository, we plan to build a documentation page and provide maintenance for it.