
Baseline Prediction for TreeExplainer and XGBoost is not being computed correctly. #250

Closed
mmschlk opened this issue Oct 23, 2024 · 1 comment · Fixed by #267
Labels: bug (Something isn't working) · explainer (All issues that are linked to explainers)
Milestone: v1.1.0
mmschlk (Owner) commented Oct 23, 2024

I think we might have broken something with XGBoost and TreeExplainer. It seems like the baseline values are not being computed correctly (it's always sitting at zero). Currently there is already a TODO about this in the code.

# TODO: for the current implementation this is correct for other trees this may vary
self.baseline_value = sum(
    [treeshapiq.empty_prediction for treeshapiq in self._treeshapiq_explainers]
)

I think this needs to be resolved either in the conversion from XGBoost models into TreeModels or in the Explainer. If it can be resolved in the conversion, this would definitely be the preferred fix.
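A minimal sketch of what the conversion-side fix could look like (the class and field names here are hypothetical simplifications, not shapiq's actual TreeModel API): fold the model-level intercept evenly into the leaf values and empty predictions of the converted trees, so that the existing per-tree sum in the explainer recovers the true baseline.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TreeModel:
    """Simplified stand-in for a converted tree (hypothetical fields)."""

    leaf_values: np.ndarray   # predictions stored at the leaves
    empty_prediction: float   # expected leaf value, i.e. this tree's baseline


def fold_intercept(trees: list[TreeModel], base_score: float) -> None:
    """Spread a model-level intercept evenly over all trees.

    Afterwards, sum(t.empty_prediction for t in trees) recovers the true
    baseline even though no single tree stores the intercept itself.
    """
    shift = base_score / len(trees)
    for tree in trees:
        tree.leaf_values = tree.leaf_values + shift
        tree.empty_prediction += shift


# Two toy trees whose own baselines are zero, plus an intercept of 2.0:
trees = [
    TreeModel(leaf_values=np.array([1.0, -1.0]), empty_prediction=0.0),
    TreeModel(leaf_values=np.array([0.5, -0.5]), empty_prediction=0.0),
]
fold_intercept(trees, base_score=2.0)
baseline = sum(t.empty_prediction for t in trees)
print(baseline)  # 2.0
```

Shifting leaf values (rather than keeping the intercept in the explainer) keeps the `sum(empty_prediction ...)` logic unchanged for all tree types.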

Code to reproduce this:

from xgboost import XGBRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

import shapiq
x_data, y_data = shapiq.datasets.load_california_housing(to_numpy=True)
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2, random_state=42)
models = [
    XGBRegressor(random_state=42),
    DecisionTreeRegressor(max_depth=8, random_state=42),
    RandomForestRegressor(n_estimators=10, max_depth=8, random_state=42)
]
for model in models:
    print(f"{model.__class__.__name__}:")
    model.fit(x_train, y_train)
    explainer = shapiq.TreeExplainer(model=model, max_order=1, index="SV")
    sv = explainer.explain(x=x_test[0])
    baseline_value = round(sv.baseline_value, 3)
    sv = sv.get_n_order(min_order=1, order=1)
    sum_sv = round(sum(sv.values), 3)
    sum_sv_baseline = round(sum_sv + baseline_value, 3)
    y_pred = round(float(model.predict(x_test[0].reshape(1, -1))[0]), 3)
    print(f"Predicted Value: {y_pred}")
    print(f"Sum of SV + Baseline: {sum_sv_baseline}")
    print(f"Sum of SV: {sum_sv}")
    print(f"Baseline Value: {baseline_value}\n")

Produces:

XGBRegressor:
Predicted Value: 0.594
Sum of SV + Baseline: -1.477
Sum of SV: -1.477
Baseline Value: -0.0

DecisionTreeRegressor:
Predicted Value: 0.688
Sum of SV + Baseline: 0.689
Sum of SV: -1.383
Baseline Value: 2.072

RandomForestRegressor:
Predicted Value: 0.681
Sum of SV + Baseline: 0.681
Sum of SV: -1.391
Baseline Value: 2.072

For the decision tree and the random forest, the predicted value and the sum of the SVs plus the baseline are equal, as expected. For the XGBoost model this is not the case.

@mmschlk mmschlk added the bug Something isn't working label Oct 23, 2024
@mmschlk mmschlk moved this to 📋 Backlog in shapiq development Oct 23, 2024
@mmschlk mmschlk added the explainer All issues that are linked to explainers label Oct 23, 2024
@mmschlk mmschlk added this to the v1.1.0 milestone Oct 23, 2024
hbaniecki (Collaborator) commented
SIs don't sum to the model's prediction? See

import shapiq
# load data
X, y = shapiq.load_california_housing(to_numpy=True)
# train a model
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X, y)
# set up an explainer with k-SII interaction values up to order 4
explainer = shapiq.TabularExplainer(
    model=model,
    data=X,
    index="k-SII",
    max_order=2
)
# explain the model's prediction for the first sample
interaction_values = explainer.explain(X[0], budget=256)
# analyse interaction values
interaction_values.plot_force()

# vs
model.predict(X[[0]])

@mmschlk mmschlk moved this from 📋 Backlog to 🏗 In progress in shapiq development Nov 6, 2024
@mmschlk mmschlk mentioned this issue Nov 7, 2024
mmschlk added a commit that referenced this issue Nov 7, 2024
* fixes #250 and extends tests around xgb models

* integrates intercept of xgb models directly into values of the TreeModel
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in shapiq development Nov 7, 2024