Treeshap hypothesis tests #4671
Conversation
Looks like xgboost segfaulted. Can't reproduce this locally and it doesn't seem strongly related to any of these changes.
@RAMitchell I've seen that failure in another PR or nightly job, so I'm fairly sure it is unrelated to this PR's changes, but it does look like XGBoost segfaulting. I'll dig through the log.
I tried to reproduce the CI failure locally by running the test in isolation. Unfortunately I couldn't reproduce anything.

    import numpy as np
    import pandas as pd
    import xgboost as xgb

    from cuml.experimental.explainer.tree_shap import TreeExplainer


    def test_xgb_toy_categorical():
        X = pd.DataFrame({'dummy': np.zeros(5, dtype=np.float32),
                          'x': np.array([0, 1, 2, 3, 4], dtype=np.int32)})
        y = np.array([0, 0, 1, 1, 1], dtype=np.float32)
        X['x'] = X['x'].astype("category")
        dtrain = xgb.DMatrix(X, y, enable_categorical=True)
        params = {"tree_method": "gpu_hist", "eval_metric": "error",
                  "objective": "binary:logistic", "max_depth": 2,
                  "min_child_weight": 0, "lambda": 0}
        xgb_model = xgb.train(params, dtrain, num_boost_round=1,
                              evals=[(dtrain, 'train')])
        explainer = TreeExplainer(model=xgb_model)
        out = explainer.shap_values(X)
        ref_out = xgb_model.predict(dtrain, pred_contribs=True)
        np.testing.assert_almost_equal(out, ref_out[:, :-1], decimal=5)
        np.testing.assert_almost_equal(explainer.expected_value, ref_out[0, -1],
                                       decimal=5)


    for i in range(1000):
        test_xgb_toy_categorical()
Have you tried address sanitizer?
        n_targets = draw(st.integers(2, 5))
    else:
        n_targets = 1
So n_targets means n_classes in the context of classification? Let's just use n_classes for this purpose, since it's confusing otherwise. (I was wondering if we were using an unreleased feature of XGBoost.)
These tests will support multi-output regression, so I am using n_targets as the more generic term.
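(For context, here is a minimal sketch of how a hypothesis strategy along these lines might draw n_targets. The regression_settings name and the multi_output flag are illustrative assumptions, not the PR's actual code; only the quoted draw/else lines come from the diff.)

    # Illustrative sketch only -- assumes the hypothesis library.
    from hypothesis import given, strategies as st


    @st.composite
    def regression_settings(draw):
        # Multi-output regression draws several targets;
        # plain regression and binary tasks use a single target.
        multi_output = draw(st.booleans())
        if multi_output:
            n_targets = draw(st.integers(2, 5))
        else:
            n_targets = 1
        return n_targets


    @given(regression_settings())
    def test_n_targets_is_valid(n_targets):
        assert 1 <= n_targets <= 5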
I am wondering if the xgboost CI failure is somehow due to NCCL. I've seen it occur in different tests calling xgboost, so the failure doesn't seem tied to any particular test, only to xgboost being used. It seems to occur more often in this PR than elsewhere in CI, perhaps because I increased the frequency of the xgboost tests.
Took @trivialfis's advice and tried compiling cuml with address sanitizer. It turns out we were deleting a base-class pointer without a virtual destructor, so the derived-class destructor was never called. Hopefully this resolves the issue.
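(For illustration, a minimal C++ sketch of this class of bug, using hypothetical type names rather than cuml's actual classes. Deleting a derived object through a base pointer whose destructor is not virtual is undefined behavior; AddressSanitizer typically reports it as a new-delete-type-mismatch.)

    // Minimal illustration of the bug class described above; the types are
    // hypothetical, not cuml's actual code.
    // Compile with: g++ -fsanitize=address -g bug.cpp
    #include <vector>

    struct Base {
      ~Base() {}  // BUG: non-virtual destructor on a polymorphic base.
                  // Fix: declare it `virtual ~Base() {}`.
    };

    struct Derived : Base {
      std::vector<int> data{1, 2, 3};  // never freed when deleted via Base*
      ~Derived() {}
    };

    int main() {
      Base* p = new Derived();
      delete p;  // UB: only ~Base() runs; Derived's members are leaked,
                 // and ASan flags the mismatched deallocation.
      return 0;
    }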
Codecov Report

    @@            Coverage Diff             @@
    ##           branch-22.06   #4671   +/- ##
    ===============================================
      Coverage              ?   84.04%
    ===============================================
      Files                 ?      252
      Lines                 ?    20348
      Branches              ?        0
    ===============================================
      Hits                  ?    17102
      Misses                ?     3246
      Partials              ?        0

Flags with carried-forward coverage won't be shown. Continue to review the full report at Codecov.
Seems to be working?
@gpucibot merge |
Stacked on #4671.

- Remove extra redundant class in python layer.
- Simplify the interface between C++ and python using variants.
- Fix #4670 by allowing double precision data
- Document TreeExplainer
- Add interventional shap method
- Add shapley interactions and taylor interactions
- Promote from experimental
- Support sklearn estimator types from xgb/lgbm (i.e. no need to convert to booster before using TreeExplainer)

Authors:
- Rory Mitchell (https://github.com/RAMitchell)

Approvers:
- Philip Hyunsu Cho (https://github.com/hcho3)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #4697
Increased test coverage for TreeExplainer, greatly expanding model types tested. New tests take around 4.8s on my machine.
Fixes #4352
New bugs found:
- #4663
- dmlc/treelite#375
- #4670