[REVIEW] Add weighted K-Means sampling for SHAP #4051

Merged (11 commits), Jul 28, 2021

Contributor

@Nanthini10 Nanthini10 commented Jul 13, 2021

Adds a sampling method for SHAP using k-means, adapted from https://github.com/slundberg/shap/blob/9411b68e8057a6c6f3621765b89b24d82bee13d4/shap/utils/_legacy.py

Moves the code over from the interpret-community package for easier maintenance.

I chose not to add a comparison against SHAP, since that would add a dependency on SHAP and I'm not sure we want that.

Closes #4000
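For readers unfamiliar with the idea, here is a minimal CPU-only sketch of weighted k-means sampling. It is not the code in this PR: it uses NumPy and a plain Lloyd's loop as a stand-in for cuML's GPU KMeans, and the function name `kmeans_summary` and its parameters are hypothetical. The background data is reduced to `k` centroids, each weighted by the number of rows in its cluster. The optional rounding step, which snaps each centroid coordinate to the nearest value actually present in that column, follows my reading of the referenced SHAP helper and should be treated as an assumption.

```python
import numpy as np


def kmeans_summary(X, k, n_iter=20, seed=0, round_values=True):
    """Summarize X (n_samples, n_features) as k weighted centroids.

    Hypothetical helper for illustration only, not cuML's API.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct rows of the data.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()

    def assign(centers):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        # Move each non-empty cluster's centroid to the mean of its rows.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)

    # Final assignment, consistent with the final centroids.
    labels = assign(centers)

    if round_values:
        # Snap each centroid coordinate to the closest observed value
        # in the same feature column (assumed from the SHAP helper).
        for j in range(k):
            for f in range(X.shape[1]):
                nearest = np.abs(X[:, f] - centers[j, f]).argmin()
                centers[j, f] = X[nearest, f]

    # Weight of a centroid = number of rows assigned to its cluster.
    weights = np.bincount(labels, minlength=k).astype(float)
    return centers, weights
```

The returned centroids and weights can then serve as a small, weighted background set for a SHAP explainer instead of the full dataset. A production version would call a GPU KMeans and keep the data on the device.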

@Nanthini10 requested a review from a team as a code owner July 13, 2021 15:57
@github-actions bot added the "Cython / Python" label Jul 13, 2021
@Nanthini10 added the "3 - Ready for Review", "non-breaking" and "feature request" labels Jul 13, 2021
Four outdated, resolved review threads on python/cuml/explainer/sampling.py
@dantegd added the "4 - Waiting on Author" label and removed the "3 - Ready for Review" label Jul 19, 2021
@Nanthini10 requested a review from dantegd July 26, 2021 20:52
@Nanthini10 added the "4 - Waiting on Reviewer" label and removed the "4 - Waiting on Author" label Jul 26, 2021
@codecov-commenter

Codecov Report

No coverage uploaded for pull request base (branch-21.08@c9abba1).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.08    #4051   +/-   ##
===============================================
  Coverage                ?   85.80%           
===============================================
  Files                   ?      232           
  Lines                   ?    18314           
  Branches                ?        0           
===============================================
  Hits                    ?    15714           
  Misses                  ?     2600           
  Partials                ?        0           
Flag Coverage Δ
dask 48.12% <0.00%> (?)
non-dask 78.31% <0.00%> (?)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c9abba1...7b6c472.

Comment on lines +68 to +92
if output_dtype == cudf.DataFrame:
    group_names = X.columns
    X = X.values
elif output_dtype == cudf.Series:
    group_names = X.name
    X = X.values.reshape(-1, 1)
elif output_dtype == pd.DataFrame:
    group_names = X.columns
    X = cp.array(X.values)
elif output_dtype == pd.Series:
    group_names = X.name
    X = cp.array(X.values.reshape(-1, 1))
else:
    # it's either numpy, cupy or numba
    if output_dtype == cuda.devicearray.DeviceNDArrayBase:
        X = cp.array(X)
    elif output_dtype == np.ndarray:
        X = cp.array(X)
    try:
        # more than one column
        group_names = [str(i) for i in range(X.shape[1])]
    except IndexError:
        # one column
        X = X.reshape(-1, 1)
        group_names = ['0']
Member

This code can probably be simplified further, but we can do that in a follow-up PR for 21.10.

Contributor Author

Opened an issue: #4121
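
As an illustration of the kind of simplification discussed in this thread, the sketch below collapses the per-type branches into duck-typed checks. It is one possible direction only, not what the follow-up issue settled on. To stay runnable without a GPU it uses NumPy where the snippet above uses CuPy, so it would not accept cuDF or Numba device inputs as written. The name `to_2d_with_names` is hypothetical, and unlike the original it always returns the names as a list of strings, including for Series-like input.

```python
import numpy as np


def to_2d_with_names(X):
    """Return (2-D array, list of column names) for frame- or array-like X.

    Hypothetical helper; host-side NumPy stand-in for the CuPy version.
    """
    if hasattr(X, "columns"):
        # DataFrame-like: take names from the column index.
        names = [str(c) for c in X.columns]
        values = X.values
    elif hasattr(X, "name") and hasattr(X, "values"):
        # Series-like: a single named column.
        names = [str(X.name)]
        values = X.values
    else:
        # Plain array-like: names are generated from the shape below.
        names = None
        values = X

    arr = np.asarray(values)
    if arr.ndim == 1:
        # One column: make it explicitly (n_samples, 1).
        arr = arr.reshape(-1, 1)
    if names is None:
        names = [str(i) for i in range(arr.shape[1])]
    return arr, names
```

Checking `ndim` up front replaces the `try`/`except IndexError` used to detect one-dimensional input, which is the main source of branching in the original.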

@dantegd
Member

dantegd commented Jul 28, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6242984 into rapidsai:branch-21.08 Jul 28, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Adds a sampling method for SHAP using k-means, adapted from https://github.com/slundberg/shap/blob/9411b68e8057a6c6f3621765b89b24d82bee13d4/shap/utils/_legacy.py

Moves the code over from the interpret-community package for easier maintenance.

I chose not to add a comparison against SHAP, since that would add a dependency on SHAP and I'm not sure we want that.

Closes rapidsai#4000

Authors:
  - Nanthini (https://github.com/Nanthini10)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4051
Labels: 4 - Waiting on Reviewer, Cython / Python, feature request, non-breaking
Successfully merging this pull request may close these issues.

[FEA] Add kmeans sampling method for SHAP