[dask] support 'pred_leaf' in predict() #3792

Closed
jameslamb opened this issue Jan 19, 2021 · 1 comment · Fixed by #3919
Comments

@jameslamb
Collaborator

Summary

See #3774 (comment) for background.

To close this issue, add tests to https://github.com/microsoft/LightGBM/blob/master/tests/python_package_test/test_dask.py confirming that .predict(X, pred_leaf=True) works for DaskLGBMClassifier and DaskLGBMRegressor.

Motivation

Adding this feature would allow users to understand which leaves in the trained model's trees their records fall into. This would bring the Dask interface closer to parity with lightgbm.sklearn.

References

When I first attempted this in #3774, I hit some issues related to how LightGBM uses dask.DataFrame.map_partitions() and dask.Array.map_blocks(). These functions allow you to apply a function to each partition of a distributed data collection, like X.map(some_func). In some situations, they require you to provide additional information about the shape and data types of the return value of some_func().
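To illustrate the shape problem described above, here is a minimal sketch using dask.array's map_blocks(). It is not LightGBM's actual implementation: fake_predict_leaf() and NUM_TREES are hypothetical stand-ins for a per-partition predict(..., pred_leaf=True) call. Because the function returns a different number of columns than it receives, the output chunks and dtype are passed explicitly.

```python
import dask.array as da
import numpy as np

NUM_TREES = 5  # hypothetical number of trees in a trained model


def fake_predict_leaf(block):
    """Hypothetical stand-in for predict(block, pred_leaf=True) on one partition.

    Returns one integer leaf index per (row, tree), so the number of
    columns differs from the number of input features.
    """
    return np.zeros((block.shape[0], NUM_TREES), dtype=np.int32)


# 100 rows, 3 features, split into 4 row-wise partitions of 25 rows each
X = da.random.random((100, 3), chunks=(25, 3))

# Without `chunks`, Dask would assume each output block has the same shape
# as its input block (25 x 3). Here each output block is 25 x NUM_TREES,
# so the row chunks are kept and the column chunk is replaced.
leaves = X.map_blocks(
    fake_predict_leaf,
    chunks=(X.chunks[0], (NUM_TREES,)),
    dtype=np.int32,
)

result = leaves.compute()
```

With the chunks argument supplied, leaves.shape reports the correct lazy shape before anything is computed, which matters for downstream operations that rely on that metadata.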

You can learn more about this in the map_partitions() docs: https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.map_partitions

The results from predict(X, pred_leaf=True) should be an array with shape (num_rows_in_X, num_trees_in_model). Note that for multi-class classification, num_trees_in_model will be n_estimators * num_classes.
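The shape rule above can be written down as a small helper, which may be handy when asserting on output shapes in a test. This is only a sketch: expected_pred_leaf_shape() is a hypothetical name, not part of LightGBM, and it assumes that all n_estimators boosting rounds were actually trained (for example, no early stopping) and that regression and binary classification use one tree per round.

```python
def expected_pred_leaf_shape(num_rows, n_estimators, num_classes=1):
    """Hypothetical helper: expected shape of predict(X, pred_leaf=True).

    Assumes every boosting round was trained. Multi-class models
    (more than 2 classes) train one tree per class per round;
    regression and binary classification train one tree per round.
    """
    trees_per_round = num_classes if num_classes > 2 else 1
    return (num_rows, n_estimators * trees_per_round)


# Regression with 50 rounds on 1000 rows
regression_shape = expected_pred_leaf_shape(1000, 50)

# 3-class classification with 50 rounds on 1000 rows
multiclass_shape = expected_pred_leaf_shape(1000, 50, num_classes=3)
```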

For testing this feature, you may find https://github.com/jameslamb/lightgbm-dask-testing useful.

@jameslamb
Collaborator Author

Closing this in favor of #2302, where we store all feature requests. Anyone is welcome to contribute this feature. Leave a comment on this issue if you'd like to work on it.

jameslamb added a commit that referenced this issue Feb 7, 2021
…3919)

* fix tests

* fix tests

* fix test comments

* simplify tests

* Apply suggestions from code review