Closing this in favor of #2302, where we store all feature requests. Anyone is welcome to contribute this feature. Leave a comment on this issue if you'd like to work on it.
Summary
See #3774 (comment) for background.
To close this issue, add tests to https://github.com/microsoft/LightGBM/blob/master/tests/python_package_test/test_dask.py confirming that `.predict(X, pred_leaf=True)` works for `DaskLGBMClassifier` and `DaskLGBMRegressor`.
Motivation
Adding this feature would allow users to understand which leaves in the trained model's trees their records fall into. This would bring the Dask interface closer to parity with `lightgbm.sklearn`.
References
When I first attempted this in #3774, I hit some issues related to how LightGBM uses `dask.DataFrame.map_partitions()` and `dask.Array.map_blocks()`. These functions allow you to apply a function to each partition of a distributed data collection, like `X.map(some_func)`. In some situations, they require you to provide additional information about the shape and data types of the return from `some_func()`.
You can learn more about this in the `map_partitions()` docs: https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.map_partitions
The results from `predict(X, pred_leaf=True)` should be an array with shape `(num_rows_in_X, num_trees_in_model)`. Note that for multi-class classification, `num_trees` will be `n_estimators * num_classes`.
For testing this feature, you may find https://github.com/jameslamb/lightgbm-dask-testing useful.
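To illustrate the shape problem described above, here is a minimal sketch using only Dask and NumPy. It is not LightGBM's implementation: `fake_predict_leaf`, `N_ESTIMATORS`, and `NUM_CLASSES` are hypothetical stand-ins for a fitted model's `predict(part, pred_leaf=True)` and its settings. The point is only that the output has a different number of columns than the input, so `map_blocks()` needs to be told the output chunks and dtype.

```python
import dask.array as da
import numpy as np

N_ESTIMATORS = 5  # hypothetical number of boosting rounds
NUM_CLASSES = 3   # hypothetical multi-class problem
NUM_TREES = N_ESTIMATORS * NUM_CLASSES  # trees in the model


def fake_predict_leaf(part: np.ndarray) -> np.ndarray:
    """Stand-in for model.predict(part, pred_leaf=True).

    Returns one value per (row, tree), so the number of output
    columns differs from the number of input features.
    """
    # Placeholder zeros; a real model would return leaf indices here.
    return np.zeros((part.shape[0], NUM_TREES), dtype=np.int32)


# 100 rows and 4 features, split row-wise into chunks of 25 rows.
X = da.random.random((100, 4), chunks=(25, 4))

# The output has NUM_TREES columns instead of 4, so the output chunk
# sizes and dtype are passed explicitly rather than inferred from X.
leaves = X.map_blocks(
    fake_predict_leaf,
    chunks=(X.chunks[0], (NUM_TREES,)),
    dtype=np.int32,
)

result = leaves.compute()
print(leaves.shape, result.shape)
```

If `chunks` were omitted here, Dask would assume the output has the same block structure as `X`, and the shape it reports before computing would not match the computed result. A test for this feature could assert that both shapes equal `(num_rows_in_X, num_trees_in_model)`.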