[Discussion] efficiency improvements #2791
I have a suggestion, but I'm not sure it's in scope. It concerns improving the prediction speed of trees. This in turn could improve the overall training time, because gradient boosting requires the weak models' predictions in order to compute gradients. I'm not familiar with the tree implementation in LightGBM, but the intuitive layout of two pointers referring to the left and right children doesn't seem to be the best format for evaluating a tree. Basically, there seem to be two alternative ways:
I thought about suggesting this after reading this great blog post by Andrew Tulloch, which includes some in-depth benchmarks. I don't know if this is something LightGBM already integrates. Also, I'm not well-versed in C++, so I can't help on that side. However, these solutions seem rather simple to implement. |
Apart from the possibility of using CUDA instead of OpenCL for GPU trees (even though it would reduce compatibility), xgboost has made a lot of progress using Dask for distributed training (in fact, I believe that other experimental distributed training techniques they had are now deprecated). I believe someone started a project for smoother integration between LightGBM and Dask, but I am unable to find it right now... As for GPU, I think that the folks at RAPIDS (NVIDIA) are keen to develop more CUDA implementations for different ML algorithms (just like they have done with xgboost, where they rewrote the whole GPU tree learner in the library). Maybe contacting them could be a good idea. |
It's not something we officially maintain or contribute to yet, but I agree it could be a useful thing for us to help with. |
@MaxHalford Thanks very much. Actually, LightGBM is able to convert tree models to C++ code. |
@guolinke I think the trick still works. You just need to maintain two arrays that indicate the left and right child of each node. That way you can "walk" to the next node with something like:

```python
if x[features[node]] < thresholds[node]:
    node = left[node]
else:
    node = right[node]
```
|
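For anyone following along, the array walk above can be expanded into a runnable sketch. All array names and the leaf-encoding convention here are illustrative choices, not LightGBM's internal representation:

```python
# Sketch of evaluating a decision tree stored as flat arrays.
# Convention (assumed for this example): a negative child index marks a
# leaf, whose value lives at leaf_values[-child - 1].

features    = [0, 1, 1]          # feature tested at each internal node
thresholds  = [0.5, 2.0, 1.5]    # split threshold at each internal node
left        = [1, -1, -3]        # child when x[feature] < threshold
right       = [2, -2, -4]        # child otherwise
leaf_values = [10.0, 20.0, 30.0, 40.0]

def predict(x):
    node = 0
    while True:
        if x[features[node]] < thresholds[node]:
            child = left[node]
        else:
            child = right[node]
        if child < 0:                    # reached a leaf
            return leaf_values[-child - 1]
        node = child
```

The point of this layout is that evaluation touches a few contiguous arrays instead of chasing heap pointers, which tends to be friendlier to the cache and to vectorization.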
Thank you @julioasotodv . |
@MaxHalford, see `LightGBM/include/LightGBM/tree.h`, lines 572 to 584 at commit 9b26373. |
Cheers @guolinke. I don't believe there's much improvement to bring from that side then. How about the method that is discussed in this scikit-learn issue? Is it already implemented? |
@MaxHalford I had a quick look through the PR. I think it is related to the Python-side histogram implementation, and our C++ implementation is already optimized for both speed and memory. We also have an LRU histogram pool to further reduce memory cost. |
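To illustrate the idea of an LRU histogram pool for readers unfamiliar with it, here is a small conceptual sketch. The class and method names are made up for this example; this is not LightGBM's actual C++ data structure:

```python
from collections import OrderedDict

class HistogramPool:
    """Conceptual sketch: cache per-node histograms, evicting the
    least-recently-used entry when the pool is full, so memory stays
    bounded at the cost of occasionally rebuilding a histogram."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pool = OrderedDict()            # node_id -> histogram

    def get(self, node_id, build_fn):
        if node_id in self.pool:
            self.pool.move_to_end(node_id)   # mark as recently used
            return self.pool[node_id]
        hist = build_fn(node_id)             # (re)build from the data
        self.pool[node_id] = hist
        if len(self.pool) > self.capacity:
            self.pool.popitem(last=False)    # evict least recently used
        return hist

# Track which nodes required a (re)build.
builds = []
def build(node_id):
    builds.append(node_id)
    return [node_id]

pool = HistogramPool(capacity=2)
pool.get(0, build)
pool.get(1, build)
pool.get(0, build)   # cache hit: no rebuild
pool.get(2, build)   # pool full: evicts node 1
pool.get(1, build)   # node 1 was evicted, so it is rebuilt
```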
For the record, LightGBM is 1.5x to 2x faster than the master branch of scikit-learn in sequential mode at the moment. This difference shrinks with many threads because of the diminishing returns of parallelism beyond 8 to 16 CPU threads, though that depends on the shape of the dataset, the number of nodes, and so on. I haven't investigated yet why LightGBM is so much faster in sequential mode. It wasn't the case 6 months ago, if I remember correctly. |
@ogrisel the sequential mode is |
I just set the
Yes.
Ah thanks, that's probably it then. Indeed, the Higgs boson benchmark I used doesn't have that many features. |
I've just added a new efficiency-related feature: #3369 If anyone following along here has experience with Apache Arrow, please add any context you think could help on that issue, or let me know if I've made any mistakes. |
I've started a discussion at dask/community#104 about the possibility of bringing
@StrikerRUS @guolinke I've been working with Dask a lot and would be willing to take responsibility for maintenance work on this. I agree with the decisions reached in those two |
@jameslamb Awesome proposal! |
Thanks!
Dask is a thing that allows for horizontal scaling across multiple machines, but it's also an API for expressing Python work that could be parallelized (even if that parallelization is all on one machine). So, for example, people often use Dask DataFrames as a way to work with larger-than-memory data frames on a single machine. You write code that closely follows the

So if we follow that proposal, you'd be able to use a Dask-ified version of
I don't think we'd have to do this for this to be successful. The

One of the main design principles of Dask is that it scales from your laptop to a distributed cluster with minimal code changes... that means that we can test against a

Issues that are specific to the multi-node setting might be raised by users and have to be addressed here with manual testing, but that is already the current state of LightGBM-on-Dask with |
@jameslamb Thanks a lot for the detailed response!
OK, this is great! |
Thanks! @guolinke what do you think? If we move forward with this, I'll ask the |
awesome! sounds good to me |
Hi, I hope this is the right place to post this. I was reading through the original LightGBM paper, and it mentions a greedy coloring scheme used for (I think) one of LightGBM's main optimizations, feature bundling. I was wondering why a greedy method was chosen instead of a more targeted approach: I think the paper mentions that bundling works well in the sparse-matrix case, and I believe there are coloring algorithms that specifically target sparse graphs and have stronger provable guarantees than a greedy method would. Sorry if this has already been asked, but has there been any work on using a different algorithm for this bundling process? I feel this might be worth looking into, but I wasn't able to find where in the code the bundling process occurs (if it exists currently), to see if I could 'plug in' another algorithm and check whether it gives a noticeable improvement. |
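As a point of reference for this discussion, greedy coloring over a feature "conflict graph" (where an edge means two features cannot share a bundle) can be sketched in a few lines. The function name and structure are illustrative; this is not the exact EFB algorithm from the LightGBM paper:

```python
def greedy_bundle(num_features, conflicts):
    """Greedy coloring sketch: features joined by an edge in the
    conflict graph must be assigned different bundles (colors).
    Illustrative only; the paper's algorithm also orders features
    and tolerates a small conflict budget per bundle."""
    adjacency = {f: set() for f in range(num_features)}
    for a, b in conflicts:
        adjacency[a].add(b)
        adjacency[b].add(a)

    bundle = {}
    for f in range(num_features):
        # Smallest color not used by an already-colored neighbor.
        used = {bundle[g] for g in adjacency[f] if g in bundle}
        color = 0
        while color in used:
            color += 1
        bundle[f] = color
    return bundle

# Features 0 and 2 conflict with 1 but not with each other,
# so they can share a bundle while 1 gets its own.
bundles = greedy_bundle(3, [(0, 1), (1, 2)])
```

A coloring algorithm with stronger guarantees could be swapped in here without changing the surrounding interface, which is presumably what "plugging in another algorithm" would amount to.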
Is it possible to lay out the algorithm in this case of row-wise histogram computation (pointing to literature, adding code comments, writing a blog post, ...)? I find it very difficult to follow the logic by reading the code. |
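Pending a proper write-up, the basic contrast between row-wise and column-wise (feature-wise) histogram accumulation can be sketched in plain Python. This is illustrative only and nothing like LightGBM's optimized kernels (which work on packed bins with SIMD and threading); function and variable names are made up:

```python
def build_histograms_rowwise(binned, grads, num_bins):
    """Row-wise sketch: one pass over rows; each sample's gradient is
    scattered into every feature's histogram. The gradient is loaded
    once per row, which tends to be cache-friendly when there are
    few features relative to rows."""
    num_features = len(binned[0])
    hist = [[0.0] * num_bins for _ in range(num_features)]
    for i, row in enumerate(binned):   # binned[i][j] = bin of feature j, row i
        g = grads[i]
        for j, b in enumerate(row):
            hist[j][b] += g
    return hist

def build_histograms_colwise(binned, grads, num_bins):
    """Column-wise (feature-wise) sketch: one full pass over the data
    per feature. Same result, different memory-access pattern."""
    num_features = len(binned[0])
    hist = [[0.0] * num_bins for _ in range(num_features)]
    for j in range(num_features):
        for i, row in enumerate(binned):
            hist[j][row[j]] += grads[i]
    return hist

# Tiny example: 3 rows, 2 features, 2 bins per feature.
binned = [[0, 1], [1, 1], [0, 0]]
grads = [1.0, 2.0, 3.0]
row_hist = build_histograms_rowwise(binned, grads, num_bins=2)
col_hist = build_histograms_colwise(binned, grads, num_bins=2)
```

Both orders produce identical histograms; the choice between them is purely about memory-access patterns and parallelization strategy.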
This is a call for efficiency improvements to LightGBM, including but not limited to:
If you have any ideas, please discuss them with us here, and open the corresponding issues/pull requests if needed.