LightGBM comes with the ability to use multiple machines for training. This can be done with the CLI, or with integrations like Spark, Kubeflow Fairing, and Dask (#3515).
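For context, a minimal sketch of what the Dask route mentioned above could look like, assuming a lightgbm build that ships the Dask estimators from #3515 and a reachable Dask cluster (the scheduler address and the synthetic data are placeholders):

```python
import dask.array as da
from dask.distributed import Client

import lightgbm as lgb

# Connect to an existing Dask cluster; the scheduler address is a placeholder.
client = Client("tcp://scheduler-address:8786")

# Synthetic data split into chunks that Dask can spread across workers,
# possibly on different machines.
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.random((100_000,), chunks=(10_000,))

# Each worker trains on its local partitions; the workers communicate
# with each other to produce a single boosted model.
model = lgb.DaskLGBMRegressor(n_estimators=50)
model.fit(X, y)

print(model.predict(X).compute()[:5])
```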
Today, the docs refer to training with multiple machines as "parallel learning".
I think that term is not precise enough and can lead to confusion.
LightGBM has at least two types of parallelism (a rough illustration of the difference follows the list):

1. within one process (shared memory), using multithreading with OpenMP
2. across multiple processes (possibly on multiple machines, and with distributed data), using either sockets or MPI
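To make the distinction concrete, here is a rough sketch of how the two cases show up in core LightGBM training parameters; the thread count, port, machine count, and machine-list file name below are placeholders, not recommendations:

```python
import numpy as np

import lightgbm as lgb

X = np.random.random((1_000, 10))
y = np.random.random(1_000)
dtrain = lgb.Dataset(X, label=y)

# 1. Shared-memory parallelism: one process, multiple OpenMP threads.
single_process_params = {
    "objective": "regression",
    "num_threads": 8,          # threads within this single process
    "tree_learner": "serial",  # no communication with other machines
}
lgb.train(single_process_params, dtrain, num_boost_round=10)

# 2. Multi-process / multi-machine parallelism: each machine runs this same
#    script on its own shard of the data, and the processes talk to each
#    other over sockets (or MPI, if LightGBM was built with MPI support).
distributed_params = {
    "objective": "regression",
    "tree_learner": "data",                # data-parallel distributed learning
    "num_machines": 2,                     # total number of participating machines
    "local_listen_port": 12400,            # port this machine listens on
    "machine_list_filename": "mlist.txt",  # placeholder file listing ip and port of every machine
}
# lgb.train(distributed_params, dtrain, num_boost_round=10)
# (commented out: this call blocks until all the other machines join)
```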
https://lightgbm.readthedocs.io/en/latest/Parallel-Learning-Guide.html#parallel-learning-guide only refers to the second case today.

I think we should rename this guide to "Distributed Learning" and use the word "distributed" everywhere in the documentation that talks about using multiple machines to accomplish model training.
Wanted to open this request for comment before I start making changes. What do you think?