[R-package] Quick question about num of thread #4192
Thanks for using LightGBM! Assuming that there are no other processes making heavy use of the available CPUs, you will get the best performance by setting `num_thread` to the number of cores on your machine. You could use something like this to test the relative speedup from different settings of `num_thread`:

library(lightgbm)
library(microbenchmark)
library(nycflights13)
data(flights, package = "nycflights13")
flights <- as.data.frame(flights)
dtrain <- lgb.Dataset(
    as.matrix(
        flights[, c("year", "sched_dep_time", "distance", "hour", "minute")]
    )
    , label = flights[, "dep_delay"]
    , free_raw_data = FALSE
    , max_bin = 350
)
num_cores <- parallel::detectCores()
for (num_thread in c(num_cores - 1, num_cores)) {
    print(paste0("num_thread: ", num_thread))
    print(
        microbenchmark::microbenchmark({
            lgb.train(
                params = list(
                    num_thread = num_thread
                    , objective = "regression_l2"
                    , num_leaves = 31L
                    , max_depth = 8L
                    , learning_rate = 0.01
                    , min_data_in_leaf = 1
                )
                , data = dtrain
                , nrounds = 1000L
                , verbose = -1L
            )
        }, times = 5, unit = "s")
    )
}

Your results will vary depending on your dataset and the values of the other training parameters you set.
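Thank you for the clarification! Just to make it clear: we need to use the number of physical cores (`parallel::detectCores(logical = FALSE)`), not the number of logical cores. With physical cores, training is clearly faster:

library(lightgbm)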
#> Loading required package: R6
library(microbenchmark)
library(nycflights13)
data(flights, package = "nycflights13")
flights <- as.data.frame(flights)
dtrain <- lgb.Dataset(
    as.matrix(
        flights[, c("year", "sched_dep_time", "distance", "hour", "minute")]
    )
    , label = flights[, "dep_delay"]
    , free_raw_data = FALSE
    , max_bin = 350
)
num_cores <- parallel::detectCores(logical = FALSE)
for (num_thread in c(num_cores, num_cores * 2)) {
    print(paste0("num_thread: ", num_thread))
    print(
        microbenchmark::microbenchmark({
            lgb.train(
                params = list(
                    num_thread = num_thread
                    , objective = "regression_l2"
                    , num_leaves = 31L
                    , max_depth = 8L
                    , learning_rate = 0.01
                    , min_data_in_leaf = 1
                )
                , data = dtrain
                , nrounds = 1000L
                , verbose = -1L
            )
        }, times = 5, unit = "s")
    )
}
#> [1] "num_thread: 10"
#> Unit: seconds
#> expr
#> { lgb.train(params = list(num_thread = num_thread, objective = "regression_l2", num_leaves = 31L, max_depth = 8L, learning_rate = 0.01, min_data_in_leaf = 1), data = dtrain, nrounds = 1000L, verbose = -1L) }
#> min lq mean median uq max neval
#> 2.522868 2.546463 2.593207 2.563418 2.646491 2.686795 5
#> [1] "num_thread: 20"
#> Unit: seconds
#> expr
#> { lgb.train(params = list(num_thread = num_thread, objective = "regression_l2", num_leaves = 31L, max_depth = 8L, learning_rate = 0.01, min_data_in_leaf = 1), data = dtrain, nrounds = 1000L, verbose = -1L) }
#> min lq mean median uq max neval
#> 4.536631 4.673258 4.67483 4.682869 4.687078 4.794311 5

Created on 2021-04-18 by the reprex package (v1.0.0)
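Based on the benchmark above, a minimal sketch of picking `num_thread` from the physical core count might look like the following. `choose_num_thread()` is a hypothetical helper, not part of the lightgbm package. It assumes that falling back to `0` is acceptable when the core count is unknown, since LightGBM treats `num_thread = 0` as "use the OpenMP default". Note that `logical = FALSE` may not be honoured on every platform or R version, so check the value it returns on your machine.

```r
# Hypothetical helper (not part of lightgbm): choose num_thread from physical cores.
choose_num_thread <- function() {
    n_physical <- parallel::detectCores(logical = FALSE)
    # detectCores() returns NA when the core count cannot be determined;
    # in that case return 0, which LightGBM interprets as the OpenMP default
    if (is.na(n_physical) || n_physical < 1L) {
        return(0L)
    }
    as.integer(n_physical)
}

# Parameter list that could be passed to lgb.train(params = ...)
params <- list(
    objective = "regression_l2"
    , num_thread = choose_num_thread()
)
print(params$num_thread)
```

The resulting `params` list can be passed to `lgb.train()` in the same way as in the benchmarks above.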
Ah yes, you are absolutely right! Are you interested in contributing a change to the documentation? I think others would benefit from that note. It would just be a matter of updating lines 95 to 97 of LightGBM/R-package/R/lightgbm.R (at commit 7ea2bc4), and then regenerating the documentation files with commands like these:

sh build-cran-package.sh
R CMD INSTALL --with-keep.source lightgbm_*.tar.gz
cd R-package
Rscript -e "roxygen2::roxygenize(load = 'installed')"
@jameslamb Sure! I will do that, thanks!
Great, thanks so much! Let me know if you run into any issues.
Hi, thank you for making LightGBM available in R!
I am using LightGBM in R and have a quick question about `num_thread`. According to the manual, the number of threads should be set to the number of physical CPU cores. But in the R code I usually see, the number of threads is set to the number of cores minus one. That is, with 4 cores, 3 are used for parallel work and 1 for the controller. Does this apply to LightGBM too? If I have 4 physical CPU cores, should I set the number of threads to 3?