[R-package] [c++] add tighter multithreading control, avoid global OpenMP side effects (fixes #4705, fixes #5102) #6226

jameslamb · 2023-12-05T04:45:35Z

Overview

This PR proposes a fix to the following problems, described in the links above:

- the R package using > 2 threads in tests and examples (leading to CRAN rejecting the package)
- LightGBM having global side effects on other OpenMP-using routines in the same process by calling omp_set_num_threads()

Related Discussions

Contributes to 4 issues, closes 2, replaces 1 PR

- Follow-up to set explicit number of threads in every OpenMP parallel region #6135
- Contributes to Dataset construction uses all threads on the machine #5124
  - might fix it, haven't tested
- Contributes to [R-package] v4.0.0 CRAN submission issues #5987
  - I think this fixes it, but don't want to say that for sure until the package is actually accepted by CRAN
- Contributes to light gbm hangs when loading a model file in subprocess #6137
  - (I think)
- Contributes to [R-package] Warnings of CRAN Package #6221
- Fixes Calling multithreaded functions sets global number of OMP threads #4705
  - for more, see LGBM_DatasetCreateFromCSC does not allow thread control #4598 (comment)
- Fixes [R-package] tests and examples should not use more than 2 threads #5102
- Replaces Keep number of threads in a global variable separate from global OMP config #6152

How I tested this

Ran all of the following on a c5a.4xlarge AWS EC2 instance (16 vCPUs, 32GiB RAM), using Ubuntu 22.04.

How I set that up

Shelled in and ran the following.

sudo apt-get update
sudo apt-get install --no-install-recommends -y \
    software-properties-common

sudo apt-get install --no-install-recommends -y \
    apt-utils \
    build-essential \
    ca-certificates \
    clang \
    cmake \
    curl \
    git \
    iputils-ping \
    jq \
    libcurl4 \
    libicu-dev \
    libomp-dev \
    libssl-dev \
    libunwind8 \
    lldb \
    locales \
    locales-all \
    netcat \
    unzip \
    zip

# use UTF-8 locale
export LANG="en_US.UTF-8"
sudo update-locale LANG=${LANG}
export LC_ALL="${LANG}"

# set up R environment
export CRAN_MIRROR="https://cran.rstudio.com"
export MAKEFLAGS=-j8
export R_LIB_PATH=~/Rlib
export R_LIBS=$R_LIB_PATH
export PATH="$R_LIB_PATH/R/bin:$PATH"
export R_APT_REPO="jammy-cran40/"
export R_LINUX_VERSION="4.3.1-1.2204.0"

mkdir -p $R_LIB_PATH

mkdir -p ~/.gnupg
echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf
sudo apt-key adv \
    --homedir ~/.gnupg \
    --keyserver keyserver.ubuntu.com \
    --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

sudo add-apt-repository \
    "deb ${CRAN_MIRROR}/bin/linux/ubuntu ${R_APT_REPO}"

sudo apt-get update
sudo apt-get install \
    --no-install-recommends \
    -y \
        autoconf \
        automake \
        devscripts \
        r-base-core=${R_LINUX_VERSION} \
        r-base-dev=${R_LINUX_VERSION} \
        texinfo \
        texlive-latex-extra \
        texlive-latex-recommended \
        texlive-fonts-recommended \
        texlive-fonts-extra \
        tidy \
        qpdf

# install dependencies
Rscript \
    --vanilla \
    -e "install.packages(c('data.table', 'jsonlite', 'knitr', 'Matrix', 'R6', 'RhpcBLASctl', 'rmarkdown', 'testthat'), repos = '${CRAN_MIRROR}', lib = '${R_LIB_PATH}', dependencies = c('Depends', 'Imports', 'LinkingTo'), Ncpus = parallel::detectCores())"

# use clang to compile packages
mkdir -p ${HOME}/.R
cat << EOF > ${HOME}/.R/Makevars
CC=clang
CXX=clang++
CXX17=clang++
EOF

To be sure I wasn't cheating, confirmed all OpenMP environment variables were unset.

env | grep OMP
# (no results)

Then, built the R package from this branch.

How I did that

cd ${HOME}/repos/LightGBM
git checkout master
git branch -D r/tighter-thread-control || true
git fetch origin r/tighter-thread-control
git checkout r/tighter-thread-control

sh build-cran-package.sh --no-build-vignettes

First approach: dataset construction

Created a test R script which times construction of a Dataset from a numeric R matrix of shape [10_000, 10_000].

Ran that script with environment variable OMP_NUM_THREADS=16. On this branch, I saw what I'd expect if multithreading is working correctly:

- more threads result in a higher ratio of CPU time to elapsed time
- runs with num_threads = 1 passed to LightGBM have {CPU}/{elapsed} <= 1
- runs with num_threads = 2 passed to LightGBM have {CPU}/{elapsed} <= 2

On master, the value of num_threads passed to LightGBM barely affected how much parallelism was used... even for num_threads = 1, I observed {CPU}/{elapsed} > 10.
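For readers who want to reproduce the same kind of measurement outside of R, here is a minimal Python sketch of the {CPU}/{elapsed} idea. It is only an analogy to the R script used in this PR: the function names are made up for illustration, and `time.process_time()` counts user plus system CPU time across all threads of the process, whereas the R script below divides user time by elapsed time.

```python
import time


def cpu_to_elapsed_ratio(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Return CPU time divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def measure(fn):
    """Run fn() once and return (cpu_seconds, elapsed_seconds)."""
    cpu_start = time.process_time()   # user + system CPU time of this process
    wall_start = time.perf_counter()  # wall-clock time
    fn()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return cpu, elapsed


# single-threaded busy work, so the ratio should stay at roughly 1 or below
cpu, elapsed = measure(lambda: sum(i * i for i in range(200_000)))
print(f"ratio: {cpu_to_elapsed_ratio(cpu, elapsed):.4f}")
```

A ratio well above 1 means several threads were burning CPU at the same time, which is the signal the checks in this PR look for.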

Details

Created this R script:

cat << EOF > check-multithreading.R
library(data.table)
library(lightgbm)

LGBM_NUM_THREADS <- as.integer(
    commandArgs(trailingOnly = TRUE)
)
if (length(LGBM_NUM_THREADS) != 1L || is.na(LGBM_NUM_THREADS)) {
    stop("invoke this script with an integer, like 'Rscript check-multithreading.R 6'")
}

# ensure data.table multithreading isn't used
data.table::setDTthreads(1L)

X <- matrix(rnorm(1e8), ncol=1e4)
y <- rnorm(nrow(X))

tic <- proc.time()
print(tic)
dtrain <- lightgbm::lgb.Dataset(
    data = X
    , label = y
    , params = list(
        max_bins = 128L
        , min_data_in_bin = 5L
        , num_threads = LGBM_NUM_THREADS
        , verbosity = -1L
    )
)
dtrain\$construct()
toc <- proc.time() - tic
print(toc)

ratio <- toc[[1]] / toc[[3]]
print(sprintf("ratio: %f", ratio))

# append to file of traces
cat(
    paste0("  ", LGBM_NUM_THREADS, "  -  ", round(ratio, 4))
    , file = "traces.out"
    , append = TRUE
    , sep = "\n"
)
EOF

Installed the R package

R CMD INSTALL \
  --with-keep.source \
  lightgbm_4.1.0.99.tar.gz

Ran the script like this:

rm -f ./traces.out
for i in 1 1 1 1 1 2 2 2 2 2 6 8 16; do
    OMP_NUM_THREADS=16 \
        Rscript --vanilla ./check-multithreading.R ${i}
done
cat ./traces.out

Ratio of {CPU}/{elapsed} on this branch:

  1  -  0.7534
  1  -  0.7755
  1  -  0.863
  1  -  0.7847
  1  -  0.9388
  2  -  1.349
  2  -  1.4595
  2  -  1.3784
  2  -  1.2886
  2  -  1.2867
  6  -  2.7029
  8  -  3.2346
  16  -  6.9559

Ratio of {CPU}/{elapsed} on latest master (f5b6bd6):

  1  -  11.7955
  1  -  11.9402
  1  -  11.4011
  1  -  12.4866
  1  -  13.1563
  2  -  11.3018
  2  -  11.1033
  2  -  13
  2  -  9.7208
  2  -  11.2056
  6  -  10.3423
  8  -  10.3786
  16  -  9.7287

Second approach: `R CMD check`

Ran R CMD check as follows on the built package.

Rscript -e "remove.packages('lightgbm')"

OMP_NUM_THREADS=16 \
_R_CHECK_EXAMPLE_TIMING_THRESHOLD_=0 \
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.0 \
R --vanilla CMD check \
    --no-codoc \
    --no-manual \
    --no-tests \
    --no-vignettes \
    --run-dontrun \
    --run-donttest \
    --timings \
    ./lightgbm_4.1.0.99.tar.gz

On this branch, no examples show {CPU}/{elapsed} >= 2.0.

Timings

* checking examples ... OK
Examples with CPU (user + system) or elapsed time > 0s
                             user system elapsed
lgb.plot.interpretation     0.476  0.015   0.245
lgb.interprete              0.338  0.011   0.176
lgb.importance              0.237  0.012   0.125
lgb.model.dt.tree           0.215  0.004   0.110
lgb.cv                      0.201  0.008   0.104
saveRDS.lgb.Booster         0.171  0.008   0.090
lgb.plot.importance         0.166  0.000   0.083
lgb.Dataset.create.valid    0.118  0.036   0.076
lgb.load                    0.105  0.020   0.063
readRDS.lgb.Booster         0.119  0.004   0.061
lgb.restore_handle          0.114  0.005   0.062
predict.lgb.Booster         0.106  0.004   0.055
lgb.dump                    0.105  0.004   0.054
lgb.save                    0.105  0.003   0.054
lgb.train                   0.101  0.005   0.053
lgb.configure_fast_predict  0.096  0.008   0.053
lgb.get.eval.result         0.099  0.003   0.051
lgb.Dataset                 0.078  0.020   0.049
get_field                   0.097  0.000   0.049
set_field                   0.092  0.003   0.048
slice                       0.094  0.001   0.047
lgb.Dataset.set.categorical 0.073  0.005   0.039
lgb.Dataset.save            0.073  0.004   0.039
lgb.Dataset.construct       0.072  0.004   0.037
lgb.Dataset.set.reference   0.068  0.003   0.036
dimnames.lgb.Dataset        0.045  0.008   0.043
dim                         0.039  0.012   0.051
lgb.convert_with_rules      0.028  0.004   0.016

On master, several examples show {CPU}/{elapsed} >= 2.0, and those have ratios > 10.0.

Timings

* checking examples ... OK
Examples with CPU (user + system) or elapsed time > 0s
                             user system elapsed
lgb.plot.interpretation     3.536  0.214   0.274
lgb.interprete              2.346  0.114   0.189
lgb.cv                      1.866  0.133   0.125
saveRDS.lgb.Booster         1.542  0.065   0.100
lgb.Dataset.create.valid    1.225  0.132   0.085
lgb.load                    1.043  0.078   0.070
readRDS.lgb.Booster         1.040  0.078   0.070
lgb.configure_fast_predict  0.914  0.065   0.061
lgb.model.dt.tree           0.886  0.065   0.109
lgb.dump                    0.835  0.038   0.058
lgb.Dataset                 0.745  0.058   0.050
slice                       0.745  0.029   0.048
get_field                   0.711  0.046   0.051
set_field                   0.644  0.050   0.050
lgb.Dataset.save            0.570  0.042   0.039
lgb.Dataset.set.reference   0.572  0.037   0.038
lgb.Dataset.set.categorical 0.578  0.022   0.038
lgb.Dataset.construct       0.553  0.038   0.037
lgb.importance              0.520  0.032   0.121
lgb.restore_handle          0.490  0.032   0.061
lgb.save                    0.401  0.024   0.053
lgb.train                   0.395  0.016   0.051
dimnames.lgb.Dataset        0.286  0.037   0.061
lgb.convert_with_rules      0.292  0.015   0.020
lgb.plot.importance         0.273  0.013   0.081
lgb.get.eval.result         0.218  0.036   0.051
predict.lgb.Booster         0.245  0.007   0.054
dim                         0.049  0.000   0.050
Examples with CPU time > 2 times elapsed time
                          user system elapsed  ratio
saveRDS.lgb.Booster      1.542  0.065   0.100 16.070
lgb.load                 1.043  0.078   0.070 16.014
lgb.cv                   1.866  0.133   0.125 15.992
readRDS.lgb.Booster      1.040  0.078   0.070 15.971
lgb.Dataset.create.valid 1.225  0.132   0.085 15.965
lgb.plot.interpretation  3.536  0.214   0.274 13.686
lgb.interprete           2.346  0.114   0.189 13.016
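As a sanity check on how to read that last table, the `ratio` column can be recomputed from the other three columns as (user + system) / elapsed. A short sketch using the rows reported above (the tuple layout and the function name are just for illustration):

```python
# (example, user, system, elapsed, ratio reported by R CMD check on master)
FLAGGED_ON_MASTER = [
    ("saveRDS.lgb.Booster", 1.542, 0.065, 0.100, 16.070),
    ("lgb.load", 1.043, 0.078, 0.070, 16.014),
    ("lgb.cv", 1.866, 0.133, 0.125, 15.992),
    ("readRDS.lgb.Booster", 1.040, 0.078, 0.070, 15.971),
    ("lgb.Dataset.create.valid", 1.225, 0.132, 0.085, 15.965),
    ("lgb.plot.interpretation", 3.536, 0.214, 0.274, 13.686),
    ("lgb.interprete", 2.346, 0.114, 0.189, 13.016),
]


def cpu_ratio(user: float, system: float, elapsed: float) -> float:
    """CPU time (user + system) divided by elapsed time."""
    return (user + system) / elapsed


for name, user, system, elapsed, reported in FLAGGED_ON_MASTER:
    computed = cpu_ratio(user, system, elapsed)
    print(f"{name}: computed={computed:.3f} reported={reported:.3f}")
```

With 16 vCPUs on the test machine, ratios of 13 to 16 suggest these examples were using nearly every available core.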

How this improves multithreading control

problem 1: `OMP_NUM_THREADS()` uses unconstrained `omp_get_num_threads()` threads

Details

This block is problematic:

LightGBM/include/LightGBM/utils/openmp_wrapper.h

Lines 22 to 24 in f5b6bd6

    
    #pragma omp parallel
    #pragma omp master
    { ret = omp_get_num_threads(); }

With environment variable OMP_NUM_THREADS=16 set, I think that #pragma omp parallel creates a team of 16 threads, then runs omp_get_num_threads() on the master thread, then presumably releases those 16 threads.

For the small data sizes used in tests and examples, I think the unnecessary parallelized work happening on each call of OMP_NUM_THREADS() is enough to lead to high ratios of {CPU}/{elapsed}.

This PR fixes that by replacing #pragma omp master with #pragma omp single, which changes this from "run on the master thread" to "run on any single thread in the current team".

problem 2: some LightGBM operations don't have any thread control, others automatically reset LightGBM to "use `omp_get_num_threads()` threads"

Details

For example, GBDT::LoadModelFromString(), which creates a Booster from a text representation (e.g. as read in from a model file), parallelizes some operations over trees:

https://github.com/microsoft/LightGBM/blob/f5b6bd60d9d752c8e5a75b11ab771d0422214bb4/src/boosting/gbdt_model_text.cpp#L555-LL556

But:

- it doesn't accept nthreads or similar thread-control arguments
- code paths from wrappers like the R and Python packages aren't guaranteed to have hit OMP_SET_NUM_THREADS() before calling it

For example, when loading a Booster from a pickle file:

LightGBM/python-package/lightgbm/basic.py

Lines 3460 to 3469 in f5b6bd6

    
    def __setstate__(self, state: Dict[str, Any]) -> None:
        model_str = state.get('_handle', state.get('handle', None))
        if model_str is not None:
            handle = ctypes.c_void_p()
            out_num_iterations = ctypes.c_int(0)
            _safe_call(_LIB.LGBM_BoosterLoadModelFromString(
                _c_str(model_str),
                ctypes.byref(out_num_iterations),
                ctypes.byref(handle)))
            state['_handle'] = handle

This PR "solves" that by providing a new mechanism in the C API and public API of the R package to set a process-wide maximum number of threads that LightGBM will use. That's inspired by data.table::setDTthreads() (see, for example, Rdatatable/data.table#5658 (comment)).

It then proposes calling lightgbm::setLGBMthreads(2) in all R-package examples, vignettes, and tests. That should be sufficient to meet CRAN's requirements, while still allowing users of the package to get more parallelism by default.
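To make the intended semantics concrete, here is a rough Python model of how a process-wide cap and a per-call default could interact. This is a sketch under my own assumptions, not the C++ code in this PR: the class and method names are hypothetical, `runtime_default` stands in for whatever the OpenMP runtime would pick (for example from OMP_NUM_THREADS), and the "non-positive means unset" convention is assumed.

```python
import os


class ThreadLimits:
    """Hypothetical model of a process-wide cap plus a per-call default."""

    def __init__(self, runtime_default=None):
        # stand-in for the thread count the OpenMP runtime would choose
        self.runtime_default = runtime_default or os.cpu_count() or 1
        self.max_threads = -1      # <= 0 means "no cap" (assumed convention)
        self.default_threads = -1  # <= 0 means "defer to the runtime"

    def set_max_threads(self, n: int) -> None:
        """Analogous in spirit to a process-wide setter like setLGBMthreads()."""
        self.max_threads = n

    def set_default_threads(self, n: int) -> None:
        """Analogous in spirit to passing num_threads through params."""
        self.default_threads = n if n > 0 else -1

    def effective_threads(self) -> int:
        """Threads a parallel region would use under this model."""
        if self.default_threads > 0:
            requested = self.default_threads
        else:
            requested = self.runtime_default
        if self.max_threads > 0:
            return min(requested, self.max_threads)
        return requested


limits = ThreadLimits(runtime_default=16)
limits.set_max_threads(2)
print(limits.effective_threads())  # capped even though nothing was passed via params
```

The point of the cap is that code paths which never receive a thread-control argument, like loading a model from a string, still end up bounded.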

problem 3: some `{lightgbm}` operations use `{data.table}`, but don't constrain how much multithreading it uses

Details

This is described in detail in Rdatatable/data.table#5658.

I fixed this by running data.table::setDTthreads(1) in all examples, vignettes, and tests.

Notes for Reviewers

I'm sorry this is so large, but unfortunately it was done under considerable duress... CRAN have given us until December 12 to upload a new release (#6221).

I left comments below to call out the main points that I think might be controversial.

References

I consulted all of the following while working through this.

https://docs.oracle.com/cd/E19205-01/819-5270/aewbc/index.html#:~:text=Nested%20parallelism%20can%20be%20enabled,levels%20of%20nested%20parallel%20constructs.
https://stackoverflow.com/a/6934050/3986677
https://princetonuniversity.github.io/PUbootcamp/sessions/parallel-programming/Intro_PP_bootcamp_2018.pdf
https://www.openmp.org/spec-html/5.0/openmpsu35.html#x55-880002.6.1
- how the if and num_threads() clauses are evaluated
- if if() is false, only 1 thread is used
https://www.openmp.org/spec-html/5.0/openmpse23.html#x117-4350002.15
https://stackoverflow.com/a/11884188/3986677
all operations: https://www.openmp.org/wp-content/uploads/OpenMP-4.0-C.pdf
https://curc.readthedocs.io/en/latest/programming/OpenMP-C.html
https://stackoverflow.com/questions/1433204/how-do-i-use-extern-to-share-variables-between-source-files

jameslamb · 2023-12-06T04:26:01Z

R-package/R/multithreading.R

+        num_threads
+    )
+    return(invisible(NULL))
+}


These functions are currently just a setter and getter for LGBM_MAX_NUM_THREADS.

should they be named getMaxLGBMthreads() / setMaxLGBMthreads() or something else with "max" in the name?

I personally like getLGBMthreads() / setLGBMthreads() for consistency with data.table::{get/set}DTthreads(), but could be convinced

Should getLGBMthreads() even be exported in the R package's public API?

I found it useful for testing and thought users might as well, but it'd be easier to add it later than to have to remove it later

and {lightgbm} could use it in its own tests with :::

if we do keep it in the public interface... should getLGBMthreads() actually be a getter for LGBM_MAX_NUM_THREADS? Or should it return an answer to the question "how many threads will e.g. lgb.train() use if I don't pass any thread-control parameters through params"?

jameslamb · 2023-12-06T04:28:21Z

R-package/NAMESPACE

@@ -9,6 +9,7 @@ S3method(print,lgb.Booster)
 S3method(set_field,lgb.Dataset)
 S3method(slice,lgb.Dataset)
 S3method(summary,lgb.Booster)
+export(getLGBMthreads)


I chose not to add similar functions to the Python interface, since:

- it's the R package that needs this more urgently
- taking back parts of the public API for the Python package is MUCH harder than for the R package, as the Python package is used so much more widely
- this PR is already bigger than I'm comfortable with

jameslamb · 2023-12-06T04:43:59Z

R-package/R/lgb.Booster.R

@@ -1346,6 +1354,8 @@ lgb.save <- function(booster, filename, num_iteration = NULL) {
 #' @examples
 #' \donttest{
 #' library(lightgbm)
+#' \dontshow{setLGBMthreads(2L)}
+#' \dontshow{data.table::setDTthreads(1L)}


Per https://cran.r-project.org/doc/manuals/R-exts.html

... \dontshow{} for extra commands for testing that should not be shown to users, but will be run by example()

These \dontshow{} blocks hide this code from users, but ensure it runs when CRAN checks the package.

Thanks to @jangorecki for the suggestion (Rdatatable/data.table#5658 (comment)).

jameslamb · 2023-12-06T04:45:57Z

This is ready for review.

Tagging in some others who might be interested and have opinions about it: @david-cortes @mayer79 @trivialfis @simonpcouch @AlbertoEAF

jameslamb · 2023-12-06T04:50:58Z

include/LightGBM/utils/openmp_wrapper.h

+// this can only be changed by LGBM_SetMaxThreads()
+LIGHTGBM_EXTERN_C int LGBM_MAX_NUM_THREADS;
+
+// this is modified by OMP_SET_NUM_THREADS(), for example
+// by passing num_thread through params
+LIGHTGBM_EXTERN_C int LGBM_DEFAULT_NUM_THREADS;


This makes these process-global variables, and therefore not thread-safe, as @david-cortes alluded to in the description of #6152.

For example, if you created 2 Booster objects in different threads which had different values of num_threads in Config, one's OMP_SET_NUM_THREADS() call could affect code in the other.

I think that's an acceptable risk for now, in exchange for the other benefits of this PR.

guolinke · 2023-12-06T07:30:34Z

Maybe you can try thread_local, but there's no hurry in this PR.

jameslamb · 2023-12-07

Thank you! I did see that we have that preprocessor macro set up

LightGBM/include/LightGBM/utils/log.h

Lines 27 to 31 in e797985

    #if defined(_MSC_VER)
    #define THREAD_LOCAL __declspec(thread)
    #else
    #define THREAD_LOCAL thread_local
    #endif

but didn't test it out. Let's save it for a follow-up PR... I think it'd be ok to release this PR's changes without making this configuration thread-safe.
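For anyone following along, the difference between a process-global setting and a thread-local one can be illustrated with a small Python analogy. This is not the C++ change discussed here, just a sketch of the behavior using `threading.local()`; the variable and function names are made up.

```python
import threading

# process-global setting: every thread reads and writes the same slot
GLOBAL_NUM_THREADS = {"value": -1}

# thread-local setting: each thread gets its own independent slot
_local = threading.local()


def run_workers(use_thread_local: bool) -> dict:
    """Two threads each request a thread count, then read the setting back."""
    barrier = threading.Barrier(2)
    seen = {}

    def worker(name: str, requested: int) -> None:
        if use_thread_local:
            _local.value = requested
        else:
            GLOBAL_NUM_THREADS["value"] = requested
        barrier.wait()  # wait until both threads have written
        if use_thread_local:
            seen[name] = _local.value
        else:
            seen[name] = GLOBAL_NUM_THREADS["value"]

    threads = [
        threading.Thread(target=worker, args=("a", 1)),
        threading.Thread(target=worker, args=("b", 8)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


print(run_workers(use_thread_local=False))  # both threads see whichever write landed last
print(run_workers(use_thread_local=True))   # each thread sees its own request
```

With the global slot, one thread's request silently overrides the other's, which is the risk described above for two Boosters configured with different num_threads values.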

guolinke · 2023-12-06T07:30:34Z

include/LightGBM/utils/openmp_wrapper.h

+// this can only be changed by LGBM_SetMaxThreads()
+LIGHTGBM_EXTERN_C int LGBM_MAX_NUM_THREADS;
+
+// this is modified by OMP_SET_NUM_THREADS(), for example
+// by passing num_thread through params
+LIGHTGBM_EXTERN_C int LGBM_DEFAULT_NUM_THREADS;


Maybe you can try thread_local, but there's no hurry in this PR.

mayer79 · 2023-12-06T08:36:23Z

LGTM, thanks so much. I would not wait too long with resubmission, so that we're not under pressure if something fails on the first attempt.

jameslamb · 2023-12-07T15:32:55Z

I would not wait too long with resubmission

^ I agree with this.

If there are not any other comments in the next 8 hours or so, I'd like to merge this and try to release a v4.2.0 to CRAN.

(NOTE: I won't cut a full LightGBM v4.2.0 release, just one to CRAN. I think we should continue with the normal process of completing all the steps at #6191 for the rest of that release. I think it's fine for those to be slightly different given the time pressure from CRAN)

jameslamb · 2023-12-07T23:03:08Z

Given the approvals and no other blocking comments, I'm going to merge this as-is. My plan is as follows:

- merge this right now
- update release v4.2.0 #6191 to include it and everything else on master
- re-trigger the valgrind checks (these take around 5 hours to complete)
- build a v4.2.0 release of the CRAN-style R package only from that branch, and submit it to CRAN

I'll post updates on #6191.

Thanks so much to everyone involved for the reviews and other contributions to getting this working!

[R-package] [c++] add tighter multithreading control, avoid global OpenMP side effects (fixes #4705, fixes #5102)

7f0de8f

jameslamb added the in progress label Dec 5, 2023

fix gcc -Wmaybe-uninitialized warning

b62e46d

jameslamb added the fix label Dec 5, 2023

jameslamb changed the title ~~WIP: [R-package] [c++] add tighter multithreading control, avoid global OpenMP side effects (fixes #4705, fixes #5102)~~ WIP: [R-package] [c++] add tighter multithreading control, avoid global OpenMP side effects (fixes #4705, fixes #5102) Dec 5, 2023

jameslamb added 8 commits December 5, 2023 08:38

Update R-package/tests/testthat/test_multithreading.R

651dbc8

Update include/LightGBM/utils/openmp_wrapper.h

600f9ec

clean up files left behind from vignette-building

c931d3c

try not inlining

6bf188b

more extern-ing

a8e666b

inline for the no-OpenMP case

921fecb

export omp.h again

b068106

limit data.table parallelism too

ebf61f1

jameslamb commented Dec 6, 2023

revert aliases change

8106f4a

jameslamb changed the title ~~WIP: [R-package] [c++] add tighter multithreading control, avoid global OpenMP side effects (fixes #4705, fixes #5102)~~ [R-package] [c++] add tighter multithreading control, avoid global OpenMP side effects (fixes #4705, fixes #5102) Dec 6, 2023

jameslamb commented Dec 6, 2023

jameslamb marked this pull request as ready for review December 6, 2023 04:44

jameslamb requested review from guolinke, shiyu1994 and jmoralez as code owners December 6, 2023 04:44

jameslamb added awaiting review and removed in progress labels Dec 6, 2023

jameslamb commented Dec 6, 2023

This was referenced Dec 6, 2023

[R-package] Warnings of CRAN Package #6221

Closed

[R-package] v4.0.0 CRAN submission issues #5987

Closed

guolinke approved these changes Dec 6, 2023

david-cortes reviewed Dec 6, 2023

src/utils/openmp_wrapper.cpp (resolved)

Merge branch 'master' into r/tighter-thread-control

af20a53

jmoralez approved these changes Dec 7, 2023

jameslamb removed the awaiting review label Dec 7, 2023

jameslamb merged commit 1548b42 into master Dec 7, 2023
41 checks passed

jameslamb deleted the r/tighter-thread-control branch December 7, 2023 23:03

This was referenced Dec 11, 2023

Keep number of threads in a global variable separate from global OMP config #6152

Closed

[Windows, Cpp] LNK2001 - unresolved external symbol OMP_NUM_THREADS #6238

Closed

jameslamb mentioned this pull request Dec 28, 2023

[c++] include OpenMP-control files in MSBuild solution file (fixes #6238) #6251

Merged

david-cortes mentioned this pull request Feb 2, 2024

[R] Don't cap global number of threads dmlc/xgboost#10028

Merged

morokosi mentioned this pull request Mar 30, 2024

remove unnecessary omp single that cause deadlock (fixes #6273) #6394

Merged

jameslamb mentioned this pull request Apr 23, 2024

Dataset construction uses all threads on the machine #5124

Closed
