Fixes microsoft#6021
- Store all resources that are reused across single-row predictions in the dedicated `SingleRowPredictor` (aka `FastConfig`)
- Use that instead of resources in the `Booster` when doing single row predictions to avoid having to lock the `Booster` exclusively.
- A live `FastConfig` now holds a shared lock on the booster (it was likely incorrect to mutate the booster while such an object existed anyway)
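The three points above can be sketched with the C++17 standard library alone. This is a minimal hypothetical model, not the PR's implementation:

- `ToyBooster` and `ToyFastConfig` are made-up stand-ins for `Booster` and `FastConfig`.
- The prediction is reduced to a dot product written into a scratch buffer. That buffer stands in for the state the `Predictor` mutates while predicting.
- `std::shared_mutex` is assumed as the reader/writer lock.

Each handle owns its scratch space and its own `std::mutex`, and holds a `std::shared_lock` on the booster for its whole lifetime.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Booster after the change: it no longer caches
// a single-row predictor, only the (read-mostly) model data.
class ToyBooster {
 public:
  explicit ToyBooster(std::vector<double> weights) : weights_(std::move(weights)) {}

  // Mutation needs exclusive access, so it waits until every live
  // ToyFastConfig has released its shared lock.
  void SetWeights(std::vector<double> weights) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    weights_ = std::move(weights);
  }

 private:
  friend class ToyFastConfig;
  std::vector<double> weights_;
  mutable std::shared_mutex mutex_;
};

// Hypothetical stand-in for FastConfig (aka SingleRowPredictor): it owns every
// resource reused across single-row predictions and keeps a shared lock on
// the booster for as long as it is alive.
class ToyFastConfig {
 public:
  explicit ToyFastConfig(const ToyBooster& booster)
      : booster_(booster),
        booster_lock_(booster.mutex_),
        scratch_(booster.weights_.size()) {}

  double Predict(const std::vector<double>& row) {
    // Only this handle is locked exclusively; other handles are unaffected.
    std::lock_guard<std::mutex> guard(mutex_);
    assert(row.size() == scratch_.size());
    for (std::size_t i = 0; i < scratch_.size(); ++i) {
      scratch_[i] = booster_.weights_[i] * row[i];  // per-handle mutable state
    }
    return std::accumulate(scratch_.begin(), scratch_.end(), 0.0);
  }

 private:
  const ToyBooster& booster_;
  // std::shared_mutex must be unlocked by the thread that locked it, so this
  // sketch assumes each handle is created and destroyed on the same thread.
  std::shared_lock<std::shared_mutex> booster_lock_;
  std::vector<double> scratch_;
  std::mutex mutex_;
};

// Each thread creates its own handle (the "FastInit" step), then predicts
// without contending with the other threads.
std::vector<double> RunPerHandleDemo(const ToyBooster& booster, int num_threads, int calls) {
  std::vector<double> sums(num_threads, 0.0);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&booster, &sums, t, calls] {
      ToyFastConfig handle(booster);
      for (int i = 0; i < calls; ++i) {
        sums[t] += handle.Predict({1.0, 1.0, static_cast<double>(t)});
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
  return sums;
}
```

With this layout, `SetWeights` waits until every live handle is gone. Calling it on a thread that still owns a `ToyFastConfig` would deadlock, which is one blunt way of enforcing that the booster is not mutated while such an object exists.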
Summary
Single row prediction API takes a write lock on the Booster. This breaks performance when using the same booster in a multi-threaded environment.
Motivation
The use case is a complex algorithm processing millions of rows and sometimes needing to run a prediction.
The computation is already parallelized at the "millions-of-rows" level.
It needs to use the booster to (sometimes) generate a row and run the prediction on it.
However, currently that performs very poorly due to global contention here:
`LightGBM/src/c_api.cpp`, line 393 in d73c6b5
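To make the contention concrete, here is a hypothetical, standard-library-only C++17 sketch of the locking pattern this issue describes. It is not LightGBM's code: `ToyBooster`, `PredictSingleRow` and `PredictWithLocalScratch` are made-up names. The two paths differ as follows:

- The cached single-row path writes into scratch space stored in the shared object, so it must take an exclusive lock.
- The batch-style path allocates its scratch space per call, so a shared lock is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Booster that caches its single-row predictor.
// The "predictor" is reduced to a scratch buffer that every prediction writes.
class ToyBooster {
 public:
  explicit ToyBooster(std::vector<double> weights)
      : weights_(std::move(weights)), scratch_(weights_.size()) {}

  // Cached single-row path: it writes into shared scratch space, so it needs
  // an exclusive lock on the whole object and concurrent callers serialize.
  double PredictSingleRow(const std::vector<double>& row) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    return Dot(row, scratch_);
  }

  // Batch-style path: scratch space is created per call, so a shared lock is
  // enough and concurrent callers do not block each other.
  double PredictWithLocalScratch(const std::vector<double>& row) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    std::vector<double> local(weights_.size());
    return Dot(row, local);
  }

 private:
  double Dot(const std::vector<double>& row, std::vector<double>& scratch) const {
    assert(row.size() == weights_.size());
    for (std::size_t i = 0; i < weights_.size(); ++i) {
      scratch[i] = weights_[i] * row[i];
    }
    return std::accumulate(scratch.begin(), scratch.end(), 0.0);
  }

  std::vector<double> weights_;
  std::vector<double> scratch_;  // mutable state shared by all callers
  mutable std::shared_mutex mutex_;
};

// Several threads share one ToyBooster: results are correct, but every
// PredictSingleRow call holds the exclusive lock, so the calls run one by one.
std::vector<double> RunSharedBoosterDemo(int num_threads, int calls) {
  ToyBooster booster({1.0, 2.0, 3.0});
  std::vector<double> sums(num_threads, 0.0);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&booster, &sums, t, calls] {
      for (int i = 0; i < calls; ++i) {
        sums[t] += booster.PredictSingleRow({1.0, 1.0, static_cast<double>(t)});
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
  return sums;
}
```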
Description
It seems that this write lock is required because the `Predictor` writes inside itself as it predicts (#3771), and because the `Predictor` for single row predictions is stored inside the `Booster`. This is unlike the `Predictor` for batch prediction functions, which is instantiated locally by the `Predict` function, allowing locking to be shared.

Now, the predictor seems to be stored inside the booster in order to be reusable across calls, to make single predictions through `LGBM_BoosterPredictForMatSingleRow` reasonably inexpensive. This was implemented before it became recommended to make that a two-step process by using `LGBM_BoosterPredictForMatSingleRowFastInit` instead.

HOWEVER, as of the introduction of `LGBM_BoosterPredictForMatSingleRowFastInit` in #2992, we now have a "SingleRowPredictor" object that is supposed to be instantiated and preserved across calls to predict single rows with `LGBM_BoosterPredictForMatSingleRowFast`: that is `FastConfigHandle`. (It would probably have been better to name it `SingleRowPredictorHandle` as part of the public API, but I guess that might be a little late now?)

This means that at the moment, the resources necessary for `LGBM_BoosterPredictForMatSingleRowFast` are split between two places:
- the `Booster` (the `Predictor`s)
- the `FastConfigHandle` (the `Config`, the idea that some single row predictor settings have been set on the booster (you had better hope that nobody else is calling `LGBM_BoosterPredictForMatSingleRowFastInit` with a different config on the same predictor)...)

The idea is to put all the resources that need to be initialized once before running single row predictions behind the `FastConfigHandle` object (aka `SingleRowPredictorHandle`). Predictions would then take a write lock on that object, instead of on the full `Booster`.

This way, when processing on multiple threads, each thread can make its own call to `LGBM_BoosterPredictForMatSingleRowFastInit`, then work without contention. This change:
- Removes the need for taking a global write lock when predicting in single row mode
- Simplifies the booster by removing the `single_row_predictor_` array, and by removing the `SetSingleRowPredictor` and `PredictSingleRow` functions. (Should the array have been named `single_row_predictors` instead, by the way? In any case, on `SingleRowPredictorHandle` aka `FastConfigHandle` we only need a single `Predictor` object.)
  - Specifically for this, there are two approaches:
    - Make `LGBM_BoosterPredictForMatSingleRow` instantiate its `FastConfig` at the beginning of the call. This is basically what it used to do, but it didn't hold the predictor. It would therefore probably decrease performance for users who should be using `LGBM_BoosterPredictForMatSingleRowFastInit` but currently use `LGBM_BoosterPredictForMatSingleRow` instead.
    - Make `LGBM_BoosterPredictForMatSingleRow` keep using a single row predictor stored in the booster object, distinct from the one used by `LGBM_BoosterPredictForMatSingleRowFast`. This is less clean, because the current `single_row_predictor_`, `SetSingleRowPredictor` and `PredictSingleRow` members would have to stay in `Booster`. However, it maintains the current performance of `LGBM_BoosterPredictForMatSingleRow` for all users.

cc @AlbertoEAF
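The two approaches for `LGBM_BoosterPredictForMatSingleRow` can be contrasted in a small hypothetical C++17 sketch. The names are illustrative only, not LightGBM's API:

- `PredictOneShot` rebuilds its scratch state on every call, as in the first approach.
- `PredictLegacyCached` keeps a cached legacy predictor in the booster behind its own `std::mutex`, as in the second.

In both variants the model data itself is only locked in shared mode.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical booster offering both legacy single-row strategies.
class ToyBooster {
 public:
  explicit ToyBooster(std::vector<double> weights)
      : weights_(std::move(weights)), legacy_scratch_(weights_.size()) {}

  // Approach 1: build fresh per-call state (like creating a FastConfig on
  // every call). A shared lock suffices, but the setup cost is paid each time.
  double PredictOneShot(const std::vector<double>& row) const {
    std::shared_lock<std::shared_mutex> model_lock(mutex_);
    std::vector<double> scratch(weights_.size());  // per-call setup
    return Dot(row, scratch);
  }

  // Approach 2: keep a cached legacy predictor in the booster, guarded by its
  // own mutex, separate from any per-handle predictors used by the fast path.
  double PredictLegacyCached(const std::vector<double>& row) {
    std::shared_lock<std::shared_mutex> model_lock(mutex_);
    std::lock_guard<std::mutex> legacy_guard(legacy_mutex_);
    return Dot(row, legacy_scratch_);
  }

 private:
  double Dot(const std::vector<double>& row, std::vector<double>& scratch) const {
    assert(row.size() == weights_.size());
    for (std::size_t i = 0; i < weights_.size(); ++i) {
      scratch[i] = weights_[i] * row[i];
    }
    return std::accumulate(scratch.begin(), scratch.end(), 0.0);
  }

  std::vector<double> weights_;
  std::vector<double> legacy_scratch_;  // state of the cached legacy predictor
  mutable std::shared_mutex mutex_;     // protects the model data
  std::mutex legacy_mutex_;             // protects legacy_scratch_ only
};
```

In the second variant, legacy callers still serialize with one another on `legacy_mutex_`. However, they no longer hold the booster's lock exclusively, so per-handle callers on the fast path are not blocked by them.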