Fix single row prediction performance in a multi-threaded environment #6021

Ten0 · 2023-08-04T21:06:08Z

Summary

Single row prediction API takes a write lock on the Booster. This breaks performance when using the same booster in a multi-threaded environment.

Motivation

The use case is a complex algorithm processing millions of rows and sometimes needing to run a prediction.
The computation is already parallelized at the "millions-of-rows" level.
It needs to use the booster to (sometimes) generate a row and run the prediction on it.
However, currently that performs very poorly due to global contention here:

LightGBM/src/c_api.cpp

Line 393 in d73c6b5

UNIQUE_LOCK(mutex_)

Description

It seems that this write lock is required due to the fact that the Predictor writes inside itself as it predicts (#3771), and that the Predictor for single row predictions is stored inside the Booster, unlike the Predictor for batch prediction functions, which is instantiated locally by the Predict function, allowing locking to be shared.

Now the predictor seems to be stored inside the booster in order to be re-usable across calls, to make the single predictions through LGBM_BoosterPredictForMatSingleRow reasonably inexpensive: this was implemented before it became recommended to make that a two-step process by using LGBM_BoosterPredictForMatSingleRowFastInit instead.
HOWEVER, as of the introduction of LGBM_BoosterPredictForMatSingleRowFastInit in #2992, we now have a "SingleRowPredictor" object that is supposed to be instantiated and preserved across calls to predict single rows with LGBM_BoosterPredictForMatSingleRowFast: that is FastConfigHandle. (Probably would have been better to name it SingleRowPredictorHandle as part of the public API but I guess that might be a little late now?)

This means that at the moment, resources necessary for LGBM_BoosterPredictForMatSingleRowFast is are split between two places:

Some of it is in the booster (the Predictors)
Some of it is in the FastConfig (Config, the idea that some single row predictor settings have been set on the booster (you better hope that nobody else is calling LGBM_BoosterPredictForMatSingleRowFastInit with a different config on the same predictor)...)

The idea is: put all the resources that need to be initialized once before running single row predictions behind the FastConfigHandle object (aka SingleRowPredictorHandle), and take a write lock on that object, instead of the full Booster, when running predictions.

This way, when processing on multiple threads, each thread can make its call to LGBM_BoosterPredictForMatSingleRowFastInit, then work without contention:

Removes the need for taking a global write lock when predicting in single row mode
Simplifies the booster by removing the single_row_predictor_ array (should it have been named single_row_predictors instead btw? In any case, on SingleRowPredictorHandle aka FastConfigHandle we only need a single Predictor object), and removing the SetSingleRowPredictor, PredictSingleRow functions
- Specifically for this, there are two approaches:
  1. Make the LGBM_BoosterPredictForMatSingleRow instantiate its FastConfig at the beginning of the call. This is basically what it used to do but it didn't hold the predictor, so that would probably decrease performance for users that currently should use LGBM_BoosterPredictForMatSingleRowFastInit but use LGBM_BoosterPredictForMatSingleRow instead.
  2. Make the LGBM_BoosterPredictForMatSingleRow keep using a single row predictor stored in the booster object, but that would be distinct from the one used by LGBM_BoosterPredictForMatSingleRowFast. This has the downside of being less clean, because we would need to leave the current single_row_predictor_, SetSingleRowPredictor, PredictSingleRow functions in Booster, however that maintains the current performance of LGBM_BoosterPredictForMatSingleRow for all users.

cc @AlbertoEAF

The text was updated successfully, but these errors were encountered:

Ten0 · 2023-08-04T21:26:44Z

@jameslamb Wouldn't the appropriate label be efficiency (and maybe feature request) rather than question ?

jameslamb · 2023-08-04T21:31:42Z

Sure.

Fixes microsoft#6021 - Store all resources that are reused across single-row predictions in the dedicated `SingleRowPredictor` (aka `FastConfig`) - Use that instead of resources in the `Booster` when doing single row predictions to avoid having to lock the `Booster` exclusively. - A FastConfig being alive now takes a shared lock on the booster (it was likely very incorrect to mutate the booster while this object was already built anyway)

jameslamb added the question label Aug 4, 2023

jameslamb added feature request efficiency feature and removed question labels Aug 4, 2023

Ten0 mentioned this issue Aug 6, 2023

Fix single row prediction performance in a multi-threaded environment #6024

Merged

jameslamb closed this as completed in #6024 Mar 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix single row prediction performance in a multi-threaded environment #6021

Fix single row prediction performance in a multi-threaded environment #6021

Ten0 commented Aug 4, 2023

Ten0 commented Aug 4, 2023 •

edited

Loading

jameslamb commented Aug 4, 2023

Fix single row prediction performance in a multi-threaded environment #6021

Fix single row prediction performance in a multi-threaded environment #6021

Comments

Ten0 commented Aug 4, 2023

Summary

Motivation

Description

Ten0 commented Aug 4, 2023 • edited Loading

jameslamb commented Aug 4, 2023

Ten0 commented Aug 4, 2023 •

edited

Loading