Feat/optimize single prediction #2992

7 changes: 7 additions & 0 deletions CMakeLists.txt
@@ -16,6 +16,7 @@ OPTION(USE_TIMETAG "Set to ON to output time costs" OFF)
OPTION(USE_DEBUG "Set to ON for Debug mode" OFF)
OPTION(BUILD_STATIC_LIB "Build static library" OFF)
OPTION(BUILD_FOR_R "Set to ON if building lib_lightgbm for use with the R package" OFF)
OPTION(BUILD_PROFILING_TESTS "Set to ON to compile profiling executables for development and benchmarks." OFF)

if(APPLE)
OPTION(APPLE_OUTPUT_DYLIB "Output dylib shared library" OFF)
@@ -235,6 +236,12 @@ file(GLOB SOURCES
src/treelearner/*.cpp
)

if(BUILD_PROFILING_TESTS)
# For profiling builds with valgrind/callgrind use -DUSE_DEBUG=ON
add_executable(lightgbm_profile_single_row_predict profiling/profile_single_row_predict.cpp ${SOURCES} src/c_api.cpp)
add_executable(lightgbm_profile_single_row_predict_fast profiling/profile_single_row_predict_fast.cpp ${SOURCES} src/c_api.cpp)
endif(BUILD_PROFILING_TESTS)
Collaborator

I'm not sure that we should add the profiling part to the repo... @guolinke WDYT?

Contributor Author

If that is the decision, should I also remove the new profiling folder and its contents?

Collaborator

@StrikerRUS, Jul 4, 2020

> If that is the decision

Let's wait for some other opinions.

> should I also remove the new profiling folder and its contents?

I think yes. Right now the profiling folder looks a little out of place here. I believe we should transform it into a test during the work on #261 or integrate it into other benchmarks (https://github.com/guolinke/boosting_tree_benchmarks).

Collaborator

Gentle ping @guolinke for your opinion.

Collaborator

IMO, it is better to remove this in the official repo.

Collaborator

@guolinke, Jul 12, 2020

But we can keep a branch for the profiling.

Collaborator

@guolinke I believe that a separate repo (just like for benchmarks) will be more intuitive.

@AlbertoEAF OK, it seems we have a decision. Please exclude profiling from this PR, but also keep the corresponding code somewhere in your fork for some time.

Contributor Author

Thanks for the input :)

I just removed it and kept it in a branch in my fork.

Collaborator

@StrikerRUS, Jul 13, 2020

Ah, it just occurred to me that you can simply post the contents of the profiling files or the output of the git diff command (I remember there were not that many files) in a comment on this PR or the original issue. 🙂


add_executable(lightgbm src/main.cpp ${SOURCES})
list(APPEND SOURCES "src/c_api.cpp")

129 changes: 128 additions & 1 deletion include/LightGBM/c_api.h
@@ -22,6 +22,7 @@

typedef void* DatasetHandle; /*!< \brief Handle of dataset. */
typedef void* BoosterHandle; /*!< \brief Handle of booster. */
typedef void* FastConfigHandle; /*!< \brief Handle of FastConfig. */

#define C_API_DTYPE_FLOAT32 (0) /*!< \brief float32 (single precision float). */
#define C_API_DTYPE_FLOAT64 (1) /*!< \brief float64 (double precision float). */
@@ -577,7 +578,7 @@ LIGHTGBM_C_EXPORT int LGBM_BoosterGetEvalCounts(BoosterHandle handle,
* \param len Number of ``char*`` pointers stored at ``out_strs``.
* If smaller than the max size, only this many strings are copied
* \param[out] out_len Total number of evaluation datasets
* \param buffer_len Size of pre-allocated strings.
* \param buffer_len Size of pre-allocated strings.
* Content is copied up to ``buffer_len - 1`` and null-terminated
* \param[out] out_buffer_len String sizes required to do the full string copies
* \param[out] out_strs Names of evaluation datasets, should pre-allocate memory
@@ -703,6 +704,14 @@ LIGHTGBM_C_EXPORT int LGBM_BoosterCalcNumPredict(BoosterHandle handle,
int num_iteration,
int64_t* out_len);

/*!
* \brief Release FastConfig object.
*
* \param fastConfig Handle to the FastConfig object acquired with a `*FastInit()` method.
* \return 0 when it succeeds, -1 when failure happens
*/
LIGHTGBM_C_EXPORT int LGBM_FastConfigFree(FastConfigHandle fastConfig);

/*!
* \brief Make prediction for a new dataset in CSR format.
* \note
@@ -844,6 +853,73 @@ LIGHTGBM_C_EXPORT int LGBM_BoosterPredictForCSRSingleRow(BoosterHandle handle,
int64_t* out_len,
double* out_result);

/*!
* \brief Initialize and return a `FastConfigHandle` for use with `LGBM_BoosterPredictForCSRSingleRowFast`.
*
* Release the `FastConfig` by passing its handle to `LGBM_FastConfigFree` when no longer needed.
*
* \param handle Booster handle
* \param data_type Type of ``data`` pointer, can be ``C_API_DTYPE_FLOAT32`` or ``C_API_DTYPE_FLOAT64``
* \param num_col Number of columns
* \param parameter Other parameters for prediction, e.g. early stopping for prediction
* \param[out] out_fastConfig FastConfig object with which you can call `LGBM_BoosterPredictForCSRSingleRowFast`
* \return 0 when it succeeds, -1 when failure happens
*/
LIGHTGBM_C_EXPORT int LGBM_BoosterPredictForCSRSingleRowFastInit(BoosterHandle handle,
const int data_type,
const int64_t num_col,
const char* parameter,
FastConfigHandle *out_fastConfig);

/*!
* \brief Faster variant of `LGBM_BoosterPredictForCSRSingleRow`.
*
* Score single rows after setup with `LGBM_BoosterPredictForCSRSingleRowFastInit`.
*
* By removing the setup steps from this call extra optimizations can be made like
* initializing the config only once, instead of once per call.
*
* \note
* Setting up the number of threads is only done once at `LGBM_BoosterPredictForCSRSingleRowFastInit`
* instead of at each prediction.
* If you use a different number of threads in other calls, you need to start the setup process over,
* or that number of threads will be used for these calls as well.
*
* \note
* You should pre-allocate memory for ``out_result``:
* - for normal and raw score, its length is equal to ``num_class * num_data``;
* - for leaf index, its length is equal to ``num_class * num_data * num_iteration``;
* - for feature contributions, its length is equal to ``num_class * num_data * (num_feature + 1)``.
*
* \param fastConfig_handle FastConfig object handle returned by `LGBM_BoosterPredictForCSRSingleRowFastInit`
* \param indptr Pointer to row headers
* \param indptr_type Type of ``indptr``, can be ``C_API_DTYPE_INT32`` or ``C_API_DTYPE_INT64``
* \param indices Pointer to column indices
* \param data Pointer to the data space
* \param nindptr Number of rows in the matrix + 1
* \param nelem Number of nonzero elements in the matrix
* \param predict_type What should be predicted
* - ``C_API_PREDICT_NORMAL``: normal prediction, with transform (if needed);
* - ``C_API_PREDICT_RAW_SCORE``: raw score;
* - ``C_API_PREDICT_LEAF_INDEX``: leaf index;
* - ``C_API_PREDICT_CONTRIB``: feature contributions (SHAP values)
* \param num_iteration Number of iterations for prediction, <= 0 means no limit
* \param[out] out_len Length of output result
* \param[out] out_result Pointer to array with predictions
* \return 0 when it succeeds, -1 when failure happens
*/
LIGHTGBM_C_EXPORT int LGBM_BoosterPredictForCSRSingleRowFast(FastConfigHandle fastConfig_handle,
const void* indptr,
int indptr_type,
const int32_t* indices,
const void* data,
int64_t nindptr,
int64_t nelem,
int predict_type,
int num_iteration,
int64_t* out_len,
double* out_result);
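
The pre-allocation note in the doc comment above can be turned into a small helper. This is only a sketch: `PredictKind` and `RequiredOutputLength` are hypothetical names, not part of the LightGBM C API, and the helper assumes the caller passes the concrete number of iterations that will actually be used (the API's `<= 0 means no limit` convention is not handled here). For single-row prediction, `num_data` is 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical local enum mirroring the four prediction kinds listed in the
// doc comment; it is NOT the C_API_PREDICT_* macro set itself.
enum class PredictKind { kNormal, kRawScore, kLeafIndex, kContrib };

// Returns the number of doubles to allocate for out_result, following the
// three rules in the note. Assumes num_iteration is the actual (positive)
// iteration count, not the "<= 0 means no limit" sentinel.
int64_t RequiredOutputLength(PredictKind kind,
                             int64_t num_class,
                             int64_t num_data,
                             int64_t num_iteration,
                             int64_t num_feature) {
  switch (kind) {
    case PredictKind::kNormal:
    case PredictKind::kRawScore:
      return num_class * num_data;
    case PredictKind::kLeafIndex:
      return num_class * num_data * num_iteration;
    case PredictKind::kContrib:
      return num_class * num_data * (num_feature + 1);
  }
  return -1;  // unreachable for valid enum values
}
```

With a binary model of 28 features and 100 iterations, a single row needs 1 double for normal or raw scores, 100 for leaf indices, and 29 for feature contributions.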

/*!
* \brief Make prediction for a new dataset in CSC format.
* \note
@@ -957,6 +1033,57 @@ LIGHTGBM_C_EXPORT int LGBM_BoosterPredictForMatSingleRow(BoosterHandle handle,
int64_t* out_len,
double* out_result);

/*!
* \brief Initialize and return a `FastConfigHandle` for use with `LGBM_BoosterPredictForMatSingleRowFast`.
*
* Release the `FastConfig` by passing its handle to `LGBM_FastConfigFree` when no longer needed.
*
* \param handle Booster handle
* \param data_type Type of ``data`` pointer, can be ``C_API_DTYPE_FLOAT32`` or ``C_API_DTYPE_FLOAT64``
* \param ncol Number of columns
* \param parameter Other parameters for prediction, e.g. early stopping for prediction
* \param[out] out_fastConfig FastConfig object with which you can call `LGBM_BoosterPredictForMatSingleRowFast`
* \return 0 when it succeeds, -1 when failure happens
*/
LIGHTGBM_C_EXPORT int LGBM_BoosterPredictForMatSingleRowFastInit(BoosterHandle handle,
int data_type,
int32_t ncol,
const char* parameter,
FastConfigHandle *out_fastConfig);

/*!
* \brief Faster variant of `LGBM_BoosterPredictForMatSingleRow`.
*
* Score a single row after setup with `LGBM_BoosterPredictForMatSingleRowFastInit`.
*
* By removing the setup steps from this call extra optimizations can be made like
* initializing the config only once, instead of once per call.
*
* \note
* Setting up the number of threads is only done once at `LGBM_BoosterPredictForMatSingleRowFastInit`
* instead of at each prediction.
* If you use a different number of threads in other calls, you need to start the setup process over,
* or that number of threads will be used for these calls as well.
*
* \param fastConfig_handle FastConfig object handle returned by `LGBM_BoosterPredictForMatSingleRowFastInit`
* \param data Single-row array data (no other way than row-major form).
* \param predict_type What should be predicted
* - ``C_API_PREDICT_NORMAL``: normal prediction, with transform (if needed);
* - ``C_API_PREDICT_RAW_SCORE``: raw score;
* - ``C_API_PREDICT_LEAF_INDEX``: leaf index;
* - ``C_API_PREDICT_CONTRIB``: feature contributions (SHAP values)
* \param num_iteration Number of iterations for prediction, <= 0 means no limit
* \param[out] out_len Length of output result
* \param[out] out_result Pointer to array with predictions
* \return 0 when it succeeds, -1 when failure happens
*/
LIGHTGBM_C_EXPORT int LGBM_BoosterPredictForMatSingleRowFast(FastConfigHandle fastConfig_handle,
const void* data,
int predict_type,
int num_iteration,
int64_t* out_len,
double* out_result);

/*!
* \brief Make prediction for a new dataset presented in a form of array of pointers to rows.
* \note
27 changes: 27 additions & 0 deletions profiling/profile_single_row_predict.cpp
@@ -0,0 +1,27 @@
#include <iostream>
#include "LightGBM/c_api.h"

using namespace std;

int main() {
cout << "start\n";

BoosterHandle boosterHandle;
int num_iterations;
LGBM_BoosterCreateFromModelfile("./LightGBM_model.txt", &num_iterations, &boosterHandle);
cout << "Model iterations " << num_iterations<< "\n";

double values[] = {1.000000000000000000e+00,8.692932128906250000e-01,-6.350818276405334473e-01,2.256902605295181274e-01,3.274700641632080078e-01,-6.899932026863098145e-01,7.542022466659545898e-01,-2.485731393098831177e-01,-1.092063903808593750e+00,0.000000000000000000e+00,1.374992132186889648e+00,-6.536741852760314941e-01,9.303491115570068359e-01,1.107436060905456543e+00,1.138904333114624023e+00,-1.578198313713073730e+00,-1.046985387802124023e+00,0.000000000000000000e+00,6.579295396804809570e-01,-1.045456994324922562e-02,-4.576716944575309753e-02,3.101961374282836914e+00,1.353760004043579102e+00,9.795631170272827148e-01,9.780761599540710449e-01,9.200048446655273438e-01,7.216574549674987793e-01,9.887509346008300781e-01,8.766783475875854492e-01}; // score = 0.487278

int64_t dummy;

double score[1];
for (size_t i = 0; i < 3e5; ++i) {
LGBM_BoosterPredictForMatSingleRow(boosterHandle, values, C_API_DTYPE_FLOAT64, 28, 1, C_API_PREDICT_NORMAL, num_iterations, "", &dummy, score);
}
cout << "len=" << dummy << endl;

cout << "Score = " << score[0] << "\n";

cout << "end\n";
}
32 changes: 32 additions & 0 deletions profiling/profile_single_row_predict_fast.cpp
@@ -0,0 +1,32 @@
#include <iostream>
#include "LightGBM/c_api.h"

using namespace std;

int main() {
cout << "start\n";

BoosterHandle boosterHandle;
int num_iterations;
LGBM_BoosterCreateFromModelfile("./LightGBM_model.txt", &num_iterations, &boosterHandle);
cout << "Model iterations " << num_iterations<< "\n";

double values[] = {1.000000000000000000e+00,8.692932128906250000e-01,-6.350818276405334473e-01,2.256902605295181274e-01,3.274700641632080078e-01,-6.899932026863098145e-01,7.542022466659545898e-01,-2.485731393098831177e-01,-1.092063903808593750e+00,0.000000000000000000e+00,1.374992132186889648e+00,-6.536741852760314941e-01,9.303491115570068359e-01,1.107436060905456543e+00,1.138904333114624023e+00,-1.578198313713073730e+00,-1.046985387802124023e+00,0.000000000000000000e+00,6.579295396804809570e-01,-1.045456994324922562e-02,-4.576716944575309753e-02,3.101961374282836914e+00,1.353760004043579102e+00,9.795631170272827148e-01,9.780761599540710449e-01,9.200048446655273438e-01,7.216574549674987793e-01,9.887509346008300781e-01,8.766783475875854492e-01}; // score = 0.487278

FastConfigHandle fastConfigHandle;
LGBM_BoosterPredictForMatSingleRowFastInit(boosterHandle, C_API_DTYPE_FLOAT64, 28, "", &fastConfigHandle);

int64_t dummy_out_len;
double score[1];
for (size_t i = 0; i < 3e5; ++i) {
LGBM_BoosterPredictForMatSingleRowFast(fastConfigHandle, values, C_API_PREDICT_NORMAL, num_iterations, &dummy_out_len, score);
}

LGBM_FastConfigFree(fastConfigHandle);

cout << "len=" << dummy_out_len << endl;

cout << "Score = " << score[0] << "\n";

cout << "end\n";
}
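
Both profiling programs above rely on an external profiler (valgrind/callgrind) for measurements. If a quick wall-clock comparison is wanted instead, a timing wrapper along the following lines could be placed around either loop. This is a sketch under assumptions: `MeanMicrosPerCall` is a hypothetical helper, not part of this PR, and wall-clock means are noisier than profiler output.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls fn() `iterations` times and returns the mean
// wall-clock time per call in microseconds. Returns 0 when iterations is 0.
template <typename Fn>
double MeanMicrosPerCall(Fn&& fn, std::size_t iterations) {
  if (iterations == 0) return 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i) {
    fn();
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::micro> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(iterations);
}
```

The lambda passed as `fn` would wrap the single-row predict call, so the same harness can time the regular and the *Fast* variant with identical loop overhead.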
131 changes: 131 additions & 0 deletions src/c_api.cpp
@@ -1735,6 +1735,56 @@ int LGBM_BoosterCalcNumPredict(BoosterHandle handle,
API_END();
}

/*!
* \brief Union to hold different int type values.
*
* Introduced with FastConfig to support multiple num_col types
* that show up in the rest of the C API prediction methods.
*/
union IntUnion {
int32_t int32;
int64_t int64;
};

/*!
* \brief Object to store resources meant for single-row Fast Predict methods.
*
* Meant to be used as a basic struct by the *Fast* predict methods only.
* It stores the configuration resources for reuse during prediction.
*
* Even the row function is stored. We score the instance at the same memory
* address all the time. One just replaces the feature values at that address
* and scores again with the *Fast* methods.
*/
struct FastConfig {
FastConfig(Booster *const booster_ptr,
const char *parameter,
const int data_type_,
const int32_t num_cols) : booster(booster_ptr), data_type(data_type_) {
ncol.int32 = num_cols;
config.Set(Config::Str2Map(parameter));
}

FastConfig(Booster *const booster_ptr,
const char *parameter,
const int data_type_,
const int64_t num_cols) : booster(booster_ptr), data_type(data_type_) {
ncol.int64 = num_cols;
config.Set(Config::Str2Map(parameter));
}

Booster* const booster;
Config config;
const int data_type;
IntUnion ncol;
};
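
Note that `IntUnion` carries no record of which member was written, so each reader has to use the same member its matching constructor set (`int64` on the CSR path, `int32` on the dense path). One possible alternative, shown here purely as an illustration, is to store a flag next to the union. `IntUnionSketch` and `ColumnCount` are hypothetical names and are not part of this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical copy of the union layout, renamed to avoid implying it is the
// PR's own type.
union IntUnionSketch {
  int32_t int32;
  int64_t int64;
};

// Hypothetical wrapper that remembers which union member is active, so a
// reader cannot accidentally read the member that was never written.
struct ColumnCount {
  explicit ColumnCount(int32_t n) : is_64bit(false) { value.int32 = n; }
  explicit ColumnCount(int64_t n) : is_64bit(true) { value.int64 = n; }

  // Always reads the member that the constructor wrote.
  int64_t Get() const {
    return is_64bit ? value.int64 : static_cast<int64_t>(value.int32);
  }

  bool is_64bit;
  IntUnionSketch value;
};
```

The extra flag costs one byte plus padding per config object, which is negligible for an object created once per setup.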

int LGBM_FastConfigFree(FastConfigHandle fastConfig) {
API_BEGIN();
delete reinterpret_cast<FastConfig*>(fastConfig);
API_END();
}
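
The ownership pattern used by the `*FastInit` functions and `LGBM_FastConfigFree` (build under a `std::unique_ptr`, release into an opaque `void*` handle, delete on free) can be sketched in isolation. Everything named `Demo*` below, and the `g_live_configs` counter, is hypothetical and exists only to make the sketch self-contained; it does not touch LightGBM.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a config object that is built once and reused.
struct DemoConfig {
  std::string parameters;
  int ncol;
};

typedef void* DemoHandle;  // opaque handle, as a C caller would see it

// Count of live DemoConfig objects, tracked only so the sketch is checkable.
static int g_live_configs = 0;

// Builds the object under a unique_ptr so it is freed if setup fails midway,
// then hands ownership to the caller through the opaque handle.
int DemoInit(const char* parameters, int ncol, DemoHandle* out) {
  if (parameters == nullptr || out == nullptr || ncol <= 0) return -1;
  std::unique_ptr<DemoConfig> cfg(new DemoConfig{parameters, ncol});
  ++g_live_configs;
  *out = cfg.release();  // caller now owns the object
  return 0;
}

// Reads a field back through the handle, as a predict call would.
int DemoNumCols(DemoHandle handle) {
  return static_cast<DemoConfig*>(handle)->ncol;
}

// Counterpart of the free function: the caller must call this exactly once.
int DemoFree(DemoHandle handle) {
  if (handle != nullptr) {
    delete static_cast<DemoConfig*>(handle);
    --g_live_configs;
  }
  return 0;
}
```

A caller pairs each successful `DemoInit` with one `DemoFree`, the same way `profile_single_row_predict_fast.cpp` pairs the init call with `LGBM_FastConfigFree`.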

int LGBM_BoosterPredictForCSR(BoosterHandle handle,
const void* indptr,
int indptr_type,
@@ -1886,6 +1936,51 @@ int LGBM_BoosterPredictForCSRSingleRow(BoosterHandle handle,
API_END();
}

int LGBM_BoosterPredictForCSRSingleRowFastInit(BoosterHandle handle,
const int data_type,
const int64_t num_col,
const char* parameter,
FastConfigHandle *out_fastConfig) {
API_BEGIN();
if (num_col <= 0) {
Log::Fatal("The number of columns should be greater than zero.");
} else if (num_col >= INT32_MAX) {
Log::Fatal("The number of columns should be smaller than INT32_MAX.");
}

auto fastConfig_ptr = std::unique_ptr<FastConfig>(new FastConfig(
reinterpret_cast<Booster*>(handle),
parameter,
data_type,
num_col));

if (fastConfig_ptr->config.num_threads > 0) {
omp_set_num_threads(fastConfig_ptr->config.num_threads);
}

*out_fastConfig = fastConfig_ptr.release();
API_END();
}
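
The range check at the top of the function above guards the later narrowing of the 64-bit column count to `int32_t`. The same idea in isolation, as a sketch: `CheckedNumCols` is a hypothetical helper, and it throws standard exceptions where the PR code calls `Log::Fatal`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: validates a 64-bit column count with the same bounds
// as the check above, then narrows it to 32 bits.
int32_t CheckedNumCols(int64_t num_col) {
  if (num_col <= 0) {
    throw std::invalid_argument(
        "The number of columns should be greater than zero.");
  }
  if (num_col >= std::numeric_limits<int32_t>::max()) {
    throw std::out_of_range(
        "The number of columns should be smaller than INT32_MAX.");
  }
  return static_cast<int32_t>(num_col);  // safe: value fits in int32_t
}
```

Validating once at setup means the per-row predict call can cast without rechecking.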

int LGBM_BoosterPredictForCSRSingleRowFast(FastConfigHandle fastConfig_handle,
const void* indptr,
int indptr_type,
const int32_t* indices,
const void* data,
int64_t nindptr,
int64_t nelem,
int predict_type,
int num_iteration,
int64_t* out_len,
double* out_result) {
API_BEGIN();
FastConfig *fastConfig = reinterpret_cast<FastConfig*>(fastConfig_handle);
auto get_row_fun = RowFunctionFromCSR<int>(indptr, indptr_type, indices, data, fastConfig->data_type, nindptr, nelem);
fastConfig->booster->PredictSingleRow(num_iteration, predict_type, static_cast<int32_t>(fastConfig->ncol.int64),
get_row_fun, fastConfig->config, out_result, out_len);
API_END();
}
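
The CSR arguments forwarded above describe a matrix with a single row: `indptr` holds two offsets (so `nindptr` is 2), while `indices` and `data` hold the `nelem` stored entries. A sketch of how a caller might build these arrays from a dense row follows. `SingleRowCSR` and `DenseToSingleRowCSR` are hypothetical names, and the sketch assumes that exact zeros are the entries to omit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the three CSR arrays of a one-row matrix.
struct SingleRowCSR {
  std::vector<int32_t> indptr;   // row offsets: {0, nelem} for a single row
  std::vector<int32_t> indices;  // column index of each stored value
  std::vector<double> data;      // stored values, in column order
};

// Converts a dense row into single-row CSR form, skipping exact zeros
// (an assumption of this sketch about which entries are omitted).
SingleRowCSR DenseToSingleRowCSR(const std::vector<double>& row) {
  SingleRowCSR csr;
  for (std::size_t col = 0; col < row.size(); ++col) {
    if (row[col] != 0.0) {
      csr.indices.push_back(static_cast<int32_t>(col));
      csr.data.push_back(row[col]);
    }
  }
  csr.indptr = {0, static_cast<int32_t>(csr.data.size())};
  return csr;
}
```

The resulting vectors' `.data()` pointers, `indptr.size()` as `nindptr`, and `data.size()` as `nelem` correspond to the parameters documented for the CSR single-row call.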


int LGBM_BoosterPredictForCSC(BoosterHandle handle,
const void* col_ptr,
@@ -1983,6 +2078,42 @@ int LGBM_BoosterPredictForMatSingleRow(BoosterHandle handle,
API_END();
}

int LGBM_BoosterPredictForMatSingleRowFastInit(BoosterHandle handle,
const int data_type,
const int32_t ncol,
const char* parameter,
FastConfigHandle *out_fastConfig) {
API_BEGIN();
auto fastConfig_ptr = std::unique_ptr<FastConfig>(new FastConfig(
reinterpret_cast<Booster*>(handle),
parameter,
data_type,
ncol));

if (fastConfig_ptr->config.num_threads > 0) {
omp_set_num_threads(fastConfig_ptr->config.num_threads);
}

*out_fastConfig = fastConfig_ptr.release();
API_END();
}

int LGBM_BoosterPredictForMatSingleRowFast(FastConfigHandle fastConfig_handle,
const void* data,
const int predict_type,
const int num_iteration,
int64_t* out_len,
double* out_result) {
API_BEGIN();
FastConfig *fastConfig = reinterpret_cast<FastConfig*>(fastConfig_handle);
// Single row in row-major format:
auto get_row_fun = RowPairFunctionFromDenseMatric(data, 1, fastConfig->ncol.int32, fastConfig->data_type, 1);
fastConfig->booster->PredictSingleRow(num_iteration, predict_type, fastConfig->ncol.int32,
get_row_fun, fastConfig->config,
out_result, out_len);
API_END();
}


int LGBM_BoosterPredictForMats(BoosterHandle handle,
const void** data,