forcedsplits does not check features index boundary #5517

elena-sharova · 2022-09-30T11:24:54Z

Description

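When the forcedsplits_filename parameter is used, the feature indices in the supplied file do not appear to be checked against the number of features in the dataset.
An out-of-range index can therefore cause an access violation at run time. Ideally the indices would be validated up front, so that an invalid file produces a clear error instead of a crash.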
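
For illustration, a user-side pre-check along the following lines would catch an out-of-range index before training starts. This is only a sketch, not LightGBM's own validation: the helper name find_invalid_forced_splits is hypothetical, and it assumes the forced-splits file uses the nested feature / threshold / left / right layout shown in the example in this report.

```python
import json


def find_invalid_forced_splits(node, num_features, path="root"):
    """Return (path, feature) pairs whose feature index is out of range.

    Hypothetical helper: walks the nested forced-splits dict recursively
    and collects every node whose "feature" is not an int in
    [0, num_features).
    """
    problems = []
    if not node:  # empty dict or missing child: nothing to check
        return problems

    feature = node.get("feature")
    is_int = isinstance(feature, int) and not isinstance(feature, bool)
    if not is_int or not 0 <= feature < num_features:
        problems.append((path, feature))

    for side in ("left", "right"):
        child = node.get(side)
        if isinstance(child, dict):
            problems.extend(
                find_invalid_forced_splits(child, num_features, f"{path}.{side}")
            )
    return problems


# Same structure as the forced splits in this report: the root uses index 4.
spl_dict = {
    "feature": 4,
    "threshold": 3.0,
    "left": {"feature": 0, "threshold": 3.25},
    "right": {"feature": 0, "threshold": 3.25},
}

# The iris data has four features, so valid indices are 0 to 3.
problems = find_invalid_forced_splits(spl_dict, num_features=4)
print(problems)  # [('root', 4)]
```

Calling this on the parsed JSON (for example, the result of json.load on the splits file) and raising an error when the returned list is non-empty would turn the crash into an ordinary Python exception.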

Reproducible example

import json

from lightgbm import LGBMClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load iris once; the dataset has four features (column indices 0-3).
X, y = load_iris(return_X_y=True, as_frame=True)
X.columns = [c.replace(" ", "_") for c in X.columns]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# The root split uses feature index 4, which is out of range for four features.
spl_dict = {
    "feature": 4,
    "threshold": 3.0,
    "left": {
        "feature": 0,
        "threshold": 3.25
    },
    "right": {
        "feature": 0,
        "threshold": 3.25
    }
}

# write dict to a json file
with open('splits.json', 'w') as fp:
    json.dump(spl_dict, fp)

lgbm_model = LGBMClassifier(random_state=42, forcedsplits_filename="splits.json")
lgbm_model.fit(X_train, y_train)

The above gives this error:

OSError                                   Traceback (most recent call last)
Input In [39], in <cell line: 1>()
----> 1 lgbm_model.fit(X_train, y_train)

File ~\AppData\Local\Continuum\anaconda3\envs\ml_env\lib\site-packages\lightgbm\sklearn.py:967, in LGBMClassifier.fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    964         else:
    965             valid_sets[i] = (valid_x, self._le.transform(valid_y))
--> 967 super().fit(X, _y, sample_weight=sample_weight, init_score=init_score, eval_set=valid_sets,
    968             eval_names=eval_names, eval_sample_weight=eval_sample_weight,
    969             eval_class_weight=eval_class_weight, eval_init_score=eval_init_score,
    970             eval_metric=eval_metric, early_stopping_rounds=early_stopping_rounds,
    971             verbose=verbose, feature_name=feature_name, categorical_feature=categorical_feature,
    972             callbacks=callbacks, init_model=init_model)
    973 return self

File ~\AppData\Local\Continuum\anaconda3\envs\ml_env\lib\site-packages\lightgbm\sklearn.py:748, in LGBMModel.fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    745 evals_result = {}
    746 callbacks.append(record_evaluation(evals_result))
--> 748 self._Booster = train(
    749     params=params,
    750     train_set=train_set,
    751     num_boost_round=self.n_estimators,
    752     valid_sets=valid_sets,
    753     valid_names=eval_names,
    754     fobj=self._fobj,
    755     feval=eval_metrics_callable,
    756     init_model=init_model,
    757     feature_name=feature_name,
    758     callbacks=callbacks
    759 )
    761 if evals_result:
    762     self._evals_result = evals_result

File ~\AppData\Local\Continuum\anaconda3\envs\ml_env\lib\site-packages\lightgbm\engine.py:292, in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    284 for cb in callbacks_before_iter:
    285     cb(callback.CallbackEnv(model=booster,
    286                             params=params,
    287                             iteration=i,
    288                             begin_iteration=init_iteration,
    289                             end_iteration=init_iteration + num_boost_round,
    290                             evaluation_result_list=None))
--> 292 booster.update(fobj=fobj)
    294 evaluation_result_list = []
    295 # check evaluation result.

File ~\AppData\Local\Continuum\anaconda3\envs\ml_env\lib\site-packages\lightgbm\basic.py:3021, in Booster.update(self, train_set, fobj)
   3019 if self.__set_objective_to_none:
   3020     raise LightGBMError('Cannot update due to null objective function.')
-> 3021 _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
   3022     self.handle,
   3023     ctypes.byref(is_finished)))
   3024 self.__is_predicted_cur_iter = [False for _ in range(self.__num_dataset)]
   3025 return is_finished.value == 1

OSError: exception: access violation reading 0x0000025A797419E8

Environment info

LightGBM version or commit hash: 3.3.2

Command(s) you used to install LightGBM: conda install -c conda-forge lightgbm

btrotta · 2022-12-30T03:49:27Z

@elena-sharova Thanks for reporting this, and for the reproducible example. I've opened a PR to fix this.

jameslamb · 2022-12-30T04:05:18Z

+1 to that, excellent bug report @elena-sharova. Thank you so much for the effort you put into it.

github-actions · 2023-08-19T03:02:42Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jmoralez added the bug label Oct 16, 2022

btrotta mentioned this issue Dec 30, 2022

Check feature indexes in forced split file (fixes #5517) #5653

Merged

jameslamb closed this as completed in #5653 Dec 30, 2022

jameslamb pushed a commit that referenced this issue Dec 30, 2022

Check feature indexes in forced split file (fixes #5517) (#5653)

f84bfcf

github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
