forcedsplits does not check features index boundary #5517

Closed
elena-sharova opened this issue Sep 30, 2022 · 3 comments · Fixed by #5653

@elena-sharova

Description

When the forcedsplits_filename parameter is used, the feature indices in the provided JSON file do not appear to be checked against the number of features in the dataset.
An out-of-range index can therefore cause an access violation. Ideally, the index should be validated so that an invalid file produces a clear error instead of a run-time crash.
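
As a rough illustration of the kind of check meant here, the sketch below walks the forced-splits tree in Python before it is handed to LightGBM. The helper name check_forced_splits and its arguments are hypothetical and not part of LightGBM, and the sketch assumes that a node without a "feature" key simply forces no split.

```python
import json


def check_forced_splits(node, num_features, path="root"):
    """Raise ValueError if a "feature" index in a forced-splits tree is out of range.

    Hypothetical helper, not part of LightGBM. `node` is the dict loaded from
    the forced-splits JSON file; `num_features` is the number of training columns.
    """
    if "feature" not in node:
        return  # assumption: a node without "feature" forces no split
    feature = node["feature"]
    valid = isinstance(feature, int) and not isinstance(feature, bool)
    if not valid or not 0 <= feature < num_features:
        raise ValueError(
            f"{path}: feature index {feature!r} is outside [0, {num_features - 1}]"
        )
    # Recurse into whichever children are present.
    for side in ("left", "right"):
        child = node.get(side)
        if isinstance(child, dict):
            check_forced_splits(child, num_features, f"{path}.{side}")


# Same shape as the splits file in this report: four features, root asks for index 4.
splits = json.loads(
    '{"feature": 4, "threshold": 3.0,'
    ' "left": {"feature": 0, "threshold": 3.25},'
    ' "right": {"feature": 0, "threshold": 3.25}}'
)
try:
    check_forced_splits(splits, num_features=4)
except ValueError as err:
    print(f"invalid forced splits: {err}")
```

Calling such a helper with X_train.shape[1] as num_features before fit would turn the crash into an ordinary Python exception on the caller's side.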

Reproducible example

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import json
from lightgbm import LGBMClassifier

df = load_iris(return_X_y=True, as_frame=True)[0]
df.columns = [c.replace(" ","_") for c in df.columns]
x_cols = df.columns
df['target']=load_iris(return_X_y=True, as_frame=True)[1]

# The dataset has four features
X_train, X_test, y_train, y_test = train_test_split(df[x_cols], df['target'], test_size=0.33, random_state=42)

# Proceed as if the dataset had five features: index 4 is out of range (valid indices are 0 to 3)
spl_dict = {"feature": 4,
 "threshold": 3.0,
 "left": {
        "feature": 0,
        "threshold": 3.25
    },
"right": {
    "feature": 0,
    "threshold": 3.25
    }
}

# write dict to a json file
with open('splits.json', 'w') as fp:
    json.dump(spl_dict, fp)

lgbm_model = LGBMClassifier(random_state=42,forcedsplits_filename="splits.json")
lgbm_model.fit(X_train, y_train)

The above gives this error:

OSError                                   Traceback (most recent call last)
Input In [39], in <cell line: 1>()
----> 1 lgbm_model.fit(X_train, y_train)

File ~\AppData\Local\Continuum\anaconda3\envs\ml_env\lib\site-packages\lightgbm\sklearn.py:967, in LGBMClassifier.fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    964         else:
    965             valid_sets[i] = (valid_x, self._le.transform(valid_y))
--> 967 super().fit(X, _y, sample_weight=sample_weight, init_score=init_score, eval_set=valid_sets,
    968             eval_names=eval_names, eval_sample_weight=eval_sample_weight,
    969             eval_class_weight=eval_class_weight, eval_init_score=eval_init_score,
    970             eval_metric=eval_metric, early_stopping_rounds=early_stopping_rounds,
    971             verbose=verbose, feature_name=feature_name, categorical_feature=categorical_feature,
    972             callbacks=callbacks, init_model=init_model)
    973 return self

File ~\AppData\Local\Continuum\anaconda3\envs\ml_env\lib\site-packages\lightgbm\sklearn.py:748, in LGBMModel.fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    745 evals_result = {}
    746 callbacks.append(record_evaluation(evals_result))
--> 748 self._Booster = train(
    749     params=params,
    750     train_set=train_set,
    751     num_boost_round=self.n_estimators,
    752     valid_sets=valid_sets,
    753     valid_names=eval_names,
    754     fobj=self._fobj,
    755     feval=eval_metrics_callable,
    756     init_model=init_model,
    757     feature_name=feature_name,
    758     callbacks=callbacks
    759 )
    761 if evals_result:
    762     self._evals_result = evals_result

File ~\AppData\Local\Continuum\anaconda3\envs\ml_env\lib\site-packages\lightgbm\engine.py:292, in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    284 for cb in callbacks_before_iter:
    285     cb(callback.CallbackEnv(model=booster,
    286                             params=params,
    287                             iteration=i,
    288                             begin_iteration=init_iteration,
    289                             end_iteration=init_iteration + num_boost_round,
    290                             evaluation_result_list=None))
--> 292 booster.update(fobj=fobj)
    294 evaluation_result_list = []
    295 # check evaluation result.

File ~\AppData\Local\Continuum\anaconda3\envs\ml_env\lib\site-packages\lightgbm\basic.py:3021, in Booster.update(self, train_set, fobj)
   3019 if self.__set_objective_to_none:
   3020     raise LightGBMError('Cannot update due to null objective function.')
-> 3021 _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
   3022     self.handle,
   3023     ctypes.byref(is_finished)))
   3024 self.__is_predicted_cur_iter = [False for _ in range(self.__num_dataset)]
   3025 return is_finished.value == 1

OSError: exception: access violation reading 0x0000025A797419E8
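
On a version without a bounds check (3.3.2 is the one reported here), one possible way to sidestep hand-typed indices is to look each index up from the column names, so that a typo fails in Python before training starts. This is only a sketch: feature_index is a hypothetical helper, and it assumes that the "feature" value in the splits file refers to the zero-based column position in the training data.

```python
import json


def feature_index(columns, name):
    """Return the zero-based position of `name` in `columns`, or raise KeyError.

    Hypothetical helper, not part of LightGBM.
    """
    columns = list(columns)
    try:
        return columns.index(name)
    except ValueError:
        raise KeyError(f"unknown feature {name!r}; available: {columns}") from None


# Column names as produced by the example above (spaces replaced by underscores).
columns = [
    "sepal_length_(cm)",
    "sepal_width_(cm)",
    "petal_length_(cm)",
    "petal_width_(cm)",
]

# Every index is derived from a name, so it cannot exceed len(columns) - 1.
spl_dict = {
    "feature": feature_index(columns, "petal_width_(cm)"),
    "threshold": 3.0,
    "left": {"feature": feature_index(columns, "sepal_length_(cm)"), "threshold": 3.25},
    "right": {"feature": feature_index(columns, "sepal_length_(cm)"), "threshold": 3.25},
}

with open("splits.json", "w") as fp:
    json.dump(spl_dict, fp)
```

In the example above, x_cols could be passed in place of the hard-coded columns list.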

Environment info

LightGBM version or commit hash: 3.3.2

Command(s) you used to install LightGBM: conda install -c conda-forge lightgbm


Additional Comments

@btrotta
Collaborator

btrotta commented Dec 30, 2022

@elena-sharova Thanks for reporting this, and for the reproducible example. I've opened a PR to fix this.

@jameslamb
Collaborator

+1 to that, excellent bug report @elena-sharova. Thank you so much for the effort you put into it.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023