[python-package] remove 'fobj' in favor of passing custom objective function in params #5052

Merged

TremaMiguel
Contributor

@TremaMiguel TremaMiguel commented Mar 4, 2022

Context

This PR contributes to #3244

Changes

  • Any custom evaluation function passed to train through params['metric'] or feval is accepted.
  • Allow passing custom objective functions through params['objective']. If params['objective'] and fobj are both callable, the first one takes precedence. The fobj argument is removed.
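
As a rough illustration of the second bullet, the sketch below passes a callable as params['objective'] to train(). The names `l2_objective` and `train_with_custom_objective` and the parameter values are hypothetical and not taken from this PR. It assumes a LightGBM release that includes this change and a custom objective with the signature (preds, train_data) -> (grad, hess).

```python
import numpy as np


def l2_objective(preds, train_data):
    """Squared-error objective: return per-row gradient and Hessian."""
    labels = train_data.get_label()
    grad = preds - labels        # derivative of 0.5 * (preds - labels) ** 2
    hess = np.ones_like(preds)   # second derivative is constant
    return grad, hess


def train_with_custom_objective(X, y, num_boost_round=10):
    # Imported lazily so the objective above can be inspected without LightGBM.
    # Assumption: the installed LightGBM accepts a callable in params['objective'].
    import lightgbm as lgb

    train_set = lgb.Dataset(X, label=y)
    params = {"objective": l2_objective, "verbose": -1}
    return lgb.train(params, train_set, num_boost_round=num_boost_round)
```

With a custom objective the model has no built-in output transformation, so predictions from the returned booster are raw scores.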

Test

  • Functional tests with both classification and regression cases.

@TremaMiguel TremaMiguel changed the title from "feat: custom objective and metric in training params" to "[python-package] custom objective and metric in training params" Mar 4, 2022
@shiyu1994
Collaborator

@TremaMiguel Thanks for working on this! Could you please check my comments above?

@TremaMiguel
Contributor Author

@shiyu1994 thanks for taking the time to review. I've provided some answers to the open threads.

@jameslamb jameslamb changed the title from "[python-package] custom objective and metric in training params" to "[python-package] custom objective and metric in training params (fixes #3244)" Mar 10, 2022
@shiyu1994
Collaborator

Close and reopen to retrigger CI.

@shiyu1994 shiyu1994 closed this Mar 11, 2022
@shiyu1994 shiyu1994 reopened this Mar 11, 2022
Collaborator

@shiyu1994 shiyu1994 left a comment

Thank you! The changes LGTM now. I'm not sure whether other reviewers would like to take a look before this is merged, so a gentle ping to @StrikerRUS @jameslamb @jmoralez @guolinke.

@StrikerRUS
Collaborator

StrikerRUS commented Mar 12, 2022

@TremaMiguel Thanks for your work!

Since the scikit-learn wrapper internally calls the train() function,

self._Booster = train(
    params=params,
    train_set=train_set,
    num_boost_round=self.n_estimators,
    valid_sets=valid_sets,
    valid_names=eval_names,
    fobj=self._fobj,
    feval=eval_metrics_callable,
    init_model=init_model,
    feature_name=feature_name,
    callbacks=callbacks
)

we can remove the duplicated code from sklearn.py that is now in engine.py, right?

# Separate built-in from callable evaluation metrics
eval_metrics_callable = [_EvalFunctionWrapper(f) for f in eval_metric_list if callable(f)]
eval_metrics_builtin = [m for m in eval_metric_list if isinstance(m, str)]
# concatenate metric from params (or default if not provided in params) and eval_metric
params['metric'] = [params['metric']] if isinstance(params['metric'], (str, type(None))) else params['metric']
params['metric'] = [e for e in eval_metrics_builtin if e not in params['metric']] + params['metric']
params['metric'] = [metric for metric in params['metric'] if metric is not None]
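
The effect of the list handling quoted above can be sketched in isolation. The function name `merge_metrics` is hypothetical. It mirrors the quoted sklearn.py lines but leaves out _EvalFunctionWrapper, so callables are returned unwrapped.

```python
def merge_metrics(params_metric, eval_metric_list):
    """Split callables from built-in metric names and merge the names.

    Returns (merged_builtin_names, callable_metrics).
    """
    # Separate built-in (string) metrics from callable ones.
    callables = [f for f in eval_metric_list if callable(f)]
    builtin = [m for m in eval_metric_list if isinstance(m, str)]

    # Normalize the metric from params into a list.
    if isinstance(params_metric, (str, type(None))):
        merged = [params_metric]
    else:
        merged = list(params_metric)

    # Prepend built-in names not already present, then drop None placeholders.
    merged = [m for m in builtin if m not in merged] + merged
    merged = [m for m in merged if m is not None]
    return merged, callables


def my_metric(preds, eval_data):
    # Placeholder custom metric, used only to show the split.
    return "my_metric", 0.0, False


names, funcs = merge_metrics("l2", ["l1", "l2", my_metric])
print(names)  # ['l1', 'l2']
print(funcs == [my_metric])  # True
```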

@StrikerRUS
Collaborator

Also, please make the same changes for the cv() function.

@StrikerRUS
Collaborator

StrikerRUS commented Mar 12, 2022

These changes are quite complicated and require more work than it seems at first glance. So I suggest splitting this PR into two, with the changes related to custom objective and custom metric respectively.

@TremaMiguel TremaMiguel requested a review from jmoralez as a code owner March 15, 2022 01:54
@jameslamb jameslamb mentioned this pull request Apr 14, 2022
params['objective'] = 'None' # objective = nullptr for unknown objective
params['objective'] = _ObjectiveFunctionWrapper(self._objective)
else:
params['objective'] = 'None'
Collaborator

OK, I see now.
It seems that LightGBM checks the consistency between the objective and the number of classes even during prediction.

void Config::CheckParamConflict() {

Full logs:

==================================================================================================== FAILURES ====================================================================================================
________________________________________________________________________________________ test_multiclass_custom_objective ________________________________________________________________________________________

    def test_multiclass_custom_objective():
        centers = [[-4, -4], [4, 4], [-4, 4]]
        X, y = make_blobs(n_samples=1_000, centers=centers, random_state=42)
        params = {'n_estimators': 10, 'num_leaves': 7}
        builtin_obj_model = lgb.LGBMClassifier(**params)
        builtin_obj_model.fit(X, y)
        builtin_obj_preds = builtin_obj_model.predict_proba(X)

        custom_obj_model = lgb.LGBMClassifier(objective=sklearn_multiclass_custom_objective, **params)
        custom_obj_model.fit(X, y)
>       custom_obj_preds = softmax(custom_obj_model.predict(X, raw_score=True))

test_sklearn.py:1299:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
d:\miniconda3\lib\site-packages\lightgbm\sklearn.py:1050: in predict
    result = self.predict_proba(X, raw_score, start_iteration, num_iteration,
d:\miniconda3\lib\site-packages\lightgbm\sklearn.py:1063: in predict_proba
    result = super().predict(X, raw_score, start_iteration, num_iteration, pred_leaf, pred_contrib, **kwargs)
d:\miniconda3\lib\site-packages\lightgbm\sklearn.py:813: in predict
    return self._Booster.predict(X, raw_score=raw_score, start_iteration=start_iteration, num_iteration=num_iteration,
d:\miniconda3\lib\site-packages\lightgbm\basic.py:3538: in predict
    return predictor.predict(data, start_iteration, num_iteration,
d:\miniconda3\lib\site-packages\lightgbm\basic.py:813: in predict
    preds, nrow = self.__pred_for_np2d(data, start_iteration, num_iteration, predict_type)
d:\miniconda3\lib\site-packages\lightgbm\basic.py:903: in __pred_for_np2d
    return inner_predict(mat, start_iteration, num_iteration, predict_type)
d:\miniconda3\lib\site-packages\lightgbm\basic.py:873: in inner_predict
    _safe_call(_LIB.LGBM_BoosterPredictForMat(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ret = -1

    def _safe_call(ret: int) -> None:
        """Check the return value from C API call.

        Parameters
        ----------
        ret : int
            The return value from C API calls.
        """
        if ret != 0:
>           raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
E           lightgbm.basic.LightGBMError: Number of classes must be 1 for non-multiclass training

d:\miniconda3\lib\site-packages\lightgbm\basic.py:142: LightGBMError
---------------------------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------------------------
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000075 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 510
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 2
[LightGBM] [Info] Start training from score -1.096614
[LightGBM] [Info] Start training from score -1.099613
[LightGBM] [Info] Start training from score -1.099613
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000071 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 510
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 2
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
---------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------
[LightGBM] [Fatal] Number of classes must be 1 for non-multiclass training

Let's then pass 'None' for the custom objective during the predict phase.
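
The test in the log above turns raw scores from a custom-objective model into probabilities with softmax. A minimal sketch of such a row-wise softmax follows, assuming raw scores shaped (n_samples, n_classes). The name `softmax_rows` is hypothetical, and the softmax helper the test actually imports is not shown in this thread.

```python
import numpy as np


def softmax_rows(raw_scores):
    """Row-wise softmax over an (n_samples, n_classes) array of raw scores."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = raw_scores - raw_scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)
```

Each output row sums to 1, so the result can be compared against predict_proba from a model trained with the built-in multiclass objective.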

Collaborator

@StrikerRUS StrikerRUS left a comment

Some more minor suggestions for tests.

@TremaMiguel
Contributor Author

@StrikerRUS thanks for the time taken to review, I've addressed your comments. Let me know if something else is missing.

Collaborator

@StrikerRUS StrikerRUS left a comment

Thank you so much for this hard work with API and test changes! LGTM!

@StrikerRUS
Collaborator

@jameslamb @shiyu1994 @jmoralez @guolinke Please help with another pair of eyes.

@TremaMiguel
Contributor Author

Perhaps a helping hand here @jameslamb ?

Collaborator

@jameslamb jameslamb left a comment

I reviewed the changes and threads tonight, I have no additional comments. Thanks very much for all the hard work on this @TremaMiguel , and as always for the very thorough review @StrikerRUS .

@jameslamb jameslamb changed the title from "[python-package] custom objective in training and cv params (fixes #3244)" to "[python-package] remove 'fobj' in favor of passing custom objective function in params (fixes #3244)" Apr 22, 2022
@jameslamb jameslamb merged commit 416ecd5 into microsoft:master Apr 22, 2022
@jameslamb
Collaborator

I changed the PR title to explicitly include fobj, so it's easier to find for people looking through the release notes trying to understand why their code is broken.

@jameslamb jameslamb changed the title from "[python-package] remove 'fobj' in favor of passing custom objective function in params (fixes #3244)" to "[python-package] remove 'fobj' in favor of passing custom objective function in params" Apr 22, 2022
@TremaMiguel
Contributor Author

TremaMiguel commented Apr 22, 2022

@jameslamb @StrikerRUS @shiyu1994 thank you very much for reviewing this. 🙌

@StrikerRUS
Collaborator

@TremaMiguel Let's go further with feval according to your plan! 😃

@bfan1256

bfan1256 commented Aug 5, 2023

Is there a specific way to use this with .train()? I've tried setting params['objective'] directly, but I get: Unknown type parameter:objective got:function

And I'm simply passing a function reference.

@jmoralez
Collaborator

jmoralez commented Aug 7, 2023

Hey @bfan1256, you can find an example of how to use it here. If that doesn't work for you, could you please open a new issue?


github-actions bot commented Nov 8, 2023

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2023