
[python-package] record_evaluation callback doesn't work with cv function #4943

Closed
jmoralez opened this issue Jan 11, 2022 · 4 comments · Fixed by #4947

@jmoralez
Collaborator

Description

The record_evaluation callback fails when used with the cv function. Since integrations that log metrics may rely on this callback to save training information, and cv is very useful for hyperparameter tuning, I believe the two should work together.

Reproducible example

import lightgbm as lgb
import numpy as np

X = np.random.rand(100, 2)
y = np.random.rand(100)
ds = lgb.Dataset(X, y)
params = {'objective': 'l2', 'num_leaves': 5, 'verbose': -1}
eval_result = {}
callbacks = [lgb.record_evaluation(eval_result)]
bst = lgb.train(params, ds, num_boost_round=2, valid_sets=[ds], callbacks=callbacks)
print(eval_result)
# {'training': OrderedDict([('l2', [0.08664710410521784, 0.08580684200793261])])}

eval_result = {}
callbacks = [lgb.record_evaluation(eval_result)]
cv_hist = lgb.cv(
    params, ds, num_boost_round=2, stratified=False, callbacks=callbacks, eval_train_metric=True
)
# Traceback (most recent call last):
#   File "record_eval.py", line 14, in <module>
#     cv_hist = lgb.cv(params, ds, stratified=False, callbacks=callbacks)
#   File "/hdd/github/LightGBM/python-package/lightgbm/engine.py", line 582, in cv
#     cb(callback.CallbackEnv(model=cvfolds,
#   File "/hdd/github/LightGBM/python-package/lightgbm/callback.py", line 140, in _callback
#     _init(env)
#   File "/hdd/github/LightGBM/python-package/lightgbm/callback.py", line 134, in _init
#     for data_name, eval_name, _, _ in env.evaluation_result_list:
# ValueError: too many values to unpack (expected 4)
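
The mismatch comes from the shape of the entries in env.evaluation_result_list: cv aggregates the per-fold results and appends the standard deviation across folds, so its tuples carry one more element than train's. A minimal illustration (my reading of the internals based on the traceback; not library code, and the stdv value is a placeholder):

# shapes of env.evaluation_result_list entries (metric values taken from the outputs above)
train_item = ('training', 'l2', 0.0866, False)           # lgb.train: 4 elements
cv_item = ('cv_agg', 'train l2', 0.0632, False, 0.0012)  # lgb.cv: 5 elements, extra stdv
data_name, eval_name, _, _ = train_item                  # works
# data_name, eval_name, _, _ = cv_item                   # raises ValueError: too many values to unpack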

This can be easily fixed by capturing the extra content from env.evaluation_result_list with *_ here:

for data_name, eval_name, _, _ in env.evaluation_result_list:
and here:
for data_name, eval_name, result, _ in env.evaluation_result_list:

which would yield:
{'cv_agg': OrderedDict([('train l2', [0.06319766819578873, 0.06253713563921684]), ('valid l2', [0.06679350623951755, 0.06694167044391186])])}
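
A minimal sketch of the proposed fix applied to the callback (this mirrors the structure of lightgbm/callback.py but is not the exact library code):

import collections

def record_evaluation(eval_result):
    if not isinstance(eval_result, dict):
        raise TypeError('eval_result should be a dictionary')

    def _init(env):
        eval_result.clear()
        # *_ absorbs the extra stdv element that lgb.cv appends to each tuple
        for data_name, eval_name, *_ in env.evaluation_result_list:
            eval_result.setdefault(data_name, collections.OrderedDict())
            eval_result[data_name].setdefault(eval_name, [])

    def _callback(env):
        if not eval_result:
            _init(env)
        for data_name, eval_name, result, *_ in env.evaluation_result_list:
            eval_result[data_name][eval_name].append(result)

    _callback.order = 20
    return _callback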

Environment info

LightGBM version or commit hash: db045f4

Additional Comments

I'm happy to make the described changes if other maintainers agree with my solution.

@StrikerRUS
Collaborator

StrikerRUS commented Jan 11, 2022

Ah, I remember we had a similar problem in the early_stopping callback (#2209) and had to split that string:

# split is needed for "<dataset type> <metric>" case (e.g. "train l1")
first_metric = env.evaluation_result_list[0][1].split(" ")[-1]

# split is needed for "<dataset type> <metric>" case (e.g. "train l1")
eval_name_splitted = env.evaluation_result_list[i][1].split(" ")
if first_metric_only and first_metric != eval_name_splitted[-1]:
    continue  # use only the first metric for early stopping
if ((env.evaluation_result_list[i][0] == "cv_agg" and eval_name_splitted[0] == "train"
        or env.evaluation_result_list[i][0] == env.model._train_data_name)):
    _final_iteration_check(env, eval_name_splitted, i)
    continue  # train data for lgb.cv or sklearn wrapper (underlying lgb.train)

I believe we should preserve the nested structure of the resulting dictionary, in which we have dataset names at the first level and metric names at the second one, e.g.

{
  'cv_agg':
    {
      'train':
        {
          'l2': [0.06319766819578873, 0.06253713563921684]
        },
      'valid':
        {
          'l2': [0.06679350623951755, 0.06694167044391186]
        }
    }
}
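
A hedged sketch of how the callback could build that nested structure by splitting the "<dataset> <metric>" name that cv produces (the helper name and details here are hypothetical, not LightGBM code):

import collections

def nest_cv_results(evaluation_result_list, eval_result):
    # cv entries look like ('cv_agg', 'train l2', mean, is_higher_better, stdv)
    for item in evaluation_result_list:
        data_name, eval_name, result = item[0], item[1], item[2]
        dataset, metric = eval_name.split(" ")  # "train l2" -> ('train', 'l2')
        inner = eval_result.setdefault(data_name, collections.OrderedDict())
        inner.setdefault(dataset, collections.OrderedDict()).setdefault(metric, []).append(result)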

@jmoralez
Collaborator Author

I agree with preserving the structure, but I think the cv_agg key doesn't provide much value; maybe we could drop it?

@StrikerRUS
Collaborator

... maybe we could drop it?

Yeah, for sure!

@jmoralez jmoralez added the bug label Jan 13, 2022
StrikerRUS pushed a commit that referenced this issue Feb 15, 2022
…) (#4947)

* make record_evaluation compatible with cv

* test multiple metrics in cv

* lint

* fix cv with train metric. save stdv as well

* always add dataset prefix to cv_agg

* remove unused function
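
Judging from those commit messages, the merged fix nests results by dataset and records both the fold mean and the standard deviation. A hedged usage sketch (the exact "-mean"/"-stdv" key names are an assumption inferred from the commits, not verified here):

eval_result = {}
cv_hist = lgb.cv(
    params, ds, num_boost_round=2, stratified=False,
    callbacks=[lgb.record_evaluation(eval_result)], eval_train_metric=True
)
# hypothetical resulting shape:
# {'train': {'l2-mean': [...], 'l2-stdv': [...]},
#  'valid': {'l2-mean': [...], 'l2-stdv': [...]}}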