[python-package] add more hints in sklearn.py #5460
python-package/lightgbm/sklearn.py
-    eval_metric=None,
-    eval_at=(1, 2, 3, 4, 5),
+    eval_metric: Optional[_LGBM_ScikitEvalMetricType] = None,
+    eval_at: Iterable[int] = (1, 2, 3, 4, 5),
I think this should be Sequence[int], given that, for example, range(5) is an iterable but is not a valid type for this.
🤩 Excellent point. I was just following the docstring and didn't consider this. Thanks very much for noting it!!!
I tried the following from the root of the repo, just to see what would happen:
Sample code:

    from pathlib import Path

    import numpy as np
    import lightgbm as lgb
    from sklearn.datasets import load_svmlight_file

    rank_example_dir = Path('examples/lambdarank')
    X_train, y_train = load_svmlight_file(str(rank_example_dir / 'rank.train'))
    X_test, y_test = load_svmlight_file(str(rank_example_dir / 'rank.test'))
    q_train = np.loadtxt(str(rank_example_dir / 'rank.train.query'))
    q_test = np.loadtxt(str(rank_example_dir / 'rank.test.query'))

    gbm = lgb.LGBMRanker(n_estimators=10)
    gbm.fit(
        X_train,
        y_train,
        group=q_train,
        eval_set=[(X_test, y_test)],
        eval_group=[q_test],
        eval_at=range(3),
        callbacks=[
            lgb.early_stopping(10),
            lgb.reset_parameter(learning_rate=lambda x: max(0.01, 0.1 - 0.01 * x))
        ]
    )
And you're right... passing a range for eval_at causes a failure when serializing the parameters to a string to pass them through the C API functions.
    File ".../site-packages/lightgbm/basic.py", line 326, in param_dict_to_str
        raise TypeError(f'Unknown type of parameter:{key}, got:{type(val).__name__}')
    TypeError: Unknown type of parameter:eval_at, got:range
Given that, I think the hint here should be even stricter than typing.Sequence. Since this keyword argument is passed directly through to params and there's no other code in LightGBM manipulating its value, I think it can only accept values that are valid for lightgbm.basic.param_dict_to_str().
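To illustrate why Sequence[int] would not be strict enough, here is a minimal standard-library sketch. It shows that range already counts as a Sequence, so a Sequence[int] hint would still admit the value that failed above. The helper name abc_membership is hypothetical and not part of LightGBM.

```python
from collections.abc import Iterable, Sequence


def abc_membership(value):
    """Return (is_iterable, is_sequence) for a value. Hypothetical helper."""
    return isinstance(value, Iterable), isinstance(value, Sequence)


candidates = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "range": range(1, 4),
    "set": {1, 2, 3},
}

# range is registered as a Sequence, so Sequence[int] would not exclude it.
for name, value in candidates.items():
    print(name, abc_membership(value))
```

A static type checker would therefore accept eval_at=range(3) under a Sequence[int] hint even though it fails at runtime.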
For eval_at, I think that means only a list of ints or tuple of ints is valid. param_dict_to_str() supports list, tuple, and set, but set isn't appropriate for eval_at because sets have no guaranteed ordering.
LightGBM/python-package/lightgbm/basic.py, line 323 in 3d4e08e:

    if isinstance(val, (list, tuple, set)) or is_numpy_1d_array(val):
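The type dispatch referenced above can be sketched roughly as follows. This is a simplified stand-in, not the library's actual code: the function name param_dict_to_str_sketch, the space-separated key=value output format, and the omission of the NumPy branch are all assumptions. Only the isinstance check and the error message format come from the thread.

```python
def param_dict_to_str_sketch(params):
    """Simplified, hypothetical stand-in for parameter serialization."""
    pairs = []
    for key, val in params.items():
        if isinstance(val, (list, tuple, set)):
            # Collections are flattened to a comma-separated string (assumed format).
            pairs.append(f"{key}={','.join(map(str, val))}")
        elif isinstance(val, (str, int, float, bool)):
            pairs.append(f"{key}={val}")
        elif val is not None:
            # Anything else, including range, is rejected, as in the traceback above.
            raise TypeError(
                f"Unknown type of parameter:{key}, got:{type(val).__name__}"
            )
    return " ".join(pairs)


print(param_dict_to_str_sketch({"eval_at": [1, 2, 3]}))
```

Because the check is an exact isinstance test against list, tuple, and set, any other sequence type falls through to the TypeError branch.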
I just pushed 81c234f which:
- sets the hint for eval_at to Union[List[int], Tuple[int]]
- replaces use of the word "iterable" in the relevant docstrings with "list or tuple of int"
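For callers who hold a range or another iterable, a small conversion before calling fit() avoids the error. The sketch below is illustrative only: EvalAt, is_valid_eval_at, and normalize_eval_at are hypothetical names, not LightGBM APIs. The alias here uses Tuple[int, ...] (a tuple of any length) as an assumption of this sketch, since in typing, Tuple[int] denotes a tuple of exactly one int.

```python
from typing import Iterable, List, Tuple, Union

# Hypothetical alias: "list or tuple of int", with a variable-length tuple.
EvalAt = Union[List[int], Tuple[int, ...]]


def is_valid_eval_at(value: object) -> bool:
    """Runtime check mirroring the 'list or tuple of int' wording."""
    return isinstance(value, (list, tuple)) and all(
        isinstance(k, int) for k in value
    )


def normalize_eval_at(eval_at: Iterable[int]) -> List[int]:
    """Materialize any iterable of ints into a plain list."""
    return [int(k) for k in eval_at]


positions = normalize_eval_at(range(1, 4))
print(positions, is_valid_eval_at(positions), is_valid_eval_at(range(1, 4)))
```

With this, a call such as gbm.fit(..., eval_at=normalize_eval_at(range(1, 4))) passes a list rather than a range.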
@jmoralez I won't merge this until you have a chance to respond, since what I did here is slightly different than what you suggested.
LGTM, thanks!
Co-authored-by: Nikita Titov <[email protected]>
Thanks!
Contributes to #3756.
Adds some more type hints to places in sklearn.py that are missing them.
Notes for Reviewers
I intentionally avoided hints on anything "array-like", since those will require some extra research and should be handled separately (#3756 (comment)).