Skip to content

Commit

Permalink
[docs] document rounding behavior of floating point numbers in catego…
Browse files Browse the repository at this point in the history
…rical features (#5009)
  • Loading branch information
StrikerRUS authored Feb 16, 2022
1 parent d31346f commit 057ba07
Show file tree
Hide file tree
Showing 4 changed files with 6 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/Advanced-Topics.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ Categorical Feature Support

- Categorical features must be encoded as non-negative integers (``int``) less than ``Int32.MaxValue`` (2147483647).
It is best to use a contiguous range of integers started from zero.
Floating point numbers in categorical features will be rounded towards 0.

- Use ``min_data_per_group``, ``cat_smooth`` to deal with over-fitting (when ``#data`` is small or ``#category`` is large).

Expand Down
2 changes: 2 additions & 0 deletions python-package/lightgbm/basic.py
Original file line number Diff line number Diff line change
Expand Up @@ -1159,6 +1159,7 @@ def __init__(self, data, label=None, reference=None,
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
params : dict or None, optional (default=None)
Other parameters for Dataset.
free_raw_data : bool, optional (default=True)
Expand Down Expand Up @@ -3563,6 +3564,7 @@ def refit(
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
dataset_params : dict or None, optional (default=None)
Other parameters for Dataset ``data``.
free_raw_data : bool, optional (default=True)
Expand Down
2 changes: 2 additions & 0 deletions python-package/lightgbm/engine.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,7 @@ def train(
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
keep_training_booster : bool, optional (default=False)
Whether the returned Booster will be used to keep training.
If False, the returned value will be converted into _InnerPredictor before returning.
Expand Down Expand Up @@ -463,6 +464,7 @@ def cv(params, train_set, num_boost_round=100,
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
fpreproc : callable or None, optional (default=None)
Preprocessing function that takes (dtrain, dtest, params)
and returns transformed versions of those.
Expand Down
1 change: 1 addition & 0 deletions python-package/lightgbm/sklearn.py
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,7 @@ def __call__(self, preds, dataset):
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
callbacks : list of callable, or None, optional (default=None)
List of callback functions that are applied at each iteration.
See Callbacks in Python API for more information.
Expand Down

0 comments on commit 057ba07

Please sign in to comment.