Summary
Many LightGBM interfaces (in R, Python, C++, and others) accept a key-value map params, which can be used to override LightGBM's default configuration. The valid values are documented at https://lightgbm.readthedocs.io/en/latest/Parameters.html.
For many of those parameters, LightGBM recognizes a "main" parameter name and one or more "aliases" (other names which set the same configuration).
For example, main parameter num_iterations can also be referred to in user code as n_iter, num_round, and more (docs link).
On the C++ side, LightGBM guarantees reproducible behavior whenever more than one of these aliases is provided in the same call, like this:
{
    "num_round": 100,
    "n_iter": 200
}
LightGBM's C++, R, and Python packages should all make identical choices in such situations.
Motivation
Ensuring that LightGBM always chooses the same configuration for a given content of params eliminates one possible source of the same code producing different results at different times or in different environments. That might save maintainers and users of the project time that would otherwise be lost investigating changes in results.
Description
Currently, the C++ side has some logic to make the choice of alias reproducible (LightGBM/include/LightGBM/config.h, lines 1141 to 1144 in fc0c8fd).
Instead of replicating that logic in Python and R code, I believe this feature should be implemented similarly to the approach taken in #4829. The full list of recognized aliases is known at compile time, so it shouldn't be necessary to write R and Python code, similar to that C++ code, which checks name lengths and alphabetic ordering every time params is processed.
I think these could all be kept in sync by:
- removing the sorting logic from ParameterAlias::KeyAliasTransform() and instead having https://github.com/microsoft/LightGBM/blob/master/helpers/parameter_generator.py pre-sort all aliases that way
- changing ParameterAlias::KeyAliasTransform() in C++ to use the output of Config::DumpAliases() or some other code on Config, iterate over aliases in order, and prefer the first one that it finds (taking advantage of the fact that the aliases have already been sorted)
To avoid the overhead of serializing and deserializing a JSON string, it might also be useful to add an intermediate method for Config::DumpAliases() that returns arrays of names, and re-use that across both Config::DumpAliases() and that alias-resolution code in ParameterAlias::KeyAliasTransform().
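The two steps proposed above (pre-sort the aliases once, then pick the first one found) could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the names RAW_ALIASES, presort_aliases, and resolve_params are hypothetical and not part of LightGBM, the alias table is a tiny subset using only the names mentioned in this issue, and the ordering rule (main name first, then shorter aliases, then alphabetical order) is an assumption based on the "name lengths and alphabetic ordering" description rather than a copy of the C++ code.

```python
from typing import Any, Dict, List

# Hypothetical, illustrative subset of the alias table. In the proposal, the
# real table would be generated (and pre-sorted) at build time.
RAW_ALIASES: Dict[str, List[str]] = {
    "num_iterations": ["num_round", "n_iter"],
}


def presort_aliases(raw: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Sort each alias list once: shorter names first, ties alphabetical.

    This ordering rule is an assumption made for the sketch.
    """
    return {main: sorted(names, key=lambda n: (len(n), n)) for main, names in raw.items()}


SORTED_ALIASES = presort_aliases(RAW_ALIASES)


def resolve_params(
    params: Dict[str, Any],
    aliases: Dict[str, List[str]] = SORTED_ALIASES,
) -> Dict[str, Any]:
    """Return a copy of params in which every alias is replaced by its main name.

    When several names for the same parameter are present, the main name wins
    if given; otherwise the first alias in the pre-sorted list wins. No sorting
    or length comparison happens at call time.
    """
    resolved = dict(params)
    for main, names in aliases.items():
        present = [name for name in [main] + names if name in params]
        if not present:
            continue
        value = params[present[0]]
        for name in present:
            del resolved[name]
        resolved[main] = value
    return resolved
```

With this sketch, resolve_params({"num_round": 100, "n_iter": 200}) and resolve_params({"n_iter": 200, "num_round": 100}) give the same result regardless of the order in which the keys were supplied, which is the property the R and Python packages would need to share with the C++ side.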
References
Created based on #5289 (comment).
The changes in #4829 are highly relevant to this issue, and reading that PR will help those looking to understand this issue more thoroughly.
Initial PR adding a deterministic alias-resolution method on the C++ side: #961.