resolving aliased argument names with lightgbm #53
Sorry for the delayed response @simonpcouch . I don't know a lot about But I do think I can answer your core question. Yes, the information in Given that:
I recommend the following:
I know that's not ideal, but the list of parameters and their aliases changes so infrequently, and LightGBM's release cycle is so unreliable, that I think it's the best way forward for what you're working on here.
Thanks for chiming in, @jameslamb! This is helpful: it's good to have a better sense of these aliases' lifecycles, and a thumbs-up on hard-coding those values in rather than waiting on an export for
To save you some time navigating the parsnip docs (sorry for not clarifying here), loosely: parsnip differentiates between "main" (or "model") arguments and "engine" arguments. lightgbm is one possible "engine" for boosted trees in parsnip, alongside friends like xgboost or C5.0. Main arguments are hyperparameters for boosted trees that are common to (most) all boosted tree engines, and have a standardized parsnip argument name and structure to be passed to
mtcars$cyl <- as.factor(mtcars$cyl)
library(bonsai)
#> Loading required package: parsnip
bt <- boost_tree(trees = 100, min_n = 5) %>% set_mode("classification")
bt_xgb <- bt %>% set_engine("xgboost") %>% fit(cyl ~ ., mtcars)
bt_c50 <- bt %>% set_engine("C5.0") %>% fit(cyl ~ ., mtcars)
bt_lgb <- bt %>% set_engine("lightgbm") %>% fit(cyl ~ ., mtcars)
predict(bt_xgb, head(mtcars))
#> # A tibble: 6 × 1
#> .pred_class
#> <fct>
#> 1 6
#> 2 6
#> 3 4
#> 4 4
#> 5 8
#> 6 6
predict(bt_c50, head(mtcars))
#> # A tibble: 6 × 1
#> .pred_class
#> <fct>
#> 1 6
#> 2 6
#> 3 4
#> 4 6
#> 5 8
#> 6 6
predict(bt_lgb, head(mtcars))
#> # A tibble: 6 × 1
#> .pred_class
#> <fct>
#> 1 6
#> 2 6
#> 3 4
#> 4 6
#> 5 8
#> 6 6
# to supply an "engine argument":
bt_lgb2 <- bt %>% set_engine("lightgbm", num_leaves = 10) %>% fit(cyl ~ ., mtcars)
predict(bt_lgb2, head(mtcars))
#> # A tibble: 6 × 1
#> .pred_class
#> <fct>
#> 1 6
#> 2 6
#> 3 4
#> 4 6
#> 5 8
#> 6 6
Created on 2022-11-28 with reprex v2.0.2
Ahhhhh got it, thanks very much for that explanation! And sorry again for the delayed response. I'll try to be more responsive in the future.
No need to apologize, your insight has been much appreciated!
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Following up on @jameslamb's comment here—thank you for being willing to discuss. :)
Some background, for the GitHub archeologists:
lightgbm allows passing many of its arguments with aliases. On the parsnip side, these include both main and engine arguments to boost_tree(), including the now-tunable engine argument num_leaves. On the lightgbm side, these include both "core" and "control" arguments.
As of now, any aliases supplied to set_engine are passed in the dots of bonsai::train_lightgbm() to the dots of lightgbm::lgb.train(). lightgbm's machinery takes care of resolving aliases, with some rules that generally prevent silent failures while tuning: https://github.com/microsoft/LightGBM/blob/e45fc48405e9877138ffb5f7e1fd4c449752d323/R-package/R/utils.R#L176-L181
min_n -> min_data_in_leaf: If a main argument is marked for tuning and a lightgbm alias is supplied as an engine arg, we ignore the alias silently. (Note that bonsai::train_lightgbm() sets lgb.train()'s verbose argument to 1L if one isn't supplied.)
The scariest issue I'd anticipate is the user not touching the main argument (that will be translated to the main, non-alias lgb.train argument), but setting the alias in set_engine(). In that case, the bonsai::train_lightgbm() default kicks in, and the user-supplied engine argument is silently ignored in favor of the default supplied as the non-alias lightgbm argument. 🫣
Reprex here.
Marking a main argument for tuning, as usual:
Marking a main argument for tuning, and supplying its non-alias translation as engine arg:
Marking a main argument for tuning, and supplying an alias to tune as engine arg:
Note that both params end up in the resulting object, though only one is referenced when making predictions.
Created on 2022-11-04 with reprex v2.0.2
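To make the silent-override scenario described above concrete without depending on lightgbm itself, here is a toy base-R sketch. The function train_sketch() is hypothetical and is not bonsai's implementation; it only mimics the pattern of a default for the non-alias parameter plus pass-through dots, and the default of 20 and the alias min_child_samples are used purely for illustration.

```r
# Toy stand-in for bonsai::train_lightgbm(): NOT the real implementation.
# It has a default for the non-alias parameter and forwards everything
# supplied in the dots, which is the pattern described above.
train_sketch <- function(min_data_in_leaf = 20, ...) {
  main_params <- list(min_data_in_leaf = min_data_in_leaf)
  engine_params <- list(...)
  # Both sets of parameters are combined into one list, as they would be
  # before being handed on to the training function.
  c(main_params, engine_params)
}

# The user leaves the main argument untouched and supplies only an alias
# as an engine argument.
params <- train_sketch(min_child_samples = 5)

# Both the default (non-alias) entry and the user's alias are present.
str(params)
```

Because the non-alias entry is always present, whatever resolves the aliases downstream sees two competing values, and the user never asked for one of them.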
I think the best approach here would be to raise a warning or error whenever an alias that maps to a main boost_tree() argument is supplied, and note that it can be resolved by passing it as a main argument to boost_tree(). Otherwise, passing aliases as engine arguments (i.e. those that don't map to main arguments) seems unproblematic to me. Another option is to set verbose to a setting that allows lightgbm to propagate its own prompts with duplicated aliases when any alias is supplied, though this feels like it might obscure train_lightgbm()'s role in passing a non-aliased argument. Either way, this requires being able to detect when an alias is supplied.
A question for you, James, if you're up for it: is there any sort of dictionary that we could reference that would contain these mappings? A list like that currently output by lightgbm:::.PARAMETER_ALIASES() would be perfect, though that also contains the parameters listed under "Learning Control Parameters".
We could also put that together ourselves; we'd just need the mappings for 8 of them:
Created on 2022-11-04 with reprex v2.0.2
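The warning approach proposed above could be sketched in base R roughly as follows. Everything named here is an assumption for illustration: main_arg_aliases is a hypothetical hand-maintained lookup covering only two parameters, check_engine_aliases() is a hypothetical helper rather than anything in bonsai, and dropping the offending alias after warning is just one possible policy (erroring would work equally well).

```r
# Hypothetical, hand-maintained lookup: non-alias lightgbm parameter ->
# the parsnip main argument it corresponds to, plus its lightgbm aliases.
# Illustrative subset only, not the full list.
main_arg_aliases <- list(
  min_data_in_leaf = list(
    main = "min_n",
    aliases = c("min_data_per_leaf", "min_data",
                "min_child_samples", "min_samples_leaf")
  ),
  learning_rate = list(
    main = "learn_rate",
    aliases = c("shrinkage_rate", "eta")
  )
)

# Hypothetical helper: warn for every engine argument that is an alias of
# a parameter already exposed as a main argument, then drop that alias.
check_engine_aliases <- function(engine_args, lookup = main_arg_aliases) {
  for (param in names(lookup)) {
    hits <- intersect(names(engine_args), lookup[[param]]$aliases)
    for (alias in hits) {
      warning(
        "Engine argument `", alias, "` is an alias of `", param, "`; ",
        "set it via the `", lookup[[param]]$main,
        "` argument to boost_tree() instead.",
        call. = FALSE
      )
      engine_args[[alias]] <- NULL
    }
  }
  engine_args
}

# Aliases of main arguments are flagged; other engine arguments pass through.
checked <- suppressWarnings(
  check_engine_aliases(list(min_child_samples = 5, num_leaves = 10))
)
names(checked)
```

Keeping the lookup as plain data means it can be updated by hand when lightgbm's parameter list changes, without touching the checking logic.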