Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handling mtry with lightgbm #12

Closed
simonpcouch opened this issue May 3, 2022 · 4 comments
Closed

handling mtry with lightgbm #12

simonpcouch opened this issue May 3, 2022 · 4 comments

Comments

@simonpcouch
Copy link
Contributor

The package currently uses this setting:

bonsai/R/lightgbm_data.R

Lines 161 to 168 in eb13b9e

parsnip::set_model_arg(
model = "boost_tree",
eng = "lightgbm",
parsnip = "mtry",
original = "feature_fraction",
func = list(pkg = "dials", fun = "mtry"),
has_submodel = FALSE
)

feature_fraction is actually mtry / number_of_predictors, though. From what I understand, rules::mtry_prop can be of use here. Still need to look into the most principled way to then handle mtry passed to boost_tree.

@hfrick
Copy link
Member

hfrick commented May 4, 2022

if you end up using rules::mtry_prop() then it might make sense to migrate that to dials so that you don't need to depend on rules for that

@simonpcouch
Copy link
Contributor Author

simonpcouch commented May 4, 2022

Okay, notes to future self:

From parsnip's perspective, the user-facing argument mtry is meant to accommodate both senses of this argument:

mtry | A number for the number (or proportion) of predictors that will be randomly 
       sampled at each split when creating the tree models (specific engines only)

dials documents mtry only in its sense as a number.

rules::mtry_prop, hopefully soon dials::mtry_prop, sets the default ranges etc. of the argument that is still passed as mtry by the user. mtry_prop is not user-facing in boost_tree. parsnip then handles everything else besides:

  1. checking mtry ranges and prompting if mtry > 1 to let the user know that mtry is used in its sense as a proportion for this engine—rlang::abort here.
  2. making sure that we prompt informatively if mtry is passed to boost_tree and feature_fraction is passed to set_engine—probably just rlang::warn here and let mtry take precedence.

I think it may be more usable to take in mtry in its sense as a number (i.e. not as _prop) and then convert in the backend—so that a user wouldn't need to update a model argument because of the engine they chose—but not sure if this is inconsistent with other implementations.

EDIT: changed interpretation after noticing dials' mtry documentation.

@simonpcouch
Copy link
Contributor Author

Closing in favor of #14. :)

@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants