-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] User-friendly redesign for lightgbm()
#4968
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for your continued interest in LightGBM!
However, I'm not willing to review this pull request in its current form.
As we have politely mentioned to you in the past, such large pull requests with multiple unrelated changes are very difficult to review. Some examples: #4207 (review), #4685 (review), .
As we've also politely mentioned to you several times in the past, making comments like "similar to those of other packages" or "more in line with what is used in core R packages" without providing reference links or specific evidence unnecessarily increases the effort required to review such proposals. Some examples: #4692 (comment), #4692 (review), #4207 (comment), #4812 (review), #4597 (comment), #4681 (comment).
This pull request is proposing several very large breaking changes to the R package's interface, each of which requires careful consideration and conversation about what the benefits of those changes are and whether they are worth the cost of breaking users' existing code. We have an obligation to LightGBM's existing users to understand and carefully consider changes to this project, especially changes that break existing code, and can't do that effectively for so many changes at one time.
We really appreciate all of your help with LightGBM, but need to be protective of the limited time that this project's small group of mostly-volunteer maintainers can devote to maintaining the project.
Could you please make each of the related sets of changes a separate pull request or proposal in an issue, which could be reviewed and discussed independently, and in each please provide specific evidence describing the benefit of the proposed change?
Looking through these changes tonight, I see the following sets of changes which should each be their own pull request / issue:
- Moving parameters
objective
,threads
, andmetric
out ofparams
list in examples and tests and passing it through keyword arguments. - Other changes in tests that move parameters out of
params
and into keyword arguments oflightgbm::lightgbm()
. - Changes to argument names in
predict.lgb.Booster()
. - Changing
lightgbm::lightgbm()
from a regular function to a set of S3 methods that dispatch on the type ofdata
(WITHOUT introducing the newformula
support or changing any argument names). - Adding support for formulas.
- Automatically setting
categorical_features
based on data frame column types (as part of [R-package] Accept data frames as inputs #4323). - Handling
factor
columns indata.frame
inputs (as part of [R-package] Accept data frames as inputs #4323) - Changing argument names
data
andlabel
toX
andy
inlightgbm::lightgbm()
. - non-standard evaluation of arguments using
eval.parent()
,deparse1()
,substitute()
, etc. - adding column names to predictions
- addition of a "class" type of prediction, where
predict()
returns the predicted class instead of probabilities for classification tasks
@david-cortes Thank you so much for your huge work! @jameslamb I absolutely agree with you that this PR should be spit into smaller ones. I just want to add that open one big PR with all changes first is a good idea because maintainers can assess overall idea and understand a final state that the proposing changes move the project into. Big PR can be treated as a playground for initial and general discussions (demonstrative alternative to original feature request). Also, it's good to test with CI how all proposed connected changes fit together. See for example dmlc/xgboost#7545 (comment) or dmlc/xgboost#7309 (comment). |
ref #4295
This PR redesigns the R interface for
lightgbm()
so as to make it similar to those of other packages and base R's, while keeping the oldlgb.train
interface as-is. This redesign includes changes such as:data.frame
inputs and handling the encoding of categorical variables in the X/y interface.