-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] Rename data
-> newdata
in predict
#4973
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for this proposal and for the list of examples!
I did some research tonight and agree with your proposal... even though stats::predict()
is not opinionated about the name of the keyword argument referring to "data to create predictions from" in S3 methods that extend it, newdata
is the most popular name used.
In addition to the packages mentioned in this PR's description, I also looked at a few other packages in the CRAN Machine Learning Task View (link).
package | signature of predict() S3 method |
code link |
---|---|---|
{arules} |
predict.itemMatrix(object, newdata, ...) |
link |
{caret} |
predict.train(object, newdata, ...) |
link |
{catboost} |
catboost.caret$predict(modelFit, newdata, ...) |
link |
{C5.0} |
predict.C5.0(object, newdata, ...) |
link |
{gbm} |
predict.gbm(object, newdata, ...) |
link |
{mboost} |
predict.mboost(object, newdata, ...) |
link |
{randomForest} |
predict.randomForest(object, newdata, ...) |
link |
{rpart} |
predict.rpart(object, newdata, ...) |
link |
{stats} |
predict.glm(object, newdata = NULL, ...) |
link |
{stats} |
predict.lm(object, newdata, ...) |
link |
Examples of other patterns:
package | signature of predict() S3 method |
code link |
---|---|---|
{catboost} |
catboost.predict(model, pool, ...) |
link |
{elasticnet} |
predict.ent(model, newx, ...) |
link |
{lars} |
predict.lars(object, newx, ...) |
link |
{lightgbm} |
predict.lgb.Booster(object, data, ...) |
link |
{parsnip} |
predict.model_fit(object, new_data, ...) |
link |
{RGF} |
a $predict(x) method on an R6 object |
link |
Given that...I support changing this argument name in predict.lgb.Booster()
, as you've proposed. And I'm ok with this being a breaking change from the perspective of anyone using keyword arguments instead of positional arguments. Since newdata
doesn't have a default argument, any existing code like (predict(bst, data = X)
) will break with a loud error like argument "newdata" is missing, with no default
.
Normally I'd say this kind of breaking change should go through a round of deprecation warnings, but support making it now for the same reasons mentioned in #5007 (comment).
@StrikerRUS or @Laurae2 at your earliest convenience, could you give your opinion on whether or not we should merge this breaking change as-in?
Sorry, could you please clarify which other possible ways we have? Did you mean
by this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided examples make me think newdata
is a kind of "stardard" name in the R-lang world, thus I support this renaming.
@StrikerRUS I'm sorry for not being clear. I meant: Option 1: Just change second argument name from Option 2: Change second argument name from Which has the following implications: # will not break with either option
preds <- predict(booster, X)
# Option 1: would fail, with error 'argument "newdata" is missing, with no default'
# Option 2: would succeed, with warning like 'using "data" as a keyword argument is deprecated, use "newdata"'
preds <- predict(booster, data = X) Since |
@jameslamb Thanks for the extremely detailed explanation!
Me too. |
@david-cortes Sorry for the inconvenience, but could you please sync this PR with |
Updated. |
Closing and re-opening to re-trigger CI. I'm not sure why several of the CI builds failed (across different operating systems and with different errors). |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
ref #4968
Typically, the prediction function for model objects in R gets passed the prediction data as an argument named
newdata
, while this package instead names itdata
. All of base R's prediction functions (for example from thestats
package which is part of base R and around which most other alike packages are modeled), such aspredict.lm
orpredict.glm
, name the argumentnewdata
.From a quick look at popular decision tree packages, most of them also name their argument as
newdata
- examples off the top of m mind that I searched for:xgboost
,randomForest
,gbm
,C5.0
,rpart
,randomForestSRC
. The only outlier out there isranger
which does not name itnewdata
. I guess the ratio ofnewdata
to something else would grow considerably higher if packages for linear models were considered.This PR renames the argument to
newdata
to be more in line with other packages that R users are likely to be familiar with.