Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] Rename data -> newdata in predict #4973

Merged
merged 2 commits into from
Mar 12, 2022

Conversation

david-cortes
Copy link
Contributor

@david-cortes david-cortes commented Jan 23, 2022

ref #4968

Typically, the prediction function for model objects in R gets passed the prediction data as an argument named newdata, while this package instead names it data. All of base R's prediction functions (for example from the stats package which is part of base R and around which most other alike packages are modeled), such as predict.lm or predict.glm, name the argument newdata.

From a quick look at popular decision tree packages, most of them also name their argument as newdata - examples off the top of m mind that I searched for: xgboost, randomForest, gbm, C5.0, rpart, randomForestSRC. The only outlier out there is ranger which does not name it newdata. I guess the ratio of newdata to something else would grow considerably higher if packages for linear models were considered.

This PR renames the argument to newdata to be more in line with other packages that R users are likely to be familiar with.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for this proposal and for the list of examples!

I did some research tonight and agree with your proposal... even though stats::predict() is not opinionated about the name of the keyword argument referring to "data to create predictions from" in S3 methods that extend it, newdata is the most popular name used.

In addition to the packages mentioned in this PR's description, I also looked at a few other packages in the CRAN Machine Learning Task View (link).

package signature of predict() S3 method code link
{arules} predict.itemMatrix(object, newdata, ...) link
{caret} predict.train(object, newdata, ...) link
{catboost} catboost.caret$predict(modelFit, newdata, ...) link
{C5.0} predict.C5.0(object, newdata, ...) link
{gbm} predict.gbm(object, newdata, ...) link
{mboost} predict.mboost(object, newdata, ...) link
{randomForest} predict.randomForest(object, newdata, ...) link
{rpart} predict.rpart(object, newdata, ...) link
{stats} predict.glm(object, newdata = NULL, ...) link
{stats} predict.lm(object, newdata, ...) link

Examples of other patterns:

package signature of predict() S3 method code link
{catboost} catboost.predict(model, pool, ...) link
{elasticnet} predict.ent(model, newx, ...) link
{lars} predict.lars(object, newx, ...) link
{lightgbm} predict.lgb.Booster(object, data, ...) link
{parsnip} predict.model_fit(object, new_data, ...) link
{RGF} a $predict(x) method on an R6 object link

Given that...I support changing this argument name in predict.lgb.Booster(), as you've proposed. And I'm ok with this being a breaking change from the perspective of anyone using keyword arguments instead of positional arguments. Since newdata doesn't have a default argument, any existing code like (predict(bst, data = X)) will break with a loud error like argument "newdata" is missing, with no default.

Normally I'd say this kind of breaking change should go through a round of deprecation warnings, but support making it now for the same reasons mentioned in #5007 (comment).

@StrikerRUS or @Laurae2 at your earliest convenience, could you give your opinion on whether or not we should merge this breaking change as-in?

@StrikerRUS
Copy link
Collaborator

@jameslamb

could you give your opinion on whether or not we should merge this breaking change as-in?

Sorry, could you please clarify which other possible ways we have? Did you mean

a round of deprecation warnings

by this?

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provided examples make me think newdata is a kind of "stardard" name in the R-lang world, thus I support this renaming.

@jameslamb
Copy link
Collaborator

jameslamb commented Mar 3, 2022

@StrikerRUS I'm sorry for not being clear. I meant:

Option 1: Just change second argument name from data to newdata.

Option 2: Change second argument name from data to newdata, add data = NULL at the end of the signature, and if any non-NULL value is passed to data, use that value but raise a deprecation warning. Then in a later release (like 4.1.x), remove support for keyword argument data completely.

Which has the following implications:

# will not break with either option
preds <- predict(booster, X)

# Option 1: would fail, with error 'argument "newdata" is missing, with no default'
# Option 2: would succeed, with warning like 'using "data" as a keyword argument is deprecated, use "newdata"'
preds <- predict(booster, data = X)

Since newdata seems like a widely-accepted standard and since this project hasn't been doing releases very often and seems like it might still be a long time until 4.0, I support Option 1 because of the reduced maintenance burden and because the breaking change results in a loud error that will be easy for users to understand and react to.

@StrikerRUS
Copy link
Collaborator

@jameslamb Thanks for the extremely detailed explanation!

I support Option 1

Me too.

@StrikerRUS
Copy link
Collaborator

@david-cortes Sorry for the inconvenience, but could you please sync this PR with master branch to pass CI tests?

@david-cortes
Copy link
Contributor Author

Updated.

@jameslamb
Copy link
Collaborator

Closing and re-opening to re-trigger CI. I'm not sure why several of the CI builds failed (across different operating systems and with different errors).

@jameslamb jameslamb closed this Mar 9, 2022
@jameslamb jameslamb reopened this Mar 9, 2022
@StrikerRUS StrikerRUS merged commit 9307d53 into microsoft:master Mar 12, 2022
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants