Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] Request: work with R serialization functions #4296

Closed
david-cortes opened this issue May 16, 2021 · 3 comments · Fixed by #4685
Closed

[R-package] Request: work with R serialization functions #4296

david-cortes opened this issue May 16, 2021 · 3 comments · Fixed by #4685

Comments

@david-cortes
Copy link
Contributor

Currently, lightgbm for R provides custom functions for serializing models in RDS files.

These functions do not play well with R's native serialization capabilities - for example:

  • If using lightgbm from some machine learning framework that wraps it (creating an object that contains a lightgbm model object inside), it will not be possible to de-serialize models, since such objects (from the wrapper framework) will only be serializable through R's own functions.
  • If restarting an R session, lightgbm models will be lost.

This additionally has the issue that there is no checking for null pointers (which happens after de-serializing a model that was not serialized through lightgbm's own functions) - see #4208

Would be better if it would instead play along with R's own save/load and saveRDS/readRDS. In order for this to work, it'd have to keep a copy of the model as serialized raw bytes inside the object, check if a model pointer is null before using it, and reconstruct it from the raw bytes if so. Would be nice if it also could give an informative error message when a dataset object contains a null pointer.

@jameslamb
Copy link
Collaborator

Thanks very much for writing this up. I think we need some more information about this request.

Could you provide more details on what you mean by "play along with R's own save / load and saveRDS / readRDS?
* It would be helpful if you could, for example, provide a code snippet showing something you wish you could do in {lightgbm} that is impossible or difficult today.

@david-cortes
Copy link
Contributor Author

Example:

  • Launch RStudio.
  • If it doesn't have the default settings, go to Tools -> Global Options... -> Workspace -> Save workspace to .RData on exit: -> set to Ask.
  • Run the following code:
library(lightgbm)
data(iris)
y = iris$Sepal.Length
X = as.matrix(iris[, -1])
m = lightgbm(data=X, label=y, params=list(objective="regression"), verbose=-1)
  • Click Session -> Restart R.
  • Run the following code (will crash, but even if it didn't, the model would not be usable):
predict(m, as.matrix(X))

@jameslamb
Copy link
Collaborator

Ah ok perfect, thank you! Yes I definitely think we'd like to support this. Thank you very much for writing it up.

Following our policy for feature requests, I'm going to close this now that I've added it to the list of feature requests at #2302. Contributions toward this feature are welcomed! Anyone reading this, please leave a comment if you'd like to help with this feature and the issue will be re-opened.

jameslamb added a commit that referenced this issue Dec 4, 2021
…readRDS() (fixes #4296) (#4685)

* idiomatic serialization

* linter

* linter, namespace

* comments, linter, fix failing test

* standardize error messages for null handles

* auto-restore handle in more functions

* linter

* missing declaration

* correct wrong signature

* fix docs

* Update R-package/R/lgb.train.R

Co-authored-by: James Lamb <[email protected]>

* Update R-package/R/lgb.drop_serialized.R

Co-authored-by: James Lamb <[email protected]>

* Update R-package/R/lgb.restore_handle.R

Co-authored-by: James Lamb <[email protected]>

* Update R-package/R/lgb.restore_handle.R

Co-authored-by: James Lamb <[email protected]>

* Update R-package/R/lgb.make_serializable.R

Co-authored-by: James Lamb <[email protected]>

* move 'restore_handle' from feature importance to dump method

* missing header

* move arguments order, update docs

* linter

* avoid leaving files in working directory

* add test for save_model=NULL

* missing comma

* Update R-package/R/lgb.restore_handle.R

Co-authored-by: Nikita Titov <[email protected]>

* Update R-package/src/lightgbm_R.cpp

Co-authored-by: Nikita Titov <[email protected]>

* change name of error function

* update comment

* restore old serialization functions but set as deprecated

* Update R-package/R/readRDS.lgb.Booster.R

Co-authored-by: Nikita Titov <[email protected]>

* Update R-package/R/saveRDS.lgb.Booster.R

Co-authored-by: Nikita Titov <[email protected]>

* update docs

* Update R-package/R/readRDS.lgb.Booster.R

Co-authored-by: James Lamb <[email protected]>

* Update R-package/R/saveRDS.lgb.Booster.R

Co-authored-by: James Lamb <[email protected]>

* Update R-package/tests/testthat/test_basic.R

Co-authored-by: James Lamb <[email protected]>

* Update R-package/R/readRDS.lgb.Booster.R

Co-authored-by: James Lamb <[email protected]>

* comments

* fix variable name

* restore serialization test for linear models

* Update R-package/R/lightgbm.R

Co-authored-by: James Lamb <[email protected]>

* update docs

* fix issues with null terminator

Co-authored-by: James Lamb <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants