Challenges when sharing models due to package versions #127

Closed
tylerlittlefield opened this issue Jul 25, 2022 · 4 comments

tylerlittlefield commented Jul 25, 2022

I am currently sharing development models using the example workflow:

library(vetiver)
library(pins)

# connect to model board
b <- board_s3(
  bucket = "models", 
  prefix = "label/", 
  region = structure("us-east-1", tags = list(type = "scalar"))
)

# checkout the version you want
v <- vetiver_pin_read(
  board = b, 
  name = "label", 
  version = "20220701T182136Z-9905d"
)

# create some dummy data
new_data <- data.frame(
  predictor = c(
    "some text",
    "more text"
  )
)

# occasionally, models require packages be loaded
load_pkgs(v$metadata$required_pkgs)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

# make predictions
predict(v, new_data)
#> # A tibble: 2 × 1
#>   .pred_class
#>   <fct>      
#> 1 TRUE       
#> 2 FALSE

However, we have found that this doesn't always work because package versions differ from computer to computer. Occasionally a user calls predict() on new data and gets an error, while other machines run the exact same code without issue, which is what points us to package version differences as the cause.

Is there a suggested workflow or any guidance on how to share development models? We are thinking of writing a function like with_model_env() that would leverage the renv lockfile to temporarily run the model with the same set of packages (a rough sketch below), but wanted to see if there is a better approach.
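
For reference, here is a rough sketch of the kind of helper we have in mind. It is purely hypothetical (not part of vetiver or pins) and assumes the model's lockfile is available locally as renv.lock: it restores the locked package versions into a temporary library and puts that library first on the path while evaluating an expression.

# hypothetical helper, not part of vetiver/pins
with_model_env <- function(lockfile, expr) {
  lib <- file.path(tempdir(), "model-lib")
  dir.create(lib, showWarnings = FALSE, recursive = TRUE)

  # install the exact versions recorded in the lockfile into the temporary library
  renv::restore(lockfile = lockfile, library = lib, prompt = FALSE)

  # put the restored library first on the search path for the duration
  old <- .libPaths()
  on.exit(.libPaths(old), add = TRUE)
  .libPaths(c(lib, old))

  eval(expr)
}

# e.g. with_model_env("renv.lock", quote(predict(v, new_data)))

One caveat: packages already attached in the session keep their loaded versions, so in practice this would probably need to run in a fresh R process (e.g. via callr).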

juliasilge (Member) commented

This is related to #4.

We already have the infrastructure for creating the needed renv lockfile, which we use when creating Dockerfiles.
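
Concretely, that is the vetiver_write_docker() path, which generates a Dockerfile together with a renv lockfile for the model's required packages. A minimal sketch, reusing the board and pin name from the example above:

library(vetiver)
library(pins)

b <- board_s3(bucket = "models", prefix = "label/")
v <- vetiver_pin_read(b, "label")

# writes a plumber.R for the model, then a Dockerfile plus a renv lockfile
# capturing the model's required packages
vetiver_write_plumber(b, "label")
vetiver_write_docker(v)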

What we still need to do is store a hash of the renv lockfile when creating the vetiver_model() or when pinning it, and then check the hash when doing vetiver_pin_read().
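
A minimal sketch of that idea (not the actual implementation): hash the lockfile when the model is pinned, store the hash with the pin's metadata, then compare it at read time and warn on a mismatch.

# sketch only: compare a stored lockfile hash against the current renv.lock
lockfile_hash <- function(lockfile = "renv.lock") {
  unname(tools::md5sum(lockfile))
}

check_lockfile_hash <- function(stored_hash, lockfile = "renv.lock") {
  current <- lockfile_hash(lockfile)
  if (!identical(current, stored_hash)) {
    warning(
      "renv.lock has changed since this model was pinned; ",
      "package versions may differ from the training environment.",
      call. = FALSE
    )
  }
  invisible(identical(current, stored_hash))
}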

juliasilge added the feature label Jul 26, 2022
tylerlittlefield (Author) commented

Understood, that makes sense. In the few cases where package versions have been an issue, we have been able to resolve it by installing the latest versions of all required packages:

install.packages(v$metadata$required_pkgs)


machow commented Dec 1, 2022

Just a high-level thought: it seems like there may be three levels of requirements that could be stashed with the model:

  • current behavior: just the name of each required package (no version number)
  • package-style requirements: each package name pinned to a version (or something like that)
  • deployment-style requirements: a lockfile pinning the exact version of every installed package

(I have no idea which is most useful, or how these things should be threaded together 😬; a rough illustration of the three forms is sketched below.)
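
A rough illustration of what those three forms could look like (all package names and versions made up for the example):

# 1. current: package names only
required_pkgs <- c("parsnip", "recipes", "textrecipes")

# 2. package-style requirements: names pinned to versions
required_pkgs_versioned <- c(
  "parsnip (>= 1.0.0)",
  "recipes (>= 1.0.1)",
  "textrecipes (>= 1.0.0)"
)

# 3. deployment-style requirements: a full renv lockfile recording the exact
#    version of every installed package
renv::snapshot(lockfile = "renv.lock", prompt = FALSE)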

juliasilge (Member) commented

We've got new support for renv now implemented via #154! If you would like to try it out, you can install with devtools::install_github("rstudio/vetiver-r") and then use the new check_renv = TRUE argument when you write or read. You can see a model that was stored using this new argument here. Click on "Raw metadata" to see the new metadata like required_pkgs and renv_lock.
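
A usage sketch based on that description (the board is a placeholder and v is assumed to be a vetiver_model() object):

# devtools::install_github("rstudio/vetiver-r")
library(vetiver)
library(pins)

b <- board_s3(bucket = "models", prefix = "label/")

# writing with check_renv = TRUE stores renv-based metadata
# (required_pkgs, renv_lock) alongside the model
vetiver_pin_write(b, v, check_renv = TRUE)

# reading with check_renv = TRUE compares the reader's packages against
# that metadata and surfaces any version mismatches
v <- vetiver_pin_read(b, "label", check_renv = TRUE)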

Let me know if you run into further issues or questions with this.
