Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix saving and loading ulam() models when using cmdstanr; Fix log likelihood bugs for multi_normal outcome variables #425

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

timjzee
Copy link

@timjzee timjzee commented Mar 20, 2024

By default, cmdstanr saves model output in temporary files and only loads that data into memory when it is required. This means that when a fitted model is written to a file, a lot of data is not included in that file. So when the current R session is exited, the temporary files are deleted, and the fitted models can no longer be fully loaded. See example code below:

> library(rethinking)
Loading required package: cmdstanr
This is cmdstanr version 0.7.1
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /vol/customopt/cmdstan-2.34.1
- CmdStan version: 2.34.1

> fit_stan <- ulam(
    alist(
        y ~ dnorm( mu , sigma ),
        mu ~ dnorm( 0 , 10 ),
        sigma ~ dexp( 1 )
    ), data=list(y=c(-1,1)), file = "test"
)
Compiling Stan program...
Running MCMC with 1 chain, with 1 thread(s) per chain...

Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1 finished in 0.0 seconds.
Saving result as test.rds

# Temporary files still exist, so we can access our model data:
> precis(fit_stan)
       mean   sd  5.5% 94.5% rhat ess_bulk
mu    -0.12 1.06 -1.91  1.46 1.00   159.00
sigma  1.43 0.79  0.60  2.75 1.01   169.57

> q()

# however when we exit and start a new session:

> library(rethinking)
> fit_stan <- readRDS("test.rds")
> precis(fit_stan)
Error in read_cmdstan_csv(files = self$output_files(include_failed = FALSE),  :
  Assertion on 'files' failed: File does not exist: '/scratch/RtmpDWbuV6/ulam_cmdstanr_8b4a70079e6636864e593d38ad729313-202403191850-1-5b97c7.csv'.

The fix consists of two lines of code, and was inspired by these relevant pages:
https://discourse.mc-stan.org/t/error-in-read-cmdstan-csv-assertion-on-files-failed/30150
https://mc-stan.org/cmdstanr/reference/read_cmdstan_csv.html
https://mc-stan.org/cmdstanr/reference/fit-method-save_output_files.html

Using stanfit <- as_cmdstan_fit(cmdstanfit$output_files()) we assign the data from the temporary files to stanfit and store the data inside the ulam object.

Using this fix, the example code works:

> library(rethinking)
> fit_ulam <- readRDS("test.rds")
> precis(fit_ulam)
       mean   sd  5.5% 94.5% rhat ess_bulk
mu    -0.03 0.95 -1.62  1.31    1   218.40
sigma  1.53 0.72  0.72  2.88    1   100.27

This was tested on:

  • Ubuntu 22.04 LTS
  • R 4.1.2
  • CmdStan 2.34.1
  • cmdstanr 0.7.1
  • latest rethinking code

@timjzee timjzee changed the title Fix saving and loading ulam() models when using cmdstanr Fix saving and loading ulam() models when using cmdstanr; Fix log likelihood bugs for multi_normal outcome variables Jul 3, 2024
@timjzee
Copy link
Author

timjzee commented Jul 3, 2024

I also fixed issues that arise when specifying log_lik==TRUE on multivariate models. The first issue, fixed by 1a040c3, is that currently ulam assumes that Stan can calculate the log likelihood of multivariate models using a vectorised notation. This is not the case, as noted by a Stan developer here.

The second issue turned out to be an indexing issue specific to multivariate outcomes, see this explanation for full details, including example model. I made sure that my fix did not break anything for models with a single outcome variable.

Paging @rmcelreath , as these issues (including the temporary file issue) are likely to affect other users and my fixes are small enough to quickly check.

@FishModeler
Copy link

Hello! Could you potentially clarify this edit - from a student at their wits end with this issue :')

I run my model and everything finishes fine. So do I immediately run the stanfit <- as_cmdstan_fit(cmdstanfit$output_files()) line of code afterwards? I tried doing that and running the line of code exactly as shown, but it gives me the error "Error: object 'cmdstanfit' not found. I replaced cmdstanfit with the name of my model and it give me the error of "Error in ModelName$output_files : $ operator not defined for the s4 class."

Any help would be greatly appreciated cause this issue has been a huge set back and I've tried to understand all the coding behind the scenes but can't make any sense of it. Thank you!

@timjzee
Copy link
Author

timjzee commented Sep 26, 2024

Hi @FishModeler ,

Basically what you have to do is replace your current install of the rethinking package with my version of the rethinking package. To do that, open up R and do the following:

  • uninstall rethinking: remove.packages("rethinking")
  • install the remotes package to allow installing from GitHub: install.packages("remotes")
  • load the remotes package: library(remotes)
  • install my version of rethinking: install_github("timjzee/rethinking")

That should be it!

@FishModeler
Copy link

Thank you so much for your quiet reply! Your help has relieved so much stress.
I did as you suggested. The model ran fine, but now getting this error:
"Error in if (max(save_warmup) == 0L {:
missing value where TRUE/FALSE needed"
Any insight?

@timjzee
Copy link
Author

timjzee commented Sep 27, 2024

No problem! I need a little more information to help you with your new error. Why don't you edit your issue #443 and include the code you're trying to run and the full error message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants