Fix saving and loading ulam() models when using cmdstanr; Fix log likelihood bugs for multi_normal outcome variables #425

timjzee · 2024-03-20T09:57:12Z

By default, cmdstanr saves model output in temporary files and only loads that data into memory when it is required. This means that when a fitted model is written to a file, a lot of data is not included in that file. So when the current R session is exited, the temporary files are deleted, and the fitted models can no longer be fully loaded. See example code below:

> library(rethinking)
Loading required package: cmdstanr
This is cmdstanr version 0.7.1
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /vol/customopt/cmdstan-2.34.1
- CmdStan version: 2.34.1

> fit_stan <- ulam(
    alist(
        y ~ dnorm( mu , sigma ),
        mu ~ dnorm( 0 , 10 ),
        sigma ~ dexp( 1 )
    ), data=list(y=c(-1,1)), file = "test"
)
Compiling Stan program...
Running MCMC with 1 chain, with 1 thread(s) per chain...

Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1 finished in 0.0 seconds.
Saving result as test.rds

# Temporary files still exist, so we can access our model data:
> precis(fit_stan)
       mean   sd  5.5% 94.5% rhat ess_bulk
mu    -0.12 1.06 -1.91  1.46 1.00   159.00
sigma  1.43 0.79  0.60  2.75 1.01   169.57

> q()

# however when we exit and start a new session:

> library(rethinking)
> fit_stan <- readRDS("test.rds")
> precis(fit_stan)
Error in read_cmdstan_csv(files = self$output_files(include_failed = FALSE),  :
  Assertion on 'files' failed: File does not exist: '/scratch/RtmpDWbuV6/ulam_cmdstanr_8b4a70079e6636864e593d38ad729313-202403191850-1-5b97c7.csv'.

The fix consists of two lines of code, and was inspired by these relevant pages:
https://discourse.mc-stan.org/t/error-in-read-cmdstan-csv-assertion-on-files-failed/30150
https://mc-stan.org/cmdstanr/reference/read_cmdstan_csv.html
https://mc-stan.org/cmdstanr/reference/fit-method-save_output_files.html

Using stanfit <- as_cmdstan_fit(cmdstanfit$output_files()) we assign the data from the temporary files to stanfit and store the data inside the ulam object.

Using this fix, the example code works:

> library(rethinking)
> fit_ulam <- readRDS("test.rds")
> precis(fit_ulam)
       mean   sd  5.5% 94.5% rhat ess_bulk
mu    -0.03 0.95 -1.62  1.31    1   218.40
sigma  1.53 0.72  0.72  2.88    1   100.27

This was tested on:

Ubuntu 22.04 LTS
R 4.1.2
CmdStan 2.34.1
cmdstanr 0.7.1
latest rethinking code

timjzee · 2024-07-03T14:43:06Z

I also fixed issues that arise when specifying log_lik==TRUE on multivariate models. The first issue, fixed by 1a040c3, is that currently ulam assumes that Stan can calculate the log likelihood of multivariate models using a vectorised notation. This is not the case, as noted by a Stan developer here.

The second issue turned out to be an indexing issue specific to multivariate outcomes, see this explanation for full details, including example model. I made sure that my fix did not break anything for models with a single outcome variable.

Paging @rmcelreath , as these issues (including the temporary file issue) are likely to affect other users and my fixes are small enough to quickly check.

FishModeler · 2024-09-26T04:17:06Z

Hello! Could you potentially clarify this edit - from a student at their wits end with this issue :')

I run my model and everything finishes fine. So do I immediately run the stanfit <- as_cmdstan_fit(cmdstanfit$output_files()) line of code afterwards? I tried doing that and running the line of code exactly as shown, but it gives me the error "Error: object 'cmdstanfit' not found. I replaced cmdstanfit with the name of my model and it give me the error of "Error in ModelName$output_files : $ operator not defined for the s4 class."

Any help would be greatly appreciated cause this issue has been a huge set back and I've tried to understand all the coding behind the scenes but can't make any sense of it. Thank you!

timjzee · 2024-09-26T07:17:07Z

Hi @FishModeler ,

Basically what you have to do is replace your current install of the rethinking package with my version of the rethinking package. To do that, open up R and do the following:

uninstall rethinking: remove.packages("rethinking")
install the remotes package to allow installing from GitHub: install.packages("remotes")
load the remotes package: library(remotes)
install my version of rethinking: install_github("timjzee/rethinking")

That should be it!

FishModeler · 2024-09-26T23:42:40Z

Thank you so much for your quiet reply! Your help has relieved so much stress.
I did as you suggested. The model ran fine, but now getting this error:
"Error in if (max(save_warmup) == 0L {:
missing value where TRUE/FALSE needed"
Any insight?

timjzee · 2024-09-27T07:17:28Z

No problem! I need a little more information to help you with your new error. Why don't you edit your issue #443 and include the code you're trying to run and the full error message.

timjzee added 7 commits March 19, 2024 17:40

Update ulam-function.R

bc34b23

Update ulam-function.R

7cd5ba0

Fix log_lik for multi_normals with >2 variables

1a040c3

Check error location

a58e536

Check error

d86334d

Debug

cb59daf

Hacky fix for #2

8af6439

timjzee changed the title ~~Fix saving and loading ulam() models when using cmdstanr~~ Fix saving and loading ulam() models when using cmdstanr; Fix log likelihood bugs for multi_normal outcome variables Jul 3, 2024

FishModeler mentioned this pull request Sep 26, 2024

Issue saving ulam object when using cmdstanr #443

Closed

VojtechFiala mentioned this pull request Oct 16, 2024

Save and load model object issue (cmdstanr) #447

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix saving and loading ulam() models when using cmdstanr; Fix log likelihood bugs for multi_normal outcome variables #425

Fix saving and loading ulam() models when using cmdstanr; Fix log likelihood bugs for multi_normal outcome variables #425

timjzee commented Mar 20, 2024

timjzee commented Jul 3, 2024

FishModeler commented Sep 26, 2024

timjzee commented Sep 26, 2024

FishModeler commented Sep 26, 2024

timjzee commented Sep 27, 2024

Fix saving and loading ulam() models when using cmdstanr; Fix log likelihood bugs for multi_normal outcome variables #425

Are you sure you want to change the base?

Fix saving and loading ulam() models when using cmdstanr; Fix log likelihood bugs for multi_normal outcome variables #425

Conversation

timjzee commented Mar 20, 2024

timjzee commented Jul 3, 2024

FishModeler commented Sep 26, 2024

timjzee commented Sep 26, 2024

FishModeler commented Sep 26, 2024

timjzee commented Sep 27, 2024