Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simulate.gam with ocat returning unexpected values #319

Open
hhp94 opened this issue Oct 29, 2024 · 2 comments
Open

simulate.gam with ocat returning unexpected values #319

hhp94 opened this issue Oct 29, 2024 · 2 comments

Comments

@hhp94
Copy link

hhp94 commented Oct 29, 2024

In simulate-methods.R in gratia, for ocat models, predict(type = "response") currently returns the probability of each category. However, the fix.family.rd function expects a vector of linear predictor values instead. This discrepancy leads to incorrect handling in the simulation.gam output for ocat.

I've pasted the mgcv::fix.family.rd function for the ocat family here for reference:

function (mu, wt, scale) 
{
  theta <- get(".Theta")
  R = length(theta) + 2
  alpha <- rep(0, R + 1)
  alpha[1] <- -Inf
  alpha[R + 1] <- Inf
  alpha[2] <- -1
  if (R > 2) {
    ind <- 3:R
    # looks like alpha is the vector of cut points along logistic distribution
    alpha[ind] <- alpha[2] + cumsum(exp(theta))
  }
  y <- u <- runif(length(mu))
  # mu should be on the same scale as logit(u)?
  u <- mu + log(u/(1 - u))
  for (i in 1:R) {
    y[u > alpha[i] & u <= alpha[i + 1]] <- i
  }
  y
}

Reproducible Example

library(gratia)
library(mgcv)
d <- data_sim(dist = "ocat")
m <- gam(y ~ s(x0), data = d, family = ocat(R = 4), method = "REML")
sim <- simulate(m, data = d[1, ])
attr(sim, "seed") <- NULL
sim

In this example, sim is a $4 \times 1$ matrix because fix.family.rd interpreted the output of predict(type = "response"), which is Pr(y = 1, 2, 3, 4), as linear predictor values, leading to unexpected results. I'm fairly sure this is a bug. If you agree, let me know how you'd like to fix it and I can open a PR. Should predict(type = "link") be used in simulate.gam instead?

Thank you so much for teaching me about gam and {mgcv}.

@gavinsimpson
Copy link
Owner

Thanks for the report. This is one of those unintentional things; I didn't know the ocat() had an $rd component. These non-standard families need specialist support and that's something I hadn't gotten to yet (my recollection was that none of them had $rd in the family).

This is a bug (I should probably check to see if a particular family is supported and throw an error for those that don't work). And yes, the fix will be to pass in a linear predictor value for each observation we're simulating for. I don't think the fix is that trivial however; I will need to handle all the other non-standard families and some of those will require different handling to that needed for ocat(). I have begun to tackle this in fitted_values(), where I have a list of standard families that just work normally and then other families that need special handling through a pre- or post-processing step. This is how I'm handling ocat() in fitted_values() for example. I'll need to think how best to do this in simulate() and also try to not duplicate code as the details of all these parameterisations for all the families makes my head hurt from time to time.

@hhp94
Copy link
Author

hhp94 commented Oct 31, 2024

Oh, I’d be happy to help! To be honest, I'm still getting familiar with posterior simulations on a theoretical level, so working with your implementations in gratia is a great opportunity for me to learn. Nevertheless, I'll go through your vignette more carefully, review the fitted_values code, and see what might be the best approach. In the meantime, if you have an idea and would like help implementing it, just let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants