Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No systematic pattern in residuals detected with DHARMa, but present in residuals vs. fitted plot obtained with broom #444

Open
MLieke opened this issue Oct 9, 2024 · 4 comments

Comments

@MLieke
Copy link

MLieke commented Oct 9, 2024

Dear Florian or Melina or ...

First of all thank you for the very helpful package and all the effort you put in helping people out with their analyses!

As for me, I was hoping I could get your opinion on the following.

For our study on the difference in densities of certain organism groups between four land use types, we collected data at 15 different locations. At each of those location, at least three of the four land use types were present, and we took two samples per habitat type per location. Thereby, we obtained 88 observations in total.

Our model was very simple and specified as follows:

model <- glmmTMB(density ~ landuse + (1|location),
                 family = tweedie,
                 data = beetles)

When I use:

simres <- simulateResiduals(fittedModel = model, plot = F)
plot(simres)

I get the following output that does not really reveal an issue with the model:
image

However, when I use:

library(broom.mixed)
aa <- augment(model, data = beetles)

aa$.fitted0 <- predict(model, newdata = transform(beetles, location=NA),type="response")
aa$.resid0 <- beetles$density-aa$.fitted0
(gg3 <- ggplot(aa, aes(.fitted0,.resid0))
    + geom_line(aes(group=location),colour="gray")
    + geom_point(aes(shape=landuse))
    + geom_smooth()
) 

This produces the following output that seems to reveal a pattern in the residuals, I presume caused by the outliers in the fourth land use type (the + signs)?
image

So my questions would be
a) Would you consider that if this pattern doesn't show up in the DHARMa diagnostics, it would not be too problematic, or would this non-detection rather be caused by the small sample size?
b) If you think it would be necessary to do something about this, would you have a recommendation for how to deal with this, taking into consideration the limited sample size (I tried adding dispformula= ~landuse, but thought this actually wouldn't be advisable given the risk of overfitting here and it also didn't really remedy this issue) and the fact that we would like it if the modelling approach would not be too advanced (given a rather broad audience)?

Thank you very much in advance and have a nice day!

@melina-leite
Copy link
Collaborator

Dear @MLieke,

I'll start with another question:
It seems you are counting the number of beetles in the different landuses within each location, right? Why are you using a Tweedie distribution if you have the integer response Number of individuals? If the landuses have different sizes, or the effort in each landuse (e.g. number of traps) is different, you can still use an integer response N.beetles and model the area or effort (to have it converted to density) as an offset (in glmmTMB it would be offset(log(X)). Then, you could use a more appropriate distribution for count data, such as Poisson or Negative Binomial.

However, your model's diagnostic with the density and family Tweedie seems okay (maybe a little bit underdispersed because you are using the Tweedie distribution?).

For the second part, it seems you are trying to obtain the raw residuals (observed - fitted), which would work only in the case of a Gaussian model (lm). As you are using a Tweedie (the same would apply to a Poisson or Negative Binomial) distribution, the raw residuals don't apply anymore because you have a distribution from the exponential family (where mean and variance are not independent of each other). I suggest you take a look at the initial part of the DHARMa vignette, which shows why raw (and other residual types) don't work for glms and glmms.

Answering your questions:

a) From what you presented, I'd say that your model has an okay residual diagnostic, but:
b) if my suspicion of the nature of your data is correct, I'd not adjust a Tweedie distribution but a Poisson or Negative Binomial instead.

Best,
Melina

@florianhartig
Copy link
Owner

Hi,

if your response is count I agree with @melina-leite, but not sure, maybe you have another response that necessities something like tweedie.

In any case, I would note that the raw and DHARMa residuals actually tell you the same story - you have 4 prediction values, values 2 and 4 are wider distributed. DHARMa just distributes the values differently on the x axis, as per default we transform to boxplot (makes it easier to see the residuals if they are all on one point). You can suppress this, see help.

The fact that prediction value 4 is wide if plotted raw is because quantile residuals (in DHARMa) account for the expected dispersion. Thus, if you are concerned about heteroskedasticity in your raw plots, you should look at the DHARMa results.

Based on your raw and DHARMa residuals, I don't really see an issue either.

@MLieke
Copy link
Author

MLieke commented Oct 10, 2024

Dear Melina and Florian

Thank you very much for your super quick replies!

As an answer to your question, Melina:
Indeed, we obtained non-integers by averaging the counts across two or three sampling sessions. This we could indeed probably have better approached by using an offset term in the model so that we could still use a Poisson or negative binomial distribution, but as we also in many cases had a lot of zeroes, we preferred using a model with tweedie distribution over a zero-inflated Poisson or negative binomial model (mostly because of the easier interpretation). I hope this makes sense?
(I did compare models using a tweedie and negative binomial distribution for a species group for which there were no zeroes in the data and the counts were still integers, and the model output was reassuringly extremely similar.)

For the second part: I see, thank you! I based this approach for checking the patterns in the residuals on what I found recommended here by Ben Bolker for checking a beta regression: [https://stats.stackexchange.com/questions/423274/checking-a-beta-regression-model-via-glmmtmb-with-dharma-package]
Out of curiosity: what would your thoughts on this be?

Thank you both a lot for your thoughts and clarifications!

Best regards
Lieke

@florianhartig
Copy link
Owner

Hi @MLieke,

about the beta: I'm not sure what you mean - yes, there was a problem with glmmTMB, but this was fixed long ago. Once the quantile residuals work, they are always preferable over raw residuals, as you have no idea to know if the pattern you see in your raw plot is actually inconsistent with a complicated GLM distribution (such as the tweedie).

What the DHARMa results show you is that your raw residuals (that you see in your raw plot) do not deviate significantly from the expectation of the fitted tweedie.

Best
F

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants