-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No systematic pattern in residuals detected with DHARMa, but present in residuals vs. fitted plot obtained with broom #444
Comments
Dear @MLieke, I'll start with another question: However, your model's diagnostic with the density and family Tweedie seems okay (maybe a little bit underdispersed because you are using the Tweedie distribution?). For the second part, it seems you are trying to obtain the raw residuals (observed - fitted), which would work only in the case of a Gaussian model (lm). As you are using a Tweedie (the same would apply to a Poisson or Negative Binomial) distribution, the raw residuals don't apply anymore because you have a distribution from the exponential family (where mean and variance are not independent of each other). I suggest you take a look at the initial part of the DHARMa vignette, which shows why raw (and other residual types) don't work for glms and glmms. Answering your questions: a) From what you presented, I'd say that your model has an okay residual diagnostic, but: Best, |
Hi, if your response is count I agree with @melina-leite, but not sure, maybe you have another response that necessities something like tweedie. In any case, I would note that the raw and DHARMa residuals actually tell you the same story - you have 4 prediction values, values 2 and 4 are wider distributed. DHARMa just distributes the values differently on the x axis, as per default we transform to boxplot (makes it easier to see the residuals if they are all on one point). You can suppress this, see help. The fact that prediction value 4 is wide if plotted raw is because quantile residuals (in DHARMa) account for the expected dispersion. Thus, if you are concerned about heteroskedasticity in your raw plots, you should look at the DHARMa results. Based on your raw and DHARMa residuals, I don't really see an issue either. |
Dear Melina and Florian Thank you very much for your super quick replies! As an answer to your question, Melina: For the second part: I see, thank you! I based this approach for checking the patterns in the residuals on what I found recommended here by Ben Bolker for checking a beta regression: [https://stats.stackexchange.com/questions/423274/checking-a-beta-regression-model-via-glmmtmb-with-dharma-package] Thank you both a lot for your thoughts and clarifications! Best regards |
Hi @MLieke, about the beta: I'm not sure what you mean - yes, there was a problem with glmmTMB, but this was fixed long ago. Once the quantile residuals work, they are always preferable over raw residuals, as you have no idea to know if the pattern you see in your raw plot is actually inconsistent with a complicated GLM distribution (such as the tweedie). What the DHARMa results show you is that your raw residuals (that you see in your raw plot) do not deviate significantly from the expectation of the fitted tweedie. Best |
Dear Florian or Melina or ...
First of all thank you for the very helpful package and all the effort you put in helping people out with their analyses!
As for me, I was hoping I could get your opinion on the following.
For our study on the difference in densities of certain organism groups between four land use types, we collected data at 15 different locations. At each of those location, at least three of the four land use types were present, and we took two samples per habitat type per location. Thereby, we obtained 88 observations in total.
Our model was very simple and specified as follows:
When I use:
I get the following output that does not really reveal an issue with the model:
However, when I use:
This produces the following output that seems to reveal a pattern in the residuals, I presume caused by the outliers in the fourth land use type (the + signs)?
So my questions would be
a) Would you consider that if this pattern doesn't show up in the DHARMa diagnostics, it would not be too problematic, or would this non-detection rather be caused by the small sample size?
b) If you think it would be necessary to do something about this, would you have a recommendation for how to deal with this, taking into consideration the limited sample size (I tried adding dispformula= ~landuse, but thought this actually wouldn't be advisable given the risk of overfitting here and it also didn't really remedy this issue) and the fact that we would like it if the modelling approach would not be too advanced (given a rather broad audience)?
Thank you very much in advance and have a nice day!
The text was updated successfully, but these errors were encountered: