
isat treats each initial block as GUM and hence often does not search at all if diagnostics of that "GUM" don't pass #39

jkurle opened this issue May 21, 2021 · 3 comments

@jkurle
Collaborator

jkurle commented May 21, 2021

Hi all,

I have encountered a major issue with isat(). It arises from the fact that each block search starts from the model in which all indicators from that block are added. This model is treated as the GUM, but selection of indicators is only undertaken if the "GUM" passes all diagnostic tests.

As an example: suppose we have a sample of 100 observations and we want to do IIS. In this case, 4 blocks of 25 indicators each are used as the starting points for the search: the first block includes indicators iis1-iis25, the second iis26-iis50, and so on. The problem is that each of these starting models (regressors + set of indicators) is internally treated as the GUM in getsFun(). However, getsFun() only starts its search if all diagnostic tests pass. That means some blocks are not searched at all.

In the following minimal reproducible example I have added two outliers to the sample. They cause the normality test to reject for each of the blocks. So when indicators iis1-iis25 are included, the outlier at observation 100 causes the normality test of that "GUM" to fail, and hence none of the indicators iis1-iis25 are actually selected over. The same happens for the other three blocks, so in effect no paths are searched. In other settings I have encountered less extreme versions, but it has happened to me, even with data generated under the null (no contamination), that some of the blocks were not searched at all. The same behaviour also occurs with less extreme outliers, e.g. you can change the outliers to 3 or 2.5 and still observe it.

library(gets)
# the issue actually also arises for other seeds that I randomly tried, e.g. also seeds 1-10
# for seed 12345, no search is undertaken for the two middle blocks but at least the ones with contamination are searched
set.seed(11)
u <- rnorm(100)
# create deterministic outliers at observations 1 and 100
u[100] <- 4 # alternatively try 3 or 2.5
u[1] <- 4 # alternatively try 3 or 2.5
x <- rnorm(100)
y <- 2*x + u
# no search is conducted
isat(y = y, mxreg = x, iis = TRUE, sis = FALSE, t.pval = 1/100, normality.JarqueB = 0.05)
# to visualise the outliers
model <- lm(y ~ x, data = data.frame(cbind(y, x)))
plot(model$residuals)
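To see why each block's "GUM" fails, one can compute the Jarque-Bera statistic by hand for the first block's starting model, i.e. the regression of y on x plus impulse dummies for observations 1-25. This is an illustrative base-R check, not gets code; the dummy matrix iis_block1 is constructed here purely for illustration:

```r
# Same data-generating process as in the example above
set.seed(11)
u <- rnorm(100)
u[100] <- 4
u[1] <- 4
x <- rnorm(100)
y <- 2*x + u

# Impulse indicators iis1-iis25 (one dummy per observation in the first block)
iis_block1 <- diag(100)[, 1:25]
fit <- lm(y ~ x + iis_block1)
r <- residuals(fit)

# Jarque-Bera statistic computed by hand: JB = n/6 * (S^2 + (K - 3)^2 / 4)
n <- length(r)
m <- mean(r)
s2 <- mean((r - m)^2)
S <- mean((r - m)^3) / s2^1.5   # skewness
K <- mean((r - m)^4) / s2^2     # kurtosis
JB <- n/6 * (S^2 + (K - 3)^2 / 4)
pval <- pchisq(JB, df = 2, lower.tail = FALSE)
# On this seed the p-value is far below 0.05: the outlier at observation 100
# is not covered by iis1-iis25, so the block's "GUM" fails the normality test.
```

The dummies absorb observations 1-25 exactly, so the outlier at observation 1 does not affect the residuals, but the uncovered outlier at observation 100 alone is enough to make the test reject.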

I think it is concerning that even small contamination of only 2% of the sample causes the whole procedure to break down. I guess this is why Autometrics searches over more block compositions rather than "chronologically".

It is not clear what the actual GUM should be in our case. With indicator saturation we have (potentially many) more regressors than observations, so we cannot estimate the most general model and check it for misspecification. I therefore suggest that we turn off diagnostics for the initial path searches and select indicators based on statistical significance alone. The diagnostics could then be turned on at the final selection (when all retained IIS, SIS, TIS etc. are added together). Alternatively, we could turn the diagnostics on a bit earlier, when the final selection of a specific indicator type is made: e.g. after all IIS blocks have been searched and the final selection of IIS is made.
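As a user-level sketch of that two-stage idea (selection with diagnostics off, diagnostics only on the final retained model), one could do something like the following. The base-R shapiro.test and Box.test are stand-ins for the package's internal Jarque-Bera and Ljung-Box checks, and residuals() is assumed to return the final model's residuals from the isat object:

```r
library(gets)

# Same data as in the example above
set.seed(11)
u <- rnorm(100)
u[100] <- 4
u[1] <- 4
x <- rnorm(100)
y <- 2*x + u

# Stage 1: block searches select indicators on significance alone
is1 <- isat(y = y, mxreg = x, iis = TRUE, sis = FALSE, t.pval = 1/100,
            normality.JarqueB = NULL, ar.LjungB = NULL, arch.LjungB = NULL)

# Stage 2: diagnostics only on the final retained model
r <- as.numeric(residuals(is1))
shapiro.test(r)                            # normality check (stand-in)
Box.test(r, lag = 1, type = "Ljung-Box")   # autocorrelation check
```

This is only a workaround sketch; the actual proposal is to move the diagnostic gate inside isat() itself, so the block searches are never silently skipped.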

I don't want to turn off diagnostics completely when selecting indicators. Sometimes an observation can be an "outlier" (unusual) not because of the size of its error but because it does not match more general patterns of the data, such as homoskedasticity or the absence of ARCH effects.

@jkurle
Collaborator Author

jkurle commented Jul 13, 2022

I have to push this topic. I have relatively many settings/datasets in which no selection is undertaken because the diagnostics do not pass in the initial block searches, so an empty model is returned.

Any suggestions on how to deal with this, or opinions on my suggestions in the initial post?

@namtran6701

I was wondering if this problem has been resolved, since no selection is performed and the model is empty if the initial block searches don't pass.

@moritzpschwarz
Collaborator

I was wondering if this problem has been resolved, since no selection is performed and the model is empty if the initial block searches don't pass.

As far as I know, no - but thanks for pushing it again. I will follow up on this to see if we can maybe find a solution.

I presume that, as a workaround, you know that you can turn off diagnostics altogether by setting normality.JarqueB = NULL, ar.LjungB = NULL, arch.LjungB = NULL?

i.e. for the example above:

isat(y = y, mxreg = x, iis = TRUE, sis = FALSE, t.pval = 1/100, normality.JarqueB = NULL, ar.LjungB = NULL, arch.LjungB = NULL)

Not ideal, I know...
