
isat treats each initial block as GUM and hence often does not search at all if diagnostics of that "GUM" don't pass #39

jkurle opened this issue May 21, 2021 · 3 comments

@jkurle
Collaborator

jkurle commented May 21, 2021

Hi all,

I have encountered a major issue with isat(). It arises from the fact that each block search starts from the model in which all indicators from that block are added. This model is treated as the GUM, but selection of indicators is only undertaken if the "GUM" passes all diagnostic tests.

As an example: suppose we have a sample of 100 observations and we want to do IIS. In this case, 4 blocks of 25 indicators each are used as the starting points for the search: the first block includes indicators iis1-iis25, the second iis26-iis50, and so on. The problem is that each of these starting models (regressors + set of indicators) is internally treated as the GUM in getsFun(). However, getsFun() only starts its search if all diagnostic tests pass. That means some blocks are not searched at all.

In the following minimal reproducible example I have added two outliers to the sample. They cause the normality test to reject for each of the blocks. So when indicators iis1-iis25 are included, the outlier at observation 100 causes the normality test of that "GUM" to fail, and hence none of the indicators iis1-iis25 are actually selected over. The same happens for the other three blocks, so in effect no paths are searched. In other settings I have encountered less extreme versions, but it has happened to me, even with data generated under the null (no contamination), that some of the blocks were not searched at all. The same behaviour also occurs with less extreme outliers, e.g. you can change the outliers to 3 or 2.5 and still observe it.

library(gets)
# the issue actually also arises for other seeds that I randomly tried, e.g. also seeds 1-10
# for seed 12345, no search is undertaken for the two middle blocks but at least the ones with contamination are searched
set.seed(11)
u <- rnorm(100)
# create deterministic outliers at observations 1 and 100
u[100] <- 4 # alternatively try 3 or 2.5
u[1] <- 4 # alternatively try 3 or 2.5
x <- rnorm(100)
y <- 2*x + u
# no search is conducted
isat(y = y, mxreg = x, iis = TRUE, sis = FALSE, t.pval = 1/100, normality.JarqueB = 0.05)
# to visualise the outliers
model <- lm(y ~ x, data = data.frame(cbind(y, x)))
plot(model$residuals)
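To see why each block's "GUM" fails, one can compute the Jarque-Bera statistic by hand for the first block's starting model, i.e. the regression of y on x plus impulse dummies for observations 1-25. This is an illustrative base-R check, not gets code; the dummy matrix iis_block1 is constructed here purely for illustration:

```r
# Same data-generating process as in the example above
set.seed(11)
u <- rnorm(100)
u[100] <- 4
u[1] <- 4
x <- rnorm(100)
y <- 2*x + u

# Impulse indicators iis1-iis25 (one dummy per observation in the first block)
iis_block1 <- diag(100)[, 1:25]
fit <- lm(y ~ x + iis_block1)
r <- residuals(fit)

# Jarque-Bera statistic computed by hand: JB = n/6 * (S^2 + (K - 3)^2 / 4)
n <- length(r)
m <- mean(r)
s2 <- mean((r - m)^2)
S <- mean((r - m)^3) / s2^1.5   # skewness
K <- mean((r - m)^4) / s2^2     # kurtosis
JB <- n/6 * (S^2 + (K - 3)^2 / 4)
pval <- pchisq(JB, df = 2, lower.tail = FALSE)
# On this seed the p-value is far below 0.05: the outlier at observation 100
# is not covered by iis1-iis25, so the block's "GUM" fails the normality test.
```

The dummies absorb observations 1-25 exactly, so the outlier at observation 1 does not affect the residuals, but the uncovered outlier at observation 100 alone is enough to make the test reject.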

I think it is concerning that even small contamination of only 2% of the sample causes the whole procedure to break down. I guess this is why Autometrics searches over more block compositions rather than "chronologically".

It is not clear what the actual GUM should be in our case. With indicator saturation we have (potentially many) more regressors than observations, so we cannot estimate the most general model and check it for misspecification. I therefore suggest that we turn off diagnostics for the initial path searches and select indicators based on statistical significance alone. The diagnostics could then be turned on at the final selection (when all retained IIS, SIS, TIS etc. are added together). Alternatively, we could turn the diagnostics on a bit earlier, when the final selection of a specific indicator type is made: e.g. after all IIS blocks have been searched and the final selection of IIS is made.
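As a user-level sketch of that two-stage idea (selection with diagnostics off, diagnostics only on the final retained model), one could do something like the following. The base-R shapiro.test and Box.test are stand-ins for the package's internal Jarque-Bera and Ljung-Box checks, and residuals() is assumed to return the final model's residuals from the isat object:

```r
library(gets)

# Same data as in the example above
set.seed(11)
u <- rnorm(100)
u[100] <- 4
u[1] <- 4
x <- rnorm(100)
y <- 2*x + u

# Stage 1: block searches select indicators on significance alone
is1 <- isat(y = y, mxreg = x, iis = TRUE, sis = FALSE, t.pval = 1/100,
            normality.JarqueB = NULL, ar.LjungB = NULL, arch.LjungB = NULL)

# Stage 2: diagnostics only on the final retained model
r <- as.numeric(residuals(is1))
shapiro.test(r)                            # normality check (stand-in)
Box.test(r, lag = 1, type = "Ljung-Box")   # autocorrelation check
```

This is only a workaround sketch; the actual proposal is to move the diagnostic gate inside isat() itself, so the block searches are never silently skipped.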

I don't want to turn off diagnostics completely when selecting indicators. Sometimes an observation can be an "outlier" (unusual) not because of the size of its error but because it does not match more general patterns of the data, such as homoskedasticity or the absence of ARCH effects.

@jkurle
Collaborator Author

jkurle commented Jul 13, 2022

I have to push this topic. I have relatively many settings/datasets in which no selection is undertaken because the diagnostics do not pass in the initial block searches, so an empty model is returned.

Any suggestions on how to deal with this, or opinions on my suggestions in the initial post?

@namtran6701

I was wondering if this problem has been resolved, since no selection is performed and the model is empty if the initial block searches don't pass.

@moritzpschwarz
Collaborator

I was wondering if this problem has been resolved, since no selection is performed and the model is empty if the initial block searches don't pass.

As far as I know, no - but thanks for pushing it again. I will follow up on this to see if we can maybe find a solution.

I presume that, as a workaround, you know that you can turn off diagnostics altogether by setting normality.JarqueB = NULL, ar.LjungB = NULL, arch.LjungB = NULL?

i.e. for the example above:

isat(y = y, mxreg = x, iis = TRUE, sis = FALSE, t.pval = 1/100, normality.JarqueB = NULL, ar.LjungB = NULL, arch.LjungB = NULL)

Not ideal, I know...
