
Question on Heteroskedasticity-robust wild boot test #82

Open
shpp9212 opened this issue Apr 13, 2023 · 14 comments

Comments

@shpp9212

Hi,
When clustering is not used and variant = 21 or 31, I can see that the residuals in the data-generating process are transformed accordingly (HC2- or HC3-type), but are HC2 or HC3 also applied to the HCCMEs? To me it looks like an HC0-type estimator is used for the covariance estimates. Sorry if I am misunderstanding.

@s3alfisc
Member

Hi @shpp9212, good catch! Indeed, you could say that the HC0 type is used for computing the variance-covariance matrices. But the HC0 vcov matrix is then multiplied by a small-sample correction of (N-1) / (N-k). So in that sense, I think it is fair to call this HC1, even though the textbook HC1 correction would be N / (N-k).

The "31" and "21" types mean: use the HC3/HC2 type for the dgp, but estimate all variance-covariance matrices with HC1. There is a little more background on this in the fwildclusterboot vignette and in the paper by MacKinnon, Nielsen & Webb that it links to. It should not be too difficult to also implement "22" and "33" types; I'd be happy about any PR, but can eventually implement this myself if there is interest =)
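To make the correction concrete, here is a minimal numpy sketch of my own (not the package's internal code, and `hc0_with_correction` is a hypothetical name): an HC0 sandwich estimator scaled by (N-1)/(N-k):

```python
# Sketch: HC0 "sandwich" covariance matrix, scaled by the
# (N - 1) / (N - k) small-sample correction described above.
# This is an illustration, not fwildclusterboot's actual implementation.
import numpy as np

def hc0_with_correction(X, resid):
    """HC0 covariance matrix times (N - 1) / (N - k)."""
    N, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    # "meat" of the sandwich: sum_i u_i^2 x_i x_i'
    meat = X.T @ (resid[:, None] ** 2 * X)
    vcov_hc0 = XtX_inv @ meat @ XtX_inv
    return (N - 1) / (N - k) * vcov_hc0
```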

I hope this helps!

@shpp9212
Author

shpp9212 commented Apr 17, 2023

Hi, thanks for the comment.
I am not an expert econometrician, so I couldn't really follow the background papers Roodman et al. (2019) or MacKinnon et al. (2022). My current interest is in heteroskedasticity-robust wild bootstrap tests, and I have a further question.

I believe (I could be wrong, please let me know) that your packages, without clustering, match the methods of the following paper:

https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1268.pdf

I was trying to replicate figure 5 of the paper (page 24), but could not succeed.

I believe w3r2, which is superior to w2r2 with respect to test size (closer to the nominal 0.05 significance level), corresponds to the following call (figure 5 uses HC1):
wildboottest(model, B = 399, param='x5', bootstrap_type='31', show=False)

But what I get is that its performance is worse than with bootstrap_type='21', which I believe corresponds to w2r2.

The following is the Python code I used (sorry, I am not very good at coding).
I also tested with R, but the picture was similar.

If you are familiar with the literature, could you let me know which part of my understanding is wrong?
(My understanding of your package itself could also be wrong, or something like that.)

Thank you very much!

My code

import pandas as pd
import numpy as np
import statsmodels.api as sm
from wildboottest.wildboottest import wildboottest
total=1000
thres=0.05
N=40
beta5=0
freq_11=[]
freq_21=[]
freq_31=[]
for gamma in np.linspace(0,2,21):
    count_11=0
    count_21=0
    count_31=0
    for i in range(1,total+1):
        x2=np.random.randn(N)
        x3=np.random.randn(N)
        x4=np.random.randn(N)
        x5=np.random.randn(N)
        x2=np.exp(x2)
        x3=np.exp(x3)
        x4=np.exp(x4)
        x5=np.exp(x5)
        yhat=1+x2+x3+x4+beta5*x5
        sigma=yhat**(2*gamma)
        sigma=np.sqrt(sigma/np.mean(sigma))
        y=yhat+sigma*np.random.randn(N)
        df=pd.DataFrame({'y':y,'x2':x2,'x3':x3,'x4':x4,'x5':x5})
        model = sm.OLS(df['y'], sm.add_constant(df[['x2','x3','x4','x5']]))
        p_val_11 = wildboottest(model, B = 399, param='x5',bootstrap_type='11',show=False)['p-value'][0]
        p_val_21 = wildboottest(model, B = 399, param='x5',bootstrap_type='21',show=False)['p-value'][0]
        p_val_31 = wildboottest(model, B = 399, param='x5',bootstrap_type='31',show=False)['p-value'][0]
        if p_val_11<thres:
            count_11=count_11+1
        if p_val_21<thres:
            count_21=count_21+1
        if p_val_31<thres:
            count_31=count_31+1
    freq_11.append(count_11/total)
    freq_21.append(count_21/total)
    freq_31.append(count_31/total)

Plot with plotly:

gamma=np.linspace(0,2,21)
from plotly.subplots import make_subplots
import plotly.graph_objects as go
fig = make_subplots()
fig.update_layout(
    width=1000,
    height=500)
fig.add_trace(go.Scatter(x=gamma, y=freq_11,
                    mode='lines', name='w1r2'))
fig.add_trace(go.Scatter(x=gamma, y=freq_21,
                    mode='lines', name='w2r2'))
fig.add_trace(go.Scatter(x=gamma, y=freq_31,
                    mode='lines', name='w3r2'))
fig.show()

@shpp9212
Author

shpp9212 commented Apr 17, 2023

My R code

library(fwildclusterboot)
library(lmtest)
library(sandwich)
library(tictoc)

tic()

total=50000
thres=0.05
N=40
beta5=0

freq_11=c()
freq_21=c()
freq_31=c()

for(gamma in seq(0,2,0.1)){
  count_11=0
  count_21=0
  count_31=0
  for(i in 1:total){
    
    x2=rnorm(N)
    x3=rnorm(N)
    x4=rnorm(N)
    x5=rnorm(N)
    
    x2=exp(x2)
    x3=exp(x3)
    x4=exp(x4)
    x5=exp(x5)
    
    
    yhat=1+x2+x3+x4+beta5*x5
    
    sigma=yhat**(2*gamma)
    sigma=sqrt(sigma/mean(sigma))
    
    y=yhat+sigma*rnorm(N)
    
    fm<-lm(y~x2+x3+x4+x5)
    boot_11 <- boottest(fm,param = "x5",B = 399,bootstrap_type="11")
    boot_21 <- boottest(fm,param = "x5",B = 399,bootstrap_type="21")
    boot_31 <- boottest(fm,param = "x5",B = 399,bootstrap_type="31")
    
    
    if(boot_11$p_val<thres)
    {count_11<-count_11+1}
    
    if(boot_21$p_val<thres)
    {count_21<-count_21+1}
    
    if(boot_31$p_val<thres)
    {count_31<-count_31+1}
    
  }
  
  freq_11=c(freq_11,count_11/total)
  freq_21=c(freq_21,count_21/total)
  freq_31=c(freq_31,count_31/total)
}

gamma=seq(0,2,0.1)
plot(gamma,freq_11,col="black",ylim=c(0,0.2))
points(gamma,freq_21,col="purple")
points(gamma,freq_31,col="red")
legend(x="topleft", legend=c("w1r2","w2r2","w3r2"),lty=1:3,
       col=c("black", "purple","red"))

toc()

It took about 3 hours to run on my local PC.
[Figure: rejection frequencies with 50,000 replications from R. The Python results show a similar pattern; I am prohibited from uploading the Python figure from my current environment.]

@s3alfisc
Member

Very cool! So this suggests that the WRE11 performs better than the '21' and '31' versions?

Are you aware of the paper by Flachaire? He benchmarks the '11' vs. the '22' and '33' bootstraps, which are not implemented in fwildclusterboot (though it would be easy to add them), using a slightly larger sample size of N = 1000.

Is there a particular reason that you are not specifying the length of freq_11=c() prior to the simulation? Preallocating might help with performance. Also, boottest has an nthreads argument for parallelisation, which might also help (though perhaps not in this case, as both N and B are small).

Also, I have just open-sourced some simulations on the cluster bootstrap here, in case that helps you in any way =)

Best, Alex

@shpp9212
Author

Thanks for your suggestions.
My concern is that the result I got is not consistent with the above paper by MacKinnon, if my understanding is correct: the '31' rejection-rate line should be closer to the 0.05 nominal line than '21' or '11', but I got different results. fwildclusterboot's implementation might differ in its details from the methods in the paper, which I don't really understand right now.
Thank you for your cool packages & efforts!

@s3alfisc
Member

Thanks for the nice words! I can think of two reasons why your results might differ from MNW: first, they focus on clustered errors, and it's not intuitively clear to me why their results should carry over to non-clustered errors. Second, I would try increasing the sample size slightly; I think MNW pick a sample size significantly higher than that.

@shpp9212
Author

Actually, I am referring to another paper:
https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1268.pdf
There he uses a sample size of 40 and examines the non-clustered version, and Figure 5 seems to contradict my simulation.
Please see my comments above, if interested.
Many thanks.

@s3alfisc
Member

Thanks, I was not really aware of this paper. Really cool! One other potential reason for differences: MacKinnon uses equal-tailed p-values, while fwildclusterboot uses two-tailed p-values by default. You can change that with the p_val_type function argument.
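For concreteness, the two p-value conventions can be sketched as follows (a rough sketch with function names of my own; fwildclusterboot's actual implementation may differ in details):

```python
# Sketch of the two p-value conventions, for an observed t-statistic
# t_obs and B bootstrap draws t_boot. Names are illustrative only.
import numpy as np

def symmetric_p(t_obs, t_boot):
    """Two-tailed (symmetric): compare |t| against the bootstrap |t*|."""
    return np.mean(np.abs(t_boot) >= np.abs(t_obs))

def equal_tailed_p(t_obs, t_boot):
    """Equal-tailed: twice the smaller of the two tail probabilities."""
    lo = np.mean(t_boot <= t_obs)
    hi = np.mean(t_boot >= t_obs)
    return 2 * min(lo, hi)
```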

@s3alfisc
Member

Beyond that, I am currently running your code with N = 100, as I got a little curious =) I'll look at the MacKinnon figures in more detail tomorrow; I think it is not straightforward to see whether your numbers agree with the paper or not.

@shpp9212
Author

It seems like wildboottest imposes two-tailed p-values right now. Equal-tailed p-values also produce a figure similar to the two-tailed one, which is mysterious to me. Perhaps I need to read the papers you suggested. Thank you!

@s3alfisc
Member

From the MacKinnon paper:

"The remaining figures deal with wild bootstrap tests. Experiments were performed for 12 variants of the wild bootstrap. There are three transformations of the residuals (denoted by w1, w2, or w3, because they are equivalent to HC1, HC2, or HC3), two types of residuals (restricted and unrestricted, denoted by r or u), and two ways of generating the v*_j (F1 or F2, denoted by 1 or 2)."

So I think that MacKinnon's simulations do the same thing as the Flachaire paper: they impose the HC corrections on the bootstrap dgp and use the same correction when computing the vcov. I.e., for w2r2, he imposes the HC2 correction on the bootstrap dgp and computes the variance-covariance matrices as HC2. fwildclusterboot and wildboottest instead always compute the vcovs as HC1.

So, in other words, you are not exactly replicating MacKinnon's paper with the '21' bootstrap type, as he in fact implements a '22' bootstrap type. Adding '22' to both R and Python is more or less straightforward, and I'd be more than happy about a PR =)

Practically, what you would have to do is add a 'type' argument to the get_tstat() method:

def get_tstat(self, type="1"):

    if type == "1":
        cov = self.small_sample_correction * self.RXXinvX_2 @ np.power(self.uhat, 2)
        self.t_stat = (np.transpose(self.R) @ self.beta_hat - self.r) / np.sqrt(cov)
    elif type == "2":
        resid_multiplier = ...
        cov = self.small_sample_correction * self.RXXinvX_2 @ np.power(resid_multiplier * self.uhat, 2)
        self.t_stat = (np.transpose(self.R) @ self.beta_hat - self.r) / np.sqrt(cov)

and the same in

For resid multipliers, you can take a look at this part of the code.

Similar changes would have to be implemented for the get_tboot method.

Plus some upstream changes to allow inference of types "22" and "33" =)

Would you be interested in implementing this? If yes, I could write a more detailed PR for either the Python or R package =)

@shpp9212
Author

Hi, thank you for your kind help; I will definitely give it a try. However, I actually think what the paper intended is indeed "21" and "31" in the Figure 5 I was concerned with, because the next paragraph on page 15 says:
"There are five different HCCMEs and 12 different bootstrap DGPs. Thus each experiment produces 60 sets of rejection frequencies."
and
"Figures 4 and 5 present results for HC3 and HC1, respectively, combined with eight different bootstrap DGPs for n = 40."
I should be fine with the kind comments you already provided. I will try more variants and let you know if I find anything interesting.
Thanks!

@shpp9212
Author

shpp9212 commented Apr 23, 2023

Hi,
I experimented with a very naive implementation and got pretty much the same picture as before.
I implemented the following five wild bootstrap procedures, and again "w1r2, HC1" was the closest to the 0.05 nominal line:
"w1r2, HC1", "w2r2, HC1", "w3r2, HC1", "w2r2, HC2", "w3r2, HC3"
Sorry, I am not allowed to upload the picture from my current environment. In w2 or w3, the hat matrix is computed with the full X matrix, right?
Thank you.

library(lmtest)
library(sandwich)
library(tictoc)


total=10000
thres=0.05
N=40
beta5=0
B=399

freq_11=rep(0,6)
freq_21=rep(0,6)
freq_31=rep(0,6)
freq_22=rep(0,6)
freq_33=rep(0,6)


dgp1 <- function(){
  return(pred+sqrt(N/(N-5))*fm_rest$residuals*dqrng::dqsample(x = c(-1L, 1L),size = N,replace = TRUE))
}
dgp2 <- function(){
  return(pred+fm_rest$residuals/sqrt(1-h)*dqrng::dqsample(x = c(-1L, 1L),size = N,replace = TRUE))
}
dgp3 <- function(){
  return(pred+fm_rest$residuals/(1-h)*dqrng::dqsample(x = c(-1L, 1L),size = N,replace = TRUE))
}

j=1
for(gamma in seq(0,2,0.4)){
  print(gamma)
  
  tic()
  count_11=0
  count_21=0
  count_31=0
  count_22=0
  count_33=0
  for(i in 1:total){
    
    x2=rnorm(N)
    x3=rnorm(N)
    x4=rnorm(N)
    x5=rnorm(N)
    
    x2=exp(x2)
    x3=exp(x3)
    x4=exp(x4)
    x5=exp(x5)
    
    
    yhat=1+x2+x3+x4+beta5*x5
    
    sigma=yhat**(2*gamma)
    sigma=sqrt(sigma/mean(sigma))
    
    y=yhat+sigma*rnorm(N)
    
    fm<-lm(y~x2+x3+x4+x5)
    
    h=as.vector(hatvalues(fm))
    
# [15] picks element 15 of the coeftest matrix (column 3, row 5), i.e. the t-statistic of x5
t_HC1<-coeftest(fm, df = Inf, vcov = vcovHC, type = "HC1")[15]
t_HC2<-coeftest(fm, df = Inf, vcov = vcovHC, type = "HC2")[15]
t_HC3<-coeftest(fm, df = Inf, vcov = vcovHC, type = "HC3")[15]
    
    fm_rest<-lm(y~x2+x3+x4)
    
    pred=predict(fm_rest,data.frame(x2,x3,x4))
    
    boot_t_11=rep(0,B)
    boot_t_21=rep(0,B)
    boot_t_31=rep(0,B)
    boot_t_22=rep(0,B)
    boot_t_33=rep(0,B)
    
for(b in 1:B){  # use 'b' so the outer loop index 'i' is not shadowed
  boot_t_11[b]=coeftest(lm(dgp1()~x2+x3+x4+x5), df = Inf, vcov = vcovHC, type = "HC1")[15]
  boot_t_21[b]=coeftest(lm(dgp2()~x2+x3+x4+x5), df = Inf, vcov = vcovHC, type = "HC1")[15]
  boot_t_31[b]=coeftest(lm(dgp3()~x2+x3+x4+x5), df = Inf, vcov = vcovHC, type = "HC1")[15]
  boot_t_22[b]=coeftest(lm(dgp2()~x2+x3+x4+x5), df = Inf, vcov = vcovHC, type = "HC2")[15]
  boot_t_33[b]=coeftest(lm(dgp3()~x2+x3+x4+x5), df = Inf, vcov = vcovHC, type = "HC3")[15]
}
    
    
    boot_11_p_val<-2*min(sum(boot_t_11<=t_HC1)/B,sum(boot_t_11>t_HC1)/B)
    boot_21_p_val<-2*min(sum(boot_t_21<=t_HC1)/B,sum(boot_t_21>t_HC1)/B)
    boot_31_p_val<-2*min(sum(boot_t_31<=t_HC1)/B,sum(boot_t_31>t_HC1)/B)
    boot_22_p_val<-2*min(sum(boot_t_22<=t_HC2)/B,sum(boot_t_22>t_HC2)/B)
    boot_33_p_val<-2*min(sum(boot_t_33<=t_HC3)/B,sum(boot_t_33>t_HC3)/B)
    
    if(boot_11_p_val<thres)
    {count_11<-count_11+1}
    
    if(boot_21_p_val<thres)
    {count_21<-count_21+1}
    
    if(boot_31_p_val<thres)
    {count_31<-count_31+1}
    
    if(boot_22_p_val<thres)
    {count_22<-count_22+1}
    
    if(boot_33_p_val<thres)
    {count_33<-count_33+1}
    
  }
  
  freq_11[j]=count_11/total
  freq_21[j]=count_21/total
  freq_31[j]=count_31/total
  freq_22[j]=count_22/total
  freq_33[j]=count_33/total
  j=j+1
  toc() # takes several hours...
}


gamma=seq(0,2,0.4)
plot(gamma,freq_11,col="black",ylim=c(0,0.2))
points(gamma,freq_21,col="purple")
points(gamma,freq_31,col="red")
points(gamma,freq_22,col="blue")
points(gamma,freq_33,col="orange")
legend(x="topleft", legend=c("w1r2, HC1","w2r2, HC1","w3r2, HC1","w2r2, HC2","w3r2, HC3"),lty=rep(1,5),
       col=c("black", "purple","red","blue","orange"))
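For reference, my reading of the w2/w3 residual transformations used above, with leverages computed from the full X matrix, can be sketched in Python as follows (`leverages`, `w2`, and `w3` are names of my own, not package functions):

```python
# Sketch of the HC2/HC3-type residual transformations, with leverages h_i
# taken from the hat matrix of the full regressor matrix X.
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'."""
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

def w2(resid, h):
    """HC2-type transform: u_i / sqrt(1 - h_i)."""
    return resid / np.sqrt(1 - h)

def w3(resid, h):
    """HC3-type transform: u_i / (1 - h_i)."""
    return resid / (1 - h)
```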

@s3alfisc
Member

s3alfisc commented Apr 24, 2023

Hi, just a quick catch-up: I have been traveling over the last few days and will try to take a closer look tomorrow. But at first glance, it looks right to me!
