Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixest:::vcov_hetero_internal resets small sample adj to N / (N-1) #518

Closed
s3alfisc opened this issue Jul 14, 2024 · 3 comments
Closed

fixest:::vcov_hetero_internal resets small sample adj to N / (N-1) #518

s3alfisc opened this issue Jul 14, 2024 · 3 comments

Comments

@s3alfisc
Copy link

s3alfisc commented Jul 14, 2024

Hi Laurent (@lrberge),

On my quest for identical standard errrors between fixest and pyfixest, I have consistently had difficulties to match vcov matrices for heteroskedastic errors. The main difference seems to arise because of different small sample corrections.

Contrary to what is described in the vignette on standard errors, the heteroskedastic vcov seems to get a small sample correction of N / N-1 if adj = TRUE.

Is this documented anywhere? If not, where could I best add documentation via a PR?

Note that in the fixest:::vcov_hetero_internal function, the logic seems to be based on cluster.adj flag, which I assume is internally always set to FALSE for heteroskedastic errors as the N / (N-1) correction does not seem to be applied when setting cluster.adj = FALSE via the feols() API, while it has an impact when adjust ssc=ssc(adj = FALSE).

Example

library(fwildclusterboot)
library(fixest)
data(voters)
head(voters)

voters = na.omit(voters)

fit = feols(
  proposition_vote ~ treatment,
  weights = ~weights, 
  data = voters,
  vcov = "hetero",
  ssc = ssc(adj = FALSE)
)
N = fit$nobs
k = length(fit$coefficients)

scores = fit$scores

X = model.matrix(fit)
Y = voters$proposition_vote

if(is.null(fit$weights)){
  weights = rep(1, length(Y))
} else {
  weights = voters$weights
}

W = diag(weights)
uhat = resid(fit)
sig_i = uhat / sqrt(weights)

all.equal(sqrt(weights) * X, sqrt(W) %*% X)
X = sqrt(weights) * X
Y = sqrt(weights) * Y

tXXinv = solve(crossprod(X))
all.equal(X * uhat, diag(uhat) %*% X )
Xu = X * uhat * sqrt(weights)
Omega = crossprod(Xu)
sigma_hand = tXXinv %*% Omega %*% tXXinv
sigma_hand * N / (N-1)
# (Intercept)     treatment
# (Intercept)  0.000948515 -0.0009485150
# treatment   -0.000948515  0.0009999048
vcov(fit)
# (Intercept)     treatment
# (Intercept)  0.000948515 -0.0009485150
# treatment   -0.000948515  0.0009999048

fixest:::vcov_hetero_internal

function (bread, scores, sandwich, ssc, nthreads, ...) 
{
    if (!sandwich) {
        vcov_noAdj = cpppar_crossprod(scores, 1, nthreads)
    }
    else {
        n = nrow(scores)
        adj = ifelse(ssc$cluster.adj, n/(n - 1), 1)
        vcov_noAdj = cpppar_crossprod(cpppar_matprod(scores, 
            bread, nthreads), 1, nthreads) * adj
    }
    vcov_noAdj

}

@s3alfisc s3alfisc changed the title fixest:::vcov_hetero_internal resets small sample adj if cluster.adj = FALSE fixest:::vcov_hetero_internal resets small sample adj to N / (N-1) Jul 14, 2024
@s3alfisc
Copy link
Author

s3alfisc commented Jul 14, 2024

Relatedly: with hetero errors, changing the cluster.adj from TRUE to FALSE leads to a change in standard errors, which I believe should not be the case? Likely related to this line adj = ifelse(ssc$cluster.adj, n/(n - 1), 1) in fixest:::vcov_hetero_internal?

fit1 = feols(
  proposition_vote ~ treatment,
  weights = ~weights, 
  data = voters,
  vcov = "hetero",
  ssc = ssc(adj = TRUE, cluster.adj = TRUE)
)

fit2 = feols(
  proposition_vote ~ treatment,
  weights = ~weights, 
  data = voters,
  vcov = "hetero",
  ssc = ssc(adj = TRUE, cluster.adj = FALSE)
)

etable(fit1, fit2)
# Dependent Var.:   proposition_vote   proposition_vote
# 
# Constant        0.8746*** (0.0308) 0.8746*** (0.0308)
# treatment       0.1131*** (0.0317) 0.1131*** (0.0316)
# _______________ __________________ __________________
# S.E. type       Heteroskedas.-rob. Heteroskedas.-rob.
# Observations                   300                300
# R2                         0.04762            0.04762
# Adj. R2                    0.04443            0.04443
# ---
#   Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

se(fit1)
# > se(fit1)
# (Intercept)   treatment 
# 0.03084960  0.03167428 
se(fit2)
# > se(fit2)
# (Intercept)   treatment 
# 0.03079814  0.03162145 

We also have that

vcov(fit1) / vcov(fit2)
N = nobs(fit1)
N / (N-1)

@s3alfisc
Copy link
Author

Here are the dof corrections in comparison to sandwich:

library(fwildclusterboot)
library(fixest)
data(voters)
head(voters)

voters = na.omit(voters)

fit = feols(
  proposition_vote ~ treatment + income,
  #weights = ~weights, 
  data = voters,
  vcov = "hetero",
  ssc = ssc(adj = FALSE)
)
N = nobs(fit)
k = length(coef(fit))

adj1 = N / (N-1)
adj2 = (N-1) / (N-k)
adj3 = N / (N-k)

sandwich::vcovHC(fit, type = "HC1") / sandwich::vcovHC(fit, type = "HC0")
# [1] 1.010101
adj1
# [1] 1.003344
adj2
# [1] 1.006734
N / (N-k)
# [1] 1.010101

(vcov(fit, vcov = "hetero", ssc = ssc(adj = FALSE, cluster.adj = FALSE))   ) / sandwich::vcovHC(fit, type = "HC0")
# FALSE, FALSE -> no adjustment
(vcov(fit, vcov = "hetero", ssc = ssc(adj = TRUE, cluster.adj = FALSE))   ) / sandwich::vcovHC(fit, type = "HC0")
adj2
(vcov(fit, vcov = "hetero", ssc = ssc(adj = FALSE, cluster.adj = TRUE))   ) / sandwich::vcovHC(fit, type = "HC0")
adj1

(vcov(fit, vcov = "hetero", ssc = ssc(adj = TRUE, cluster.adj = TRUE))   ) / sandwich::vcovHC(fit, type = "HC0")
adj3

@s3alfisc
Copy link
Author

I went back to our mail exchange from a very long time ago and it turns out that you've already answered my question @lrberge - I just did not get it properly 🤦

I went back to my mail exchange with Laurent & it turns out he already answered my question - I just misunderstood him 😅 But know I think I've got it!

My misunderstanding was that the cluster.adj argument was only relevant when clustered errors are computed. But this not the case! In fact, cluster.adj multiplies even heteroskedastic errors with a factor of G / (G-1) if it is TRUE. Because all clusters are singletons, we have that G / (G-1) = N / (N-1).

Therefore, we have the follwing ssc correction:

    adj = TRUE & cluster.adj = TRUE: multiply with (N-1) / (N-k) x N / (N-1)  = N / (N-k)
    adj = TRUE & cluster.adj = FALSE:   (N-1) / (N-k)
    adj = FALSE & cluster.adj = FALSE: 1
    adj = FALSE & cluster.adj = TRUE: N / (N-1)

And this is also what fixest implements =) Sorry!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant