Error in terms.formula(tmp, simplify = TRUE) : invalid model formula #26

Closed
ccmullally opened this issue Dec 27, 2021 · 5 comments

@ccmullally

I'm receiving the above error when trying to run boottest() on a feols() model. I don't have the same issue with lm(). I'm probably doing something wrong, but just in case this is a bug, I wanted to let you know.


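library(dplyr)
library(fwildclusterboot)
library(fixest)

data(mtcars)

# feols() fit with weights and cluster given as one-sided formulas
feols_fit <- feols(disp ~ am + hp, weights = ~qsec, cluster = ~cyl, data = mtcars)

# This call fails with:
# Error in terms.formula(tmp, simplify = TRUE) : invalid model formula
boot_feols <- boottest(feols_fit, clustid = "cyl", param = "am", B = 499)

# The same model fitted with lm() does not trigger the error
lm_fit <- lm(disp ~ am + hp, mtcars, weights = qsec)

boot_lm <- boottest(lm_fit, clustid = "cyl", param = "am", B = 499)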
@s3alfisc
Owner

Hi, thanks - boottest() currently only works if the weights argument is passed to lm() and feols() as a vector. I was not aware of this (fixest has a lot of different options to specify models, which makes my life rather complicated), so thanks for letting me know about it! I'll try to fix this over the next few days.

Until then, a workaround for you is to run

library(dplyr)
library(fwildclusterboot)
library(fixest)

data(mtcars)

# Workaround: pass the weights as a vector (mtcars$qsec) instead of a formula (~qsec)
feols_fit <- feols(disp ~ am + hp, weights = mtcars$qsec, cluster = "cyl", data = mtcars)
boot_feols <- boottest(feols_fit, clustid = "cyl", param = "am", B = 499, seed = 986)

lm_fit <- lm(disp ~  am + hp, mtcars, weights = qsec)
boot_lm <- boottest(lm_fit, clustid = "cyl", param = "am", B = 499, seed = 986)

generics::tidy(boot_feols)
generics::tidy(boot_lm)
# > generics::tidy(boot_feols)
#       term  estimate statistic p.value  conf.low conf.high
# 1 1*am = 0 -100.2779  -2.69509       0 -166.8028 -48.54865
# > generics::tidy(boot_lm)
#       term  estimate statistic p.value  conf.low conf.high
# 1 1*am = 0 -100.2779  -2.69509       0 -166.8028 -48.54865

@s3alfisc
Owner

s3alfisc commented Dec 28, 2021

Btw, as you are new to R and use WLS: one difference between R and Stata is that R does not differentiate between frequency weights and sampling/probability weights. It is my understanding that both R's lm() function and feols() always treat weights as sampling/probability weights. This does not matter for point estimation, but it does matter for inference via small-sample corrections. With frequency weights, the number of rows of the data M no longer corresponds to the number of observations N, and in consequence R implements an incorrect small-sample adjustment m = (M-k) / (M-1). This does not matter for wild bootstrap inference, as the p-value is calculated as mean(abs(m * t_stat) <= abs(m * t_boot)), so both t_stat and t_boot are multiplied by the same factor.
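A small base R sketch may help illustrate both points. It uses simulated data and made-up names (d, w, p_value, m = 0.9); none of it is taken from fwildclusterboot or fixest. The first part compares a weighted lm() fit on M rows with an unweighted fit on rows replicated according to integer weights (N rows). The second part checks that rescaling t_stat and t_boot by the same positive factor leaves the p-value formula above unchanged.

```r
set.seed(42)

# Part 1: integer weights vs. explicitly replicated rows (simulated data)
d <- data.frame(x = rnorm(20), w = sample(1:3, 20, replace = TRUE))
d$y <- 1 + 2 * d$x + rnorm(20)

fit_weighted <- lm(y ~ x, data = d, weights = w)   # M = nrow(d) rows
d_expanded   <- d[rep(seq_len(nrow(d)), d$w), ]    # N = sum(d$w) rows
fit_expanded <- lm(y ~ x, data = d_expanded)

# Point estimates agree ...
same_coefs <- isTRUE(all.equal(coef(fit_weighted), coef(fit_expanded)))
# ... but the residual degrees of freedom are based on M vs. N
df_weighted <- df.residual(fit_weighted)
df_expanded <- df.residual(fit_expanded)

# Part 2: the p-value formula is invariant to a common positive factor m
p_value <- function(t_stat, t_boot, m = 1) {
  mean(abs(m * t_stat) <= abs(m * t_boot))
}

t_stat <- -2.7            # illustrative value only
t_boot <- rnorm(499)      # stand-in for bootstrapped t-statistics
p_unscaled <- p_value(t_stat, t_boot)
p_scaled   <- p_value(t_stat, t_boot, m = 0.9)
same_p <- identical(p_unscaled, p_scaled)
```

Here same_coefs and same_p should both be TRUE, while df_weighted and df_expanded differ whenever any weight exceeds 1.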

@ccmullally
Author

Very interesting, thanks. I will add this to my problem set so the students know.

s3alfisc added a commit that referenced this issue Dec 29, 2021
…xest cluster arguments to be a formula, column vector or character; aligned updates to pre-processing for lm and felm. Note: if column vectors are fed in, they need to be specified in relation to the input data.frame, e.g. as data$weights.
@s3alfisc
Owner

With commit 874facb, boottest() should no longer fail when the weights argument in feols() is specified as a formula, or when the cluster argument in feols() is specified as a column vector. One caveat: the column vector needs to be specified in relation to the input data, e.g. feols(fml, data, cluster = data$cluster, weights = data$weights). I will add this option in the next few days.
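For readers wondering what accepting a formula, a character or a column vector could look like in practice, here is a rough base R sketch. The helper resolve_column() is hypothetical and is not the code from commit 874facb; it only shows one way to map the three input forms onto the same column of the input data.frame.

```r
# Hypothetical helper (not part of fwildclusterboot or fixest):
# resolve a weights or cluster specification to a column of `data`.
resolve_column <- function(spec, data) {
  if (inherits(spec, "formula")) {
    # One-sided formula such as ~qsec: look up the single variable name
    vars <- all.vars(spec)
    stopifnot(length(vars) == 1L, vars %in% names(data))
    return(data[[vars]])
  }
  if (is.character(spec) && length(spec) == 1L) {
    # Column name given as a string such as "qsec"
    stopifnot(spec %in% names(data))
    return(data[[spec]])
  }
  # Otherwise assume a vector such as data$qsec; it must match the rows of `data`
  stopifnot(length(spec) == nrow(data))
  spec
}

data(mtcars)
w_formula <- resolve_column(~qsec, mtcars)
w_char    <- resolve_column("qsec", mtcars)
w_vector  <- resolve_column(mtcars$qsec, mtcars)

# All three forms resolve to the same numeric vector
all_same <- identical(w_formula, w_char) && identical(w_char, w_vector)
```

Resolving everything against the input data.frame is one way to keep weights and cluster variables aligned with the rows used in estimation.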

@ccmullally
Author

ccmullally commented Dec 29, 2021 via email

s3alfisc added a commit that referenced this issue Dec 29, 2021
s3alfisc added a commit that referenced this issue Jan 17, 2022
…e specified as vectors but without reference to the input data set, hence not as data$weights. While this is legal for feols(), lm() and felm(), I want to make sure that the weights vector is part of the input data.frame, which is necessary for reasons of data processing