Error in mboot when clustering at two variables #175

Open
cmjoyce opened this issue May 18, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@cmjoyce

cmjoyce commented May 18, 2023

Hi there,

I'm using the did package and need to account for clustering at the district level, which is different from my idname (individuals residing in these clusters). Based on the existing documentation, I've accounted for individual and district level clustering. The code and error message are as follows:

att_gt(yname = "outcome",
       tname = "year",
       gname = "g",
       idname = "id",
       xformla = ~ 1,
       data = df,
       panel = FALSE,
       weightsname = "weight_adj",
       clustervars = c("id", "dist_id"),
       control_group = "notyettreated",
       print_details = TRUE,
       bstrap = TRUE,
       cband = FALSE
)

Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) : 
  can't handle that many cluster variables

I've tried making a vector of these variables and using that as my clustervars, but that just errors out.

Is there a way to get around this error and account for both clustering variables?

Thanks very much,
Caroline

@pedrohcgs
Collaborator

pedrohcgs commented May 18, 2023 via email

@cmjoyce
Author

cmjoyce commented May 18, 2023

Hi thanks for the quick response!

There's no error when I use just dist_id as my clustering variable, though I end up with some very large (confusingly so) standard errors for some treatment groups, especially when I include individual-level covariates. But if clustering only on district gives correctly calculated standard errors, I will assume the issue is on my end.

Caroline

@bcallaway11
Owner

@cmjoyce, sorry for the delayed response. I am surprised that you got an error with the first version that you sent. I am marking that as a bug as I think it should work.

That being said, by default, we already cluster at the unit level (in your case "id"), so clustering on both ends up being redundant. Clustering on "dist_id" alone is not a fix for the large standard errors, but those are the standard errors that I think you were trying to get from the beginning.
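
For reference, a minimal sketch of the single-variable call described above. It reuses the column names from the original post (`outcome`, `year`, `g`, `id`, `dist_id`, `weight_adj`), which are assumptions about the user's data, and wraps the call in a hypothetical helper named `fit_district_clustered` that is not part of the did package.

```r
# Hypothetical wrapper (not part of did). Column names are taken from the
# original post and are assumptions about the user's data frame.
fit_district_clustered <- function(df) {
  did::att_gt(
    yname = "outcome",
    tname = "year",
    gname = "g",
    idname = "id",
    xformla = ~ 1,
    data = df,
    panel = FALSE,
    weightsname = "weight_adj",
    # District only: per the comment above, the unit level ("id") is
    # already clustered on by default, so listing it again is redundant.
    clustervars = "dist_id",
    control_group = "notyettreated",
    bstrap = TRUE,
    cband = FALSE
  )
}
```

Calling `fit_district_clustered(df)` would then run the same estimation as in the original post, with `dist_id` as the only entry in `clustervars`.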

bcallaway11 added the bug label on Aug 29, 2023

@cmjoyce
Author

cmjoyce commented Aug 30, 2023

Yes, I think my clustering on two variables was redundant -- I tweaked some things and got it working. I limited my clustering to one variable to avoid the error message.
Thanks for the awesome package!

@bcallaway11
Owner

Ok, great!

Note to self: I am going to leave this open as I think this could be confusing for users. Need to think about what the behavior should be if the user includes "id" among the clustering variables.

@kdjiffa

kdjiffa commented Apr 3, 2024

Hi,
I am having a similar issue related to this post. I have balanced panel data where I want to cluster at the group and time level. I am using the individual id variable in clustervars instead of the group variable, as per the documentation. I have 3 time periods (years), 3,000 observations per period and 1,000 per group, which amounts to 9,000 observations in total. Below are my code and the error:

csdid_out <- att_gt(yname = "Y2it",
                    tname = "year",
                    gname = "first.treat",
                    idname = "id",
                    est_method = "reg",
                    data = data,
                    panel = TRUE,
                    clustervars = c("id", "year"),
                    control_group = "notyettreated",
                    bstrap = TRUE,
                    cband = FALSE
)

Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) :
  can't handle time-varying cluster variables

I would appreciate any help on this.
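
The error message suggests that a cluster variable has to be constant within each unit, which "year" cannot be in a panel. A minimal base-R sketch of that kind of check follows. The toy data and the helper `is_time_invariant` are hypothetical, and this is not necessarily how the package performs the check internally.

```r
# Toy balanced panel: 3 units observed in 3 years (made-up values).
toy <- data.frame(
  id   = rep(1:3, each = 3),
  year = rep(2001:2003, times = 3),
  dist = rep(c("A", "A", "B"), each = 3)
)

# Hypothetical helper: TRUE if `var` takes a single value within every unit.
is_time_invariant <- function(data, var, idname) {
  n_values <- tapply(data[[var]], data[[idname]],
                     function(x) length(unique(x)))
  all(n_values == 1)
}

is_time_invariant(toy, "year", "id")  # FALSE: year changes within each id
is_time_invariant(toy, "dist", "id")  # TRUE: each id stays in one district
```

Running such a check on a candidate variable before passing it to `clustervars` shows whether it varies over time within `id`.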

@pedrohcgs
Collaborator

pedrohcgs commented Apr 3, 2024 via email

@kdjiffa

kdjiffa commented Apr 3, 2024

Thanks for your quick feedback. In fact, what I meant is clustering at the group*period (intersection) level. What is the best way to cluster at that level?
Thanks

@pedrohcgs
Collaborator

pedrohcgs commented Apr 3, 2024 via email

@kdjiffa

kdjiffa commented Apr 4, 2024

Thanks


4 participants