Error in mboot when clustering at two variables #175

cmjoyce · 2023-05-18T20:49:04Z

Hi there,

I'm using the did package and need to account for clustering at the district level, which is different from my idname (individuals residing in these clusters). Based on the existing documentation, I've accounted for individual and district level clustering. The code and error message are as follows:

att_gt(yname = "outcome",
       tname = "year",
       gname = "g",
       idname = "id",
       xformla = ~ 1,
       data = df,
       panel = FALSE,
       weightsname = "weight_adj",
       clustervars = c("id", "dist_id"),
       control_group = "notyettreated",
       print_details = TRUE,
       bstrap=TRUE, cband=FALSE
)

Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) : 
  can't handle that many cluster variables

I've tried making a vector of these variables and using that as my clustervars, but that just errors out.

Is there a way to get around this error and account for both clustering variables?

Thanks very much,
Caroline

pedrohcgs · 2023-05-18T20:52:42Z

Hi Caroline, does the error remain when you use only dist_id as the cluster variable? Thanks


cmjoyce · 2023-05-18T20:59:44Z

Hi, thanks for the quick response!

There's no error when I use only dist_id as my clustering variable, though I end up with some confusingly large standard errors for some treatment groups, especially when I include individual-level covariates. But if clustering only on district gives correctly calculated standard errors, I will assume the issue is on my end.

Caroline

bcallaway11 · 2023-08-29T22:39:06Z

@cmjoyce, sorry for the delayed response. I am surprised that you got an error with the first version that you sent. I am marking that as a bug as I think it should work.

That being said, by default, we already cluster at the unit level (in your case "id"), so clustering on both ends up being redundant. Clustering only on dist_id is not a fix for the large standard errors, but those are the standard errors that I think you were trying to get from the beginning.
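A minimal sketch of what that suggests in practice, assuming (as stated above) that the unit identifier is already clustered on by default. The helper `simplify_clustervars` is hypothetical and not part of the did package; it only drops `idname` from the requested cluster variables before they are passed to `att_gt`.

```r
# Hypothetical helper (not part of did): remove the unit identifier from the
# cluster variables, since clustering on it is described above as the default.
simplify_clustervars <- function(clustervars, idname) {
  extra <- setdiff(clustervars, idname)
  # Return NULL when nothing is left, so only the default clustering applies.
  if (length(extra) == 0) NULL else extra
}

cv <- simplify_clustervars(c("id", "dist_id"), idname = "id")
print(cv)  # "dist_id"

# The result would then be passed on, e.g. (not run here, needs did and df):
# att_gt(yname = "outcome", tname = "year", gname = "g", idname = "id",
#        data = df, panel = FALSE, clustervars = cv, ...)
```

With the call from the original post, this reduces `c("id", "dist_id")` to `"dist_id"`, which is the one-variable version reported to run without the error.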

cmjoyce · 2023-08-30T19:31:32Z

Yes, I think my clustering on two variables was redundant -- I tweaked some things and got it working. I limited my clustering to one variable to avoid the error message.
Thanks for the awesome package!

bcallaway11 · 2023-09-14T12:46:14Z

Ok, great!

Note to self: I am going to leave this open as I think this could be confusing for users. Need to think about what the behavior should be if the user includes "id" among the clustering variables.

kdjiffa · 2024-04-03T20:37:20Z

Hi,
I am having a similar issue related to this post. I have balanced panel data where I want to cluster at the group and time level. I am using the individual id variable in clustervars instead of the group variable, as per the documentation. I have 3 time periods (years), 3,000 observations per period, and 1,000 per group, which amounts to 9,000 observations in total. Below are my code and the error:

csdid_out <- att_gt(yname = "Y2it",
                    tname = "year",
                    gname = "first.treat",
                    idname = "id",
                    est_method = "reg",
                    data = data,
                    panel = TRUE,
                    clustervars = c("id", "year"),
                    control_group = "notyettreated",
                    bstrap = TRUE,
                    cband = FALSE
)
Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) :
can't handle time-varying cluster variables

I will appreciate any help on this.

pedrohcgs · 2024-04-03T22:19:18Z

Time should not be used as a cluster variable in a DiD procedure with fixed T. You can't make inference with 3 observations…
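As a rough illustration of why "year" triggers the "time-varying cluster variables" error, the sketch below checks whether a candidate cluster variable takes a single value within each unit. This is an assumption about what the error message means, not a copy of the check inside mboot; the helper `is_time_invariant` and the toy data are hypothetical.

```r
# Hypothetical check (not taken from did): a cluster variable is treated as
# time-invariant if it has exactly one distinct value within each unit.
is_time_invariant <- function(data, idname, clustervar) {
  n_values <- tapply(data[[clustervar]], data[[idname]],
                     function(x) length(unique(x)))
  all(n_values == 1)
}

# Toy balanced panel: 3 units observed in 3 years, each unit in one district.
toy <- data.frame(
  id      = rep(1:3, each = 3),
  year    = rep(2001:2003, times = 3),
  dist_id = rep(c("a", "a", "b"), each = 3)
)

print(is_time_invariant(toy, "id", "year"))     # FALSE: year changes within id
print(is_time_invariant(toy, "id", "dist_id"))  # TRUE: district is fixed per id
```

Running a check like this on the real data before calling `att_gt` would flag variables such as "year" that change within a unit.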


kdjiffa · 2024-04-03T23:33:52Z

Thanks for your quick feedback. What I actually meant is clustering at the group*period (intersection) level. What is the best way to cluster at that level?
Thanks

pedrohcgs · 2024-04-03T23:35:38Z

Just use the id. Thanks


kdjiffa · 2024-04-04T00:01:34Z

Thanks

bcallaway11 added the "bug" label (Something isn't working) on Aug 29, 2023
